Library Use of Web-based Research Guides

Jimmy Ghaphery and Erin White

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012

Jimmy Ghaphery (jghapher@vcu.edu) is Head, Library Information Systems, and Erin White (erwhite@vcu.edu) is Web Systems Librarian, Virginia Commonwealth University Libraries, Richmond, VA.

ABSTRACT

This paper describes the ways in which libraries are currently implementing and managing web-based research guides (a.k.a. Pathfinders, LibGuides, Subject Guides, etc.) by examining two sets of data from the spring of 2011. One set of data was compiled by visiting the websites of ninety-nine American university ARL libraries and recording the characteristics of each site's research guides. The other set of data is based on an online survey of librarians about the ways in which their libraries implement and maintain research guides. The paper concludes with a discussion of implications for the library technology community.

SELECTED LITERATURE REVIEW

While there has been significant research on library research guides, there has not been a recent survey either of the overall landscape or of librarian attitudes and practices. There has been recent work on the efficacy of research guides as well as strategies for their promotion. There is still work to be done on developing a strong return-on-investment metric for research guides, although the same could probably be said for other library technologies, including websites, digital collections, and institutional repositories.

Subject-based research guides have a long history in libraries that predates the web as a service-delivery mechanism. A literature-review article from 2007 found that research on the subject gained momentum around 1996 with the advent of electronic research guides, and that there was a need for more user-centric testing.1 By the mid-2000s, it was rare to find a library that did not offer research guides through its website.2 The format of guides has certainly shifted over time to database-driven efforts through local library programming and commercial offerings.

A number of other articles start to answer some of the questions about usability posed in the 2007 literature review by Vileno. In 2008, Grays, Del Bosque, and Costello used virtual focus groups as a test bed for guide evaluation.3 Two articles from the August 2010 issue of the Journal of Library Administration contain excellent literature reviews and look toward marketing, assessment, and best practices.4 Also in 2010, Vileno followed up on the 2007 literature review with usability testing that pointed toward a number of areas in which users experienced difficulties with research guides.5

In terms of cross-library studies, an interesting collaboration in 2008 between Cornell and Princeton Universities found that students, faculty, and librarians perceived value in research guides, but that their qualitative comments and content analysis of the guides themselves indicated a need for more compelling and effective features.6 The work of Morris and Grimes from 1999 should also be mentioned; the authors surveyed 53 university libraries and found that few had formal management policies for their research guides.7

Most recently, LibGuides has emerged as a leader in this arena with a popular software-as-a-service (SaaS) model; as such, it is not yet heavily represented in the literature.
A multichapter LibGuides LITA guide is pending publication and will cover such topics as implementing and managing LibGuides, setting standards for training and design, and creating and managing guides.

ARL GUIDES LANDSCAPE

During the week of March 3, 2011, the authors visited the websites of 99 American university ARL libraries to determine the prevalence and general characteristics of their subject-based research guides. In general, the visits reinforced the overarching theme within the literature that subject-based research guides are a core component of academic library web services. All 99 libraries offered research guides that were easy to find from the library home page. LibGuides was very prominent as a platform, in production at 67 of the 99 libraries. Among these, it appeared that at least 5 libraries were in the process of migrating from a previous system (either a homegrown, database-driven site or static HTML pages) to LibGuides.

In addition to the presence and platform, the authors recorded additional information about the scope and breadth of each site's research guides. For each site, the presence of course-based research guides was recorded. In some cases the course guides had a separate listing, whereas in others they were intermingled with the subject-based research guides. Course guides were found at 75 of the 99 libraries visited. Of these, 63 were also LibGuides sites. It is certainly possible that course guides are being deployed at some of the other libraries but were not immediately visible in visiting the websites, or that course guides may be deployed through a course management system. Nonetheless, it appears that the use of LibGuides encourages the presence of public-facing course guides. Qualitatively, there was wide diversity in how course guides were organized and presented, varying from a simple A-to-Z listing of all guides to separately curated landing pages specifically organized by discipline.

The number of guides was recorded for each LibGuides site. It was possible to append "/browse.php?o=a" to the base URL to determine how many guides and authors were published at each site; this PHP extension was the publicly available listing of all guides on each LibGuides platform. The "/browse.php?o=a" extension no longer publicly reports these statistics; however, findings could be reproduced by manually counting the number of guides and authors on each site. The authors confirmed the validity of this method in the fall of 2011 by revisiting four sites and finding that the numbers derived from manual counting were in line with the previous findings. Of the 63 LibGuides sites we observed, a total of 14,522 guides were counted from 2,101 authors, for an average of 7 guides per author. On average, each site had 220 guides from 32 authors (median of 179 guides; 29 authors). At the high end of the scale, one site had 713 guides from 46 authors. Based on the volume observed, libraries appear to be investing significant time toward the creation, and presumably the maintenance, of this content. In addition to creation and ongoing maintenance, such long lists of topics raise a number of usability issues that libraries will also be wise to keep in mind.8
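The counting step described above lends itself to light scripting. The Python sketch below shows one way guide and author tallies could have been gathered from a site's public browse listing at the time; it is an illustration only. The CSS selectors and the example site URL are assumptions, since the actual markup of the LibGuides browse page varied by site and the page no longer exposes these statistics.

```python
# Hypothetical sketch: tally guides and authors from a LibGuides public browse page.
# The "/browse.php?o=a" path comes from the study; the selectors below are assumed.
import requests
from bs4 import BeautifulSoup

def count_guides_and_authors(base_url):
    """Return (guide_count, author_count) for one LibGuides site."""
    page = requests.get(base_url.rstrip("/") + "/browse.php?o=a", timeout=30)
    soup = BeautifulSoup(page.text, "html.parser")
    guide_links = soup.select("a.guide-link")             # assumed selector
    authors = {cell.get_text(strip=True)
               for cell in soup.select("td.author")}      # assumed selector
    return len(guide_links), len(authors)

if __name__ == "__main__":
    sites = ["http://guides.example.edu"]                 # placeholder site list
    counts = [count_guides_and_authors(site) for site in sites]
    total_guides = sum(g for g, _ in counts)
    total_authors = sum(a for _, a in counts)
    print(f"{total_guides} guides from {total_authors} authors")
```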
SURVEY

The literature review and website visits call out two strong trends:

1. Research guides are as commonplace as books in libraries.
2. LibGuides is the elephant in the room, so much so that it is hard to discuss research guides without discussing LibGuides.

Based on preliminary findings from the literature review and survey, we looked to further describe how libraries are supporting, innovating, implementing, and evaluating their research guides. A ten-question survey was designed to better understand how research guides sit within the cultural environment of libraries. It was distributed to a number of professional discussion lists the week of April 19, 2011 (see appendix). The following lists were used in an attempt to get a balance of opinion from populations of both technical and public services librarians: code4lib, web4lib, lita-l, lib-ref-l, and ili-l. The survey was made available for two weeks following the list announcements.

Survey response was very strong, with 198 responses (188 libraries) received without the benefit of any follow-up recruitment. Ten institutions submitted more than one response; in these cases only the first response was included for analysis. We did not complete a response for our own institution. The vast majority (155, 82%) of respondents were from college or university libraries. Of the remaining 33, 24 (13%) were from community college libraries, with only 9 (5%) identifying themselves as public, school, private, or governmental. Among the college and university libraries, 17 (9%) identified themselves as members of the ARL, which comprises 126 members.9

In terms of "what system best describes your research guides by subject?" the results were similar to the survey of ARL websites. Most libraries (129, 69%) reported LibGuides as their system, followed by "customized open source system" and "static HTML pages," both at 20 responses (11% each). Sixteen libraries (9%) reported using a homegrown system, with three libraries (2%) reporting "other commercial system."

In terms of initiating and maintaining a guides system, much of the work within libraries seems to be happening outside of library systems departments. When asked which statement best described who selected the guides system, 67 respondents (36%) indicated their library research guides were "initiated by Public Services," followed closely by "more of a library-wide initiative" at 63 responses (34%). In the middle at 34 responses (18%) was "initiated by an informal cross-departmental group." Only 10 respondents (5%) selected "initiated by Systems," with the top-down approach of "initiated by Administration" gathering 14 responses (7%). When narrowing the responses to those sites that are using LibGuides or CampusGuides, the portrait is not terribly different, with 36% library-wide, 35% Public Services, 18% informal cross-departmental, 7% Administration, and Systems trailing at 4%.

Likewise there was not a strong indication of library systems involvement in maintaining or supporting research guides. Sixty-nine responses (37%) indicated "no ongoing involvement" and an additional 35 (19%) indicated "N/A we do not have a Systems Department." There were only 21 responses (11%) stating "considerable ongoing involvement," with the balance of 63 responses (34%) for "some ongoing involvement." Not surprisingly, there was a correlation between the type of research guide system and the amount of systems involvement. For sites running a "customized open source system," "other commercial system," or "homegrown system," at least 80% of responses indicated either "considerable" or "some" ongoing systems involvement.
In contrast, 37% of sites running LibGuides or CampusGuides indicated "considerable" or "some" technical involvement. Further, the LibGuides and CampusGuides users recorded the highest percentage (43%) of "no ongoing involvement," compared to 37% of all respondents. Interestingly, 20% of LibGuides and CampusGuides users answered "N/A we do not have a Systems Department," which is not significantly higher than all respondents for this question at 19%.

The level of interaction between research guides and enterprise library systems was not reported as strong. When asked "which statement best describes the relationship between your web content management system and your research guides?" 112 responses (60%) indicated that "our content management system is independent of our research guides," with an additional 51 responses (27%) indicating that they did not have a content management system (CMS). Only 12 respondents (6%) said that their CMS was integrated with their research guides, with the remaining 13 (7%) saying that their CMS was used for "both our website and our research guides."

A similar portrait was found in seeking out the relationship between research guides and discovery/federated search tools. When asked "which statement best describes the relationship between your discovery/federated search tool and your research guides?" roughly half of the respondents (96, 51%) did not have a discovery system ("N/A we do not have a discovery tool"). Only 12 respondents (6%) selected "we prominently feature our discovery tool on our guides," whereas more than double that number, 26 (14%), said "we typically do not include our discovery tool on our guides." Fifty-four respondents (29%) took the middle path of "our discovery tool is one of many search options we feature on our guides." In the case of both discovery systems and content management systems, it seems that research guides are typically not deeply integrated.

When asked "what other type of content do you host on your research guides system?" respondents selected from a list of choices as reflected in table 1.

Answer | Total | Percent | Percent (LibGuides/CampusGuides sites)
Course pages | 127 | 68% | 74%
"How to" instruction | 123 | 65% | 77%
Alphabetical list of all databases | 76 | 40% | 42%
"About the library" information (for example, hours, directions, staff directory, events) | 59 | 31% | 35%
Digital collections | 34 | 18% | 19%
Everything—we use the research guide platform as our website | 16 | 9% | 9%
None of the above | 17 | 9% | 2%

Table 1. Other Types of Content Hosted on Research Guides System

These answers reinforce the portrait of integration within the larger library web presence. While the research guides platform is an important part of that presence, significant content is also being managed by libraries through other systems. It is also consistent with the findings from the ARL website visits, where course pages were consistently found within the research guides platform. For sites reporting LibGuides or CampusGuides as their platform, inclusion of course pages and how-to instruction was even higher, at 74% and 77%, respectively.

Another multi-answer question sought to determine what types of policies are being used by libraries for the management of research guides: "which of the following procedures or policies do you have in place for your research guides?" Responses are summarized in table 2.
Answer | Total | Percent | Percent using LibGuides/CampusGuides
Style guides for consistent presentation | 105 | 56 | 58
Maintenance and upkeep of guides | 94 | 50 | 53
Link checking | 87 | 46 | 50
Required elements such as contact information, chat, pictures, etc. | 78 | 41 | 56
Training for guide creators | 73 | 39 | 43
Transfer of guides to another author due to separation or change in duties | 72 | 38 | 41
Defined scope of appropriate content | 43 | 23 | 22
Allowing and/or moderating user tags, comments, ratings | 36 | 19 | 25
None of the above | 36 | 19 | 19
Controlled vocabulary/tagging system for managing guides | 23 | 12 | 25

Table 2. Management Policies/Procedures for Research Guides

While nearly one in five libraries reported having none of the policies in place at all, the responses indicate that there is effort being applied toward the management of these systems. The highest percentage for any given policy was 56% for "style guides for consistent presentation." Best practices in these areas could be emerging, or many of these policies could be specific to individual library needs. As with the survey question on content, the research-guides platform also has a role, with the LibGuides and CampusGuides users reporting much higher rates of policies for "controlled vocabulary/tagging" (25% vs. 12%) and "required elements" (56% vs. 41%). In both of these cases, it is likely that the need for policies arises from the availability of these features and options that may not be present in other systems. Based on this supposition, it is somewhat surprising that the LibGuides and CampusGuides sites reported the same lack of policy adoption (none of the above; 19%).

The final question in the survey further explored the management posture for research guides by asking a free-text question: "how do you evaluate the success or failure of your research guides?" Results were compiled into a spreadsheet. The authors used inductive coding to find themes and perform a basic data analysis on the responses, including a tally of which evaluation methods were used and how often. One in five institutions (37 respondents, 19.6%) looked only to usage stats, while seven respondents (4%) indicated that their library had performed usability testing as part of the evaluation. Forty-four respondents (23.4%) said they had no evaluation method in place ("Ouch! It hurts to write that."), though many expressed an interest in or plans to begin evaluation. Another emerging theme included ten respondents who quantified success in terms of library adoption and ease of use. This included one respondent who had adopted LibGuides in light of prohibitive IT regulations ("We choose LibGuides because IT would not allow us to create class specific research webpages"). Several institutions also expressed frustration with the survey instrument because they were in the process of moving from one guides system to another and were not sure how to address many questions. Most responses indicated that there are more questions than answers regarding the efficacy of their research guides, though the general sentiment toward the idea of guides was positive, with words such as "positive," "easy," "like," and "love" appearing in 16 responses. Countering that, 5 respondents indicated that their libraries' research-guides projects had fallen through.
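The tallying step described above is simple enough to script. The following Python sketch illustrates how coded free-text responses might be counted; the codes and responses shown are invented for illustration and are not the authors' actual coding scheme or data.

```python
# Hypothetical sketch of the tallying step: count how often each coded
# evaluation method appears across free-text survey responses.
# The codes and responses below are invented for illustration only.
from collections import Counter

coded_responses = [
    ["usage statistics"],
    ["usage statistics", "usability testing"],
    ["no evaluation method"],
    ["library adoption / ease of use"],
]

tally = Counter(code for codes in coded_responses for code in codes)
for method, count in tally.most_common():
    print(f"{method}: {count} ({count / len(coded_responses):.1%} of responses)")
```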
CONCLUSION

This study confirms previous research that web-based research guides are a common offering, especially in academic libraries. Adding to this, we have quantified the adoption of LibGuides both through visiting ARL websites and through a survey distributed to library listservs. Further, this study did not find a consistent management or assessment practice for library research guides. Perhaps the most interesting finding from this study is the role of library systems departments with regard to research guides. It appears that many library systems departments are not actively involved in either the initiation or ongoing support of web-based research guides.

What are the implications for the library technology community, and what questions arise for future research? The apparent ascendancy of LibGuides over local solutions is certainly worth considering and in part demonstrates some comfort within libraries with cloud computing and SaaS. Time will tell how this might spread to other library systems. The popularity of LibGuides, at its heart a specialized content management system, also calls into question the vitality and adaptability of local content management system implementations in libraries. More generally, does the desire to professionally select and steward information for users on research guides indicate librarian misgivings about the usability of enterprise library systems? How do attitudes toward research guides differ between public services and technical services? Hopefully these questions serve as a call for continued technical engagement with library research guides. What shape that engagement may take in the future is an open question, but based on the prevalence and descriptions of current implementations, such consideration by the library technology community is worthwhile.

REFERENCES

1. Luigina Vileno, "From Paper to Electronic, the Evolution of Pathfinders: A Review of the Literature," Reference Services Review 35, no. 3 (2007): 434–51.

2. Martin Courtois, Martha Higgins, and Aditya Kapur, "Was this Guide Helpful? Users' Perceptions of Subject Guides," Reference Services Review 33, no. 2 (2005): 188–96.

3. Lateka J. Grays, Darcy Del Bosque, and Kristen Costello, "Building a Better M.I.C.E. Trap: Using Virtual Focus Groups to Assess Subject Guides for Distance Education Students," Journal of Library Administration 48, no. 3/4 (2008): 431–53.

4. Mira Foster et al., "Marketing Research Guides: An Online Experiment with LibGuides," Journal of Library Administration 50, no. 5/6 (July/September 2010): 602–16; Alisa C. Gonzalez and Theresa Westbrock, "Reaching Out with LibGuides: Establishing a Working Set of Best Practices," Journal of Library Administration 50, no. 5/6 (July/September 2010): 638–56.

5. Luigina Vileno, "Testing the Usability of Two Online Research Guides," Partnership: The Canadian Journal of Library and Information Practice and Research 5, no. 2 (2010), http://journal.lib.uoguelph.ca/index.php/perj/article/view/1235 (accessed August 8, 2011).

6. Angela Horne and Steve Adams, "Do the Outcomes Justify the Buzz? An Assessment of LibGuides at Cornell University and Princeton University—Presentation Transcript," presented at the Association of Academic and Research Libraries, Seattle, WA, 2009, http://www.slideshare.net/smadams/do-the-outcomes-justify-the-buzz-an-assessment-of-LibGuides-at-cornell-university-and-princeton-university (accessed August 8, 2011).
7. Sarah Morris and Marybeth Grimes, "A Great Deal of Time and Effort: An Overview of Creating and Maintaining Internet-based Subject Guides," Library Computing 18, no. 3 (1999): 213–16.

8. Mathew Miles and Scott Bergstrom, "Classification of Library Resources by Subject on the Library Website: Is There an Optimal Number of Subject Labels?" Information Technology & Libraries 28, no. 1 (March 2009): 16–20, http://www.ala.org/lita/ital/files/28/1/miles.pdf (accessed August 8, 2011).

9. Association of Research Libraries, "Association of Research Libraries: Member Libraries," http://www.arl.org/arl/membership/members.shtml (accessed October 24, 2011).

Appendix. Survey

Library Use of Web-based Research Guides

Please complete the survey below. We are researching libraries' use of web-based research guides. Please consider filling out the following survey, or forwarding this survey to the person in your library who would be in the best position to describe your library's research guides. Responses are anonymous. Thank you for your help!

Jimmy Ghaphery, VCU Libraries
Erin White, VCU Libraries

1) What is the name of your organization? __________________________________
Note that the name of your organization will only be used to make sure multiple responses from the same organization are not received. Any publication of results will not include specific names of organizations.

2) Which choice best describes your library?
o ARL
o University library
o College library
o Community college library
o Public library
o School library
o Private library
o Governmental library
o Nonprofit library

3) What type of system best describes your research guides by subject?
o LibGuides or CampusGuides
o Customized open source system
o Other commercial system
o Homegrown system
o Static HTML pages

4) Which statement best describes the selection of your current research guides system?
o Initiated by Administration
o Initiated by Systems
o Initiated by Public Services
o Initiated by an informal cross-departmental group
o More of a library-wide initiative

5) How much ongoing involvement does your Systems Department have with the management of your research guides?
o No ongoing involvement
o Some ongoing involvement
o Considerable ongoing involvement
o N/A we do not have a Systems Department

6) What other type of content do you host on your research guides system?
o Course pages
o "How to" instruction
o Alphabetical list of all databases
o "About the library" information (for example: hours, directions, staff directory, events)
o Digital collections
o Everything—we use the research guide platform as our website
o None of the above

7) Which statement best describes the relationship between your discovery/federated search tool and your research guides?
o We typically do not include our discovery tool on our guides
o Our discovery tool is one of many search options we promote on our guides
o We prominently feature our discovery tool on our guides
o N/A We do not have a discovery tool

8) Which statement best describes the relationship between your Web Content Management System and your research guides?
o Our content management system is independent of our research guides
o Our content management system is integrated with our research guides
o Our content management system is used for both our website and our research guides
o N/A we do not have a content management system

9) Which of the following procedures or policies do you have in place for your research guides?
o Defined scope of appropriate content
o Required elements such as contact information, chat, pictures, etc.
o Style guides for consistent presentation
o Allowing and/or moderating user tags, comments, ratings
o Training for guide creators
o Controlled vocabulary/tagging system for managing guides
o Maintenance and upkeep of guides
o Link checking
o Transfer of guides to another author due to separation or change in duties
o None of the above

10) How do you evaluate the success or failure of your research guides? [Free Text]


Editorial Board Thoughts: Tools of the Trade

Sharon Farnel

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012

As I was trying to settle on a possible topic for this, my second "Editorial Board Thoughts" piece, I was struggling to find something that I'd like to talk about and that ITAL readers would (I hope) find interesting. I had my "Eureka!" moment one day as I was coming out of a meeting, thinking about a conversation that had taken place around tools. Now, by tools, I'm referring not to hardware, but to those programs and applications that we can and do use to make our work easier.

The meeting was of our institutional repository team, and the tools discussion specifically focused on data cleanup and normalization, citation integration, and the like. I had just recently returned from a short conference where I had heard mentioned or seen demonstrated a few neat applications that I thought had potential. A colleague also had just returned from a different conference, excited by some of the things that he'd learned about. And all of the team members had, in recent days, seen various e-mail messages about new tools and applications that might be useful in our environment.

We mentioned and discussed briefly some of the tools that we planned to test. One of the tools had already been test driven by a couple of us, and looked promising; another seemed like it might solve several problems, and so was bumped up the testing priority list. During the course of the conversation, it became clear that each of us had a laundry list of tools that we wanted to explore at greater depth. And it also became clear that, as is so often the case, the challenge was finding the time to do so.

As we were talking, my head was full of images of an assembly line, widgets sliding by so quickly that you could hardly keep up. I started thinking how you could stand there forever, overwhelmed by the variety and number of things flying by at what seemed like warp speed. Alternatively, if you ever wanted to get anywhere, do anything, or be a part of it all, you just had to roll up your sleeves and grab something.
The meeting drew to a close, and we all left with a sense that we needed to find a way of tackling the tools-testing process, of sharing what we learn and what we know, all in the hope of finding a set of tools that we, as a team, could become skilled with. I personally felt a little disappointed at not having managed to get around to all of the tools I'd earmarked for further investigation. But I also felt invigorated at the thought of being able to share the load of testing and researching. If we could coordinate ourselves, we might be able to test drive even more tools, increasing the likelihood we'd stumble on the few that would be just right! We'd taken furtive steps towards this in the past, but nothing coordinated enough to make it really stick and be effective.

I started wondering how other individuals and institutions manage not only to keep up with all of the new and potentially relevant tools that appear at an ever-increasing pace, but more so how they manage to determine which they will become expert at and use going forward. (Although I was excited at what we were thinking of doing, I was quite sure that others were likely far ahead of us in this regard!) It made me realize that at some point I—and we—need to stop being bystanders to the assembly line, watching the endless parade of tools pass us by. We need to simply grab on to a tool and take it for a spin. If it works for what we need, we stick with it. If it doesn't, we put it back on the line, and grab a different one. But at some point we have to take a chance and give something a shot.

We've decided on a few methods we'll try for taking full advantage of the tool-rich environment in which libraries exist today. Our metadata team has set up a "test bench," a workstation that we can all use and share for trying new tools. A colleague is going to organize monthly brown-bag talks at which team members can demonstrate tools that they've been working with and that they think have potential uses in our work. And we're also thinking of starting an informal, and public, blog, where we can post, among other things, about new tools we've tried or are trying, what we're finding works and how, and what doesn't and why. We hope these and other initiatives will help us all stay abreast or even slightly ahead of new developments, be flexible in incorporating new tools into our workflows when it makes the most sense, and build skills and expertise that benefit us and that can be shared with others.

So, I ask you, our ITAL readers, how do you manage the assembly line of tools? How do you gather information on them, and when do you decide to take one off and give it a whirl? How do you decide when something is worth keeping, or when something isn't quite the right fit and gets placed back on the line? Why not let us know by posting on the ITALica blog (http://ital-ica.blogspot.com/)? Or, even better, why not write about your experience and submit it to ITAL? We're always on the lookout for interesting and instructional stories on the tools of our trade!

Sharon Farnel (sharon.farnel@ualberta.ca) is Metadata and Cataloguing Librarian, University of Alberta, Edmonton, Alberta, Canada.


Usability Test Results for a Discovery Tool in an Academic Library

Jody Condit Fagan, Meris Mandernach, Carl S. Nelson, Jonathan R. Paulo, and Grover Saunders

INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012

ABSTRACT

Discovery tools are emerging in libraries.
These tools offer library patrons the ability to concurrently search the library catalog and journal articles. While vendors rush to provide feature-rich interfaces and access to as much content as possible, librarians wonder about the usefulness of these tools to library patrons. To learn about both the utility and usability of EBSCO Discovery Service, James Madison University (JMU) conducted a usability test with eight students and two faculty members. The test consisted of nine tasks focused on common patron requests or related to the utility of specific discovery tool features. Software recorded participants' actions and time on task, human observers judged the success of each task, and a post-survey questionnaire gathered qualitative feedback and comments from the participants. Participants were successful at most tasks, but specific usability problems suggested some interface changes for both EBSCO Discovery Service and JMU's customizations of the tool. The study also raised several questions for libraries above and beyond any specific discovery-tool interface, including the scope and purpose of a discovery tool versus other library systems, working with the large result sets made possible by discovery tools, and navigation between the tool and other library services and resources. This article will be of interest to those who are investigating discovery tools, selecting products, integrating discovery tools into a library web presence, or performing evaluations of similar systems.

Jody Condit Fagan (faganjc@jmu.edu) is Director, Scholarly Content Systems; Meris Mandernach (manderma@jmu.edu) is Collection Management Librarian; Carl S. Nelson (nelsoncs@jmu.edu) is Digital User Experience Specialist; Jonathan R. Paulo (paulojr@jmu.edu) is Education Librarian; and Grover Saunders (saundebn@jmu.edu) is Web Media Developer, Carrier Library, James Madison University, Harrisonburg, VA.

INTRODUCTION

Discovery tools appeared on the library scene shortly after the arrival of next-generation catalogs. The authors of this paper define discovery tools as web software that searches journal-article and library-catalog metadata in a unified index and presents search results in a single interface. This differs from federated search software, which searches multiple databases and aggregates the results. Examples of discovery tools include Serials Solutions Summon, EBSCO Discovery Service, Ex Libris Primo, and OCLC WorldCat Local; examples of federated search software include Serials Solutions WebFeat and EBSCO Integrated Search. With federated search software, results rely on the federated search algorithm and relevance ranking as well as each underlying tool's algorithms and relevance rankings. Discovery tools, which import metadata into one index, apply one set of search algorithms to retrieve and rank results. This difference is important because it contributes to a fundamentally different user experience in terms of speed, relevance, and ability to interact consistently with results.

Combining the library catalog, article indexes, and other source types in a unified interface is a big change for users because they no longer need to choose a specific search tool to begin their search.
Research has shown that such a choice has long been in conflict with users' expectations.1 Federated search software was unable to completely fulfill users' expectations because of its limited technology.2 Now that discovery tools provide a truly integrated search experience, with greatly improved relevance rankings, response times, and increased consistency, libraries can finally begin to meet this area of user expectation. However, discovery tools present new challenges for users: will they be able to differentiate between source types in the integrated results sets? Will they be able to limit large results sets effectively? Do they understand the scope of the tool and that other online resources exist outside the tool's boundaries?

The sea change brought by discovery tools also raises challenges for librarians, who have grown comfortable with the separation between the library catalog and other online databases. Discovery tools may mask important differences in disciplinary searching, and they do not currently offer discipline-specific strategies or limits. They also lack authority control, which makes topical precision a challenge. Their usual prominence on library websites may direct traffic away from carefully cultivated and organized collections of online resources. Discovery tools offer both opportunities and challenges for library instruction, depending on the academic discipline, users' knowledge, and information-seeking need.

James Madison University (JMU) is a predominantly undergraduate institution of approximately 18,000 students in Virginia. JMU has a strong information literacy program integrated into the curriculum through the university's Information Seeking Skills Test (ISST). The ISST is completed before students are able to register for third-semester courses. Additionally, the library provides an information literacy tutorial, "Go for the Gold," that supports the skills needed for the ISST.

JMU launched EBSCO Discovery Service (EDS) in August 2010 after participating as a beta development partner in spring and summer 2010. As with other discovery tools, the predominant feature of EDS is integration of the library catalog with article databases and other types of sources. At the time of this study, EDS had a few differentiating features. First, because of EBSCO's business as a database and journal provider, article metadata was drawn from a combination of journal-publisher information and abstracts and index records. The latter included robust subject indexing (e.g., the medical subject headings in CINAHL). The content searched by EDS varies by institution according to the institution's subscription. JMU had a large number of EBSCO databases and third-party database subscriptions through EBSCO, so the quantity of information searched by EDS at JMU is quite large. EDS also allowed for extensive customization of the tool, including header navigation links, results-screen layout, and the inclusion of widgets in the right-hand column of the results screen. JMU Libraries developed a custom "Quick Search" widget based on EDS for the library home page (see figure 1), which allows users to add limits to the discovery-tool search and assists with local authentication requirements.
Based on experience with a pilot test of the open-source VuFind next-generation catalog, JMU Libraries believed users would find the ability to limit up-front useful, so Quick Search's first drop-down menu contained keyword, title, and author field limits; the second drop-down contained limits for books, articles, scholarly articles, "Just LEO Library Catalog," and the library website (which did not use EDS). The "Just LEO Library Catalog" option limited the user's search to the library catalog database records but used the EDS interface to perform the search. To access the native catalog interface, a link to LEO Library Catalog was included immediately above the search box as well as in the library website header.

Figure 1. Quick Search Widget on JMU Library Homepage

Evaluation was included as part of the implementation process for the discovery tool, and therefore a usability test was conducted in October 2010. The purpose of the study was to explore how patrons used the discovery tool, to uncover any usability issues with the chosen system, and to investigate user satisfaction. Specific tasks addressed the use of facets within the discovery tool, patrons' use of date limiters, and the usability of the Quick Search widget. The usability test also had tasks in which users were asked to locate books and articles using only the discovery tool, then repeat the task using anything but the discovery tool. This article interprets the usability study's results in the context of other local usability tests and web-usage data from the first semester of use. Some findings were used to implement changes to Quick Search and the library website, and to recommend changes to EBSCO; however, other findings suggested general questions related to discovery tool software that libraries will need to investigate further.

LITERATURE REVIEW

Literature reviewed for this article included some background reading on users and library catalogs, library responses to users' expectations, usability studies in libraries, and usability studies of discovery tools specifically. The first group of articles comprised a discussion about the limitations of traditional library catalogs. The strengths and weaknesses of library catalogs were reported in several academic libraries' usability studies.3 Calhoun recognized that library users' preference for Google caused a decline in the use and value of library catalogs, and encouraged library leaders to "establish the catalog within the framework of online information discovery systems."4 This awareness of changes in user expectations during a time when Google set the benchmark for search simplicity was echoed by numerous authors who recognized the limits of library catalogs and expressed a need for the catalog to be greatly modernized to keep pace with the evolution of the web.5

Libraries have responded in several ways to the call for modernization, most notably through investigations related to federated searching and next-generation catalogs.
Several articles have presented usability study results for various federated searching products.6 Fagan provided a thorough literature review of faceted browsing and next-generation catalogs.7 Western Michigan University presented usability study results for the next-generation catalog VuFind, revealing that participants took advantage of the simple search box but did not use the next-generation catalog features of tagging, comments, favorites, and SMS texting.8 The University of Minnesota conducted two usability studies of Primo and reported that participants were satisfied with using Primo to find known print items, limit by author and date, and find a journal title.9 Tod Olson conducted a study with graduate students and faculty using the AquaBrowser interface, and his participants located sources for their research they had not previously been able to find.10

The literature also revealed both opportunities and limitations of federated searching and next-generation catalogs. Allison presented statistics from Google Analytics for an implementation of Encore at the University of Nebraska-Lincoln.11 The usage statistics revealed an increased use of article databases as well as an increased use of narrowing facets such as format and media type, and library location. Allison concluded that Encore increased users' exposure to the entire collection. Breeding concluded that federated searching had various limitations, especially search speed and interface design, and was thus unable to compete with Google Scholar.12 Usability studies of next-generation catalogs revealed a lack of features necessary to fully incorporate an entire library's collection. Breeding also recognized the limitations of next-generation library catalogs and saw discovery tools as their next step in evolution: "It's all about helping users discover library content in all formats, regardless of whether it resides within the physical library or among its collections of electronic content, spanning both locally owned materials and those accessed remotely through subscriptions."13

The dominant literature related to discovery tools discussed features,14 reviewed them from a library selector perspective,15 summarized academic libraries' decisions following selection,16 presented questions related to evaluation after selection,17 and offered a thorough evaluation of common features.18 Allison concluded that "usability testing will help clarify what aspects need improvement, what additions will make [the interface] more useful, and how the interface can be made so intuitive that user training is not needed."19 Breeding noted "it will only be through the experience of library users that these products will either prove themselves or not."20

Libraries have been adapting techniques from the field of usability testing for over a decade to learn more about user behavior, usability, and user satisfaction with library websites and systems.21 Rubin and Chisnell and Dumas and Redish provided authoritative overviews of the benefits and best practices of usability testing.22 In addition, Campbell and Norlin and Winters offered specific usability methodologies for libraries.23

WorldCat Local has dominated usability studies of discovery tools published to date. Ward, Shadle, and Mofield conducted a usability study at the University of Washington.24
Although the second round of testing was not published, the first round involved seven undergraduate and three graduate students; its purpose "was to determine how successful UW students would be in using WorldCat Local to discover and obtain books and journal articles (in both print and electronic form) from the UW collection, from the Summit consortium, and from other WorldCat libraries."25 Although participants were successful at completing these tasks, a few issues arose out of the usability study. Users had difficulty with the brief item display because reviews were listed higher than the actual items. The detailed item display also hindered users' ability to distinguish between various editions and formats. The second round of usability testing, not yet published, included tasks related to finding materials on specific subject areas.

Boock, Chadwell, and Reese conducted a usability study of WorldCat Local at Oregon State University.26 The study included four tasks and five evaluative questions. Forty undergraduate students, sixteen graduate students, twenty-four library employees, four instructors, and eighteen faculty members took part in the study. They summarized that users found known-title searching to be easier in the library catalog but found topical searches to be more effective in WorldCat Local. The participants preferred WorldCat Local for the ability to find articles and search for materials in other institutions.

Western Washington University also conducted a usability study of WorldCat Local. They selected twenty-four participants with a wide range of academic experience to conduct twenty tasks in both WorldCat Local and the traditional library catalog.27 The comparison revealed several problems in using WorldCat Local, including users' inability to determine the scope of the content, confusion over the intermixing of formats, problems with the display of facet options, and difficulty with known-item searches. Western Washington University decided not to implement WorldCat Local.

OCLC published a thorough summary of several usability studies conducted mostly with academic libraries piloting the tool, including the University of Washington; the University of California (Berkeley, Davis, and Irvine campuses); Ohio State University; the Peninsula Library System in San Mateo, California; and the Free Library of Urbana and the Des Plaines Public Library, both in Illinois.28 The report conveyed favorable user interest in searching local, group, and global collections together. Users also appreciated the ability to search articles and books together. The authors commented, "however, most academic participants in one test (nine of fourteen) wrongly assumed that journal article coverage includes all the licensed content available at their campuses."29 OCLC used the testing results to improve the order of search results, provide clarity about various editions, improve facets for narrowing a search, provide links to electronic resources, and increase visibility of search terms.
At Grand Valley State University, Doug Way conducted an analysis of usage statistics after implementing the discovery tool Summon in 2009; the usage statistics revealed an increased use of full-text downloads and link resolver software but a decrease in the use of core subject databases.30 The usage statistics showed promising results, but Way recommended further studies of usage statistics over a longer period of time to better understand how discovery tools affect entire library collections. North Carolina State University Libraries released a final report about their usability study of Summon.31 The results of these usability studies were similar to other studies of discovery tools: users were satisfied with the ability to search the library catalog and article databases with a single search, but users had mixed results with known-item searching and confusion about narrowing facets and results ranking. Although several additional academic libraries have conducted usability studies of Encore, Summon, and EBSCO Discovery Service, the results have not yet been published.32

Only one usability study of EBSCO Discovery Service was found. In a study with six participants, Williams and Foster found users were satisfied and able to adapt to the new system quickly but did not take full advantage of the rich feature set.33

Combined with the rapid changes in these tools, the literature illustrates a current need for more usability studies related to discovery tools. The necessary focus on specific software implementations and different study designs makes it difficult to identify common themes. Additional usability studies will offer greater breadth and depth to the current dialogue about discovery tools. This article will help fill the gap by presenting results from a usability study of EBSCO Discovery Service. Publishing such usability results of discovery tools will inform institutional decisions, improve user experiences, and advance the tools' content, features, and interface design. In addition, libraries will be able to more thoroughly modernize library catalogs to meet users' changing needs and expectations as well as keep pace with the evolution of the web.

METHOD

James Madison University Libraries' usability lab features one workstation with two pieces of usability software: Techsmith's Morae (version 3) (http://www.techsmith.com/morae.asp), which records screen captures of participant actions during the usability studies, and the Usability Testing Environment (UTE) (version 3), which presents participants with tasks in a web-browser environment. The UTE also presents end-of-task questions to measure time on task and task success.

The study of EDS, conducted in October 2010, was covered by an institutional review board–approved protocol. Participants were recruited for the study through a bulk email sent to all students and faculty. Interested respondents were randomly selected to include a variety of grade levels and majors for students and years of service and disciplines taught for faculty members. The study included ten participants with a range of experience levels: two freshmen, two sophomores, two juniors, one senior, one graduate student, and two faculty members. Three of the participants were from the school of business, one from education, two from the arts and humanities, and two from the sciences. The remaining two participants had dual majors in the humanities and the sciences.
A usability rule of thumb is that at least five users will reveal more than 75 percent of usability issues.34 Because the goal was to observe a wide range of user behaviors and usability issues, and to gather data about satisfaction from a variety of perspectives, this study used two users of each grade level plus two faculty participants (for a total of ten) to provide as much heterogeneity as possible.

Student participants were presented with ten pre-study questions, and faculty participants were asked nine pre-study questions (see appendix A). The pre-study questions were intended to gather information about participants' background, including their time at JMU, their academic discipline, and their experience with the library website, the EBSCOhost interface, the library catalog, and library instruction. Since participants were anonymous, we hoped their answers would help us interpret unusual comments or findings. Pre-test results were not used to form comparison groups (e.g., freshmen versus seniors) because these groups would not be representative of their larger populations. These questions were followed by a practice task to help familiarize participants with the testing software.

The study consisted of nine tasks designed to showcase usability issues, show the researchers how users behaved in the system, and measure user satisfaction. Appendix B lists the tasks and what they were intended to measure. In designing the test, determining success on some tasks seemed very objective (find a video about a given topic) while others appeared to be more subjective (those involving relevance judgments). For this reason, we asked participants to provide satisfaction information on some tasks and not others. In retrospect, for consistency of interpretation, we probably should have asked participants to rate or comment on every task. All of the tasks were presented in the same order. Tasks were completed either by clicking "Answer" and answering a question (multiple choice or typed response), or by clicking "Finished" after navigating to a particular webpage. Participants also had the option to skip the task they were working on and move to the next task. Allowing participants to skip a task helps differentiate between genuinely incorrect answers and incorrect answers due to participant frustration or guessing. A time limit of 5 minutes was set for tasks 1–7, while tasks 8 and 9 were given time limits of 8 minutes, after which the participant was timed out. Time limits were used to ensure participants were able to complete all tasks within the agreed-upon session. Average time on task across all tasks was 1 minute, 35 seconds.

After the study was completed, participants were presented with the System Usability Scale (SUS), a ten-item scale using statements of subjective assessment and covering a variety of aspects of system usability.35 SUS scores, which provide a numerical score out of 100, are affected by the complexity of both the system and the tasks users may have performed before taking the SUS. The SUS was followed by a post-test consisting of six open-ended questions, plus one additional question for faculty participants, intended to gather more qualitative feedback about user satisfaction with the system (see appendix A).
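The 0–100 SUS score comes from a standard conversion that the article does not spell out: odd-numbered items contribute the response minus one, even-numbered items contribute five minus the response, and the sum is multiplied by 2.5. A minimal Python sketch of that conversion follows; the responses in the example are invented, not data from this study.

```python
# Standard SUS scoring; the responses below are invented for illustration,
# not data from this study.
def sus_score(responses):
    """responses: ten Likert ratings (1-5), for SUS items 1 through 10."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    total = 0
    for item, rating in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item % 2 == 1 else (5 - rating)
    return total * 2.5  # scale the 0-40 raw total to 0-100

print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))  # -> 85.0
```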
First, on seven of the ninety tasks, the UTE failed to enforce the five-minute maximum time limit, and participants exceeding a task’s time limit were allowed to continue the task until they completed or skipped the task. One participant exceeded the time limit on task 1 while three of these errors occurred during both tasks 8 and 9. This problem potentially limits the ability to compare the average time on task across tasks; however, since this study used time on task in a descriptive rather than comparative way, the impact on interpreting results is minimal. The seven instances in which the glitch occurred were included in the average time on task data found in figure 3 because the times INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 91 were not extreme and the time limit had been imposed mostly to be sure participants had time to complete all the tasks. A second problem with the UTE was that it randomly and prematurely aborted some users’ tasks; when this happened, participants were informed that their time had run out and were then moved on to the next task. This problem is more serious because it is unknown how much more time or effort the participant would have spent on the task or whether they would have been more successful. Because of this, the results below specify how many participants were affected for each task. Although this was unfortunate, the results of the participants who did not experience this problem still provide useful cases of user behavior, especially because this study does not attempt to generalize observed behavior or usability issues to the larger population. Although a participant mentioned a few technical glitches during testing to the facilitator, the extent of software errors was not discovered until after the tests were complete (and the semester was over) because the facilitator did not directly observe participants during sessions. RESULTS The participants were asked several pre–test questions to learn about their research habits. All but one participant indicated they used the library website no more than six times per month (see figure 2). Common tasks this study’s student participants said they performed on the website were searching for books and articles, searching for music scores, “research using databases,” and checking library hours. The two faculty participants mentioned book and database searches, electronic journal access, and interlibrary loan. Participants were shown the Quick Search widget and were asked “how much of the library’s resources do you think the Quick Search will search?” Seven participants said “most”; only one person, a faculty member, said it would search “all” the library’s resources. Figure 2. Monthly Visits to Library Website < 1 visit (2) 1 - 3 visits (4) 4 - 6 visits (3) > 7 visits (1) USABILITY TEST RESULTS FOR A DISCOVERY TOOL IN AN ACADEMIC LIBRARY | FAGAN ET AL 92 When shown screenshots of the library catalog and an EBSCOhost database, seven participants were sure they had used LEO Library Catalog, and three were not sure. Three indicated that they had used an EBSCO database before, five had not, and two were not su re. Participants were also asked how often they had used library resources for assignments in their major field of study; four said “often,” two said “sometimes,” one “rarely/never,” and one “very often.” Students were also asked “has a librarian spoken to a class you’ve attended about library research?” and two said yes, five said no, and one was not sure. 
A “practice task” was administered to ensure participants were comfortable with the workstation and software: “Use Quick Search to search a topic relating to your major/discipline or another topic of interest to you. If you were writing a paper on this topic how satisfied would you be with these results?” No one selected “no opinion” or “very unsatisfied”; sixty percent were “very satisfied” or “satisfied” with their results, and forty percent were “somewhat unsatisfied.” Figure 3 shows the time spent on each task, while figure 4 describes participants’ success on the tasks.

                                           Task 1  Task 2  Task 3  Task 4  Task 5  Task 6  Task 7  Task 8  Task 9
No. of responses (not including timeouts)    10      9       5       7       9      10      10       8      10
Avg. time on task (in seconds)              175*    123     116      97      34     120      92     252*    255*
Standard deviation                           212     43      50      49      26      36      51     177     174
*Includes time(s) in excess of the set time limit; excess time allowed by software error.
Figure 3. Average Time Spent on Tasks (an accompanying bar chart, “Average Time for All Tasks (not including timeouts),” plotted the same averages in seconds)

The first task (“What was the last thing you searched for when doing a research assignment for class? Use Quick Search to re-search for this.”) started participants on the library homepage. Participants were then asked to “Tell us how this compared to your previous experience” using a text box. The average time on task was almost three minutes; however, one faculty participant took more than 12 minutes on this task; if his or her time was removed, the time on task average was 1 minute, 23 seconds. Figure 5 shows the participants’ search terms and their comments.

How success was determined: task 1, users were only asked to provide feedback; task 2, valid typed-in response provided; task 3, number of subtasks completed (out of 3); task 4, number of subtasks completed (out of 2); task 5, correct multiple-choice answer; task 6, number of subtasks completed (out of 2); task 7, task ended at correct web location; tasks 8 and 9, number of subtasks completed (out of 4).
      Task 1  Task 2    Task 3  Task 4  Task 5     Task 6  Task 7   Task 8    Task 9
P01   N/A     Correct   3       2       TIMEOUT    2       Correct  0*        0**
P02   N/A     Correct   3*      1       Correct    2       Correct  0**       3
P03   N/A     Correct   0*      1       Incorrect  2       Correct  4         3
P04   N/A     Correct   2       0*      Correct    2       SKIP     3         2
P05   N/A     Correct*  2       2       Correct    1       Correct  4         2
P06   N/A     Correct   3*      1       Correct    1       Correct  3         0**
P07   N/A     Correct   2       1*      Correct    1       Correct  0         2
P08   N/A     Correct   2       0*      Correct    0       SKIP     TIMEOUT   0**
P09   N/A     Correct   2*      SKIP    Correct    2       Correct  4         2
P10   N/A     Correct   1*      1       Correct    2       SKIP     4         2
Note: “TIMEOUT” indicates an immediate timeout error; users were unable to take any action on the task. *User experienced a timeout error while working on the task, which may have affected their ability to complete the task. **User did not follow directions.
Figure 4. Participants’ Success on Tasks

P01, Faculty, Geology. Search terms: large low shear wave velocity province. Comments: “Ebsco did a fairly complete job. There were some irrelevant results that I don’t remember seeing when I used GeoRef.”
P02, Faculty, Computer Information Systems & Management Science (statistics). Search terms: student cheating. Comments: “This is a topic that I am somewhat familiar with the related literature. I was pleased with the diversity of journals that were found in the search. The topics of the articles was right on target. The recency of the articles was great. This is a topic for which I am somewhat familiar with the related literature. I was impressed with the search results regarding: diversity of journals; recency of articles; just the topic in articles I was looking for.”
P03, Graduate Student, Education. Search terms: Death of a Salesman. Comments: “There is a lot of variety in the types of sources that Quick Search is pulling up now. I would still have liked to see more critical sources on the play but I could probably have found more results of that nature with a better search term, such as ‘death of a salesman criticism.’”
P04, 1st year, Voice Performance. Search terms: current issues in Russia. Comments: “It was somewhat helpful in the way that it gave me information about what had happened in the past couple months, but not what was happening now in russia.”
P05, 3rd year, Nursing. Search terms: uninsured and health care reform. Comments: “The quick search gave very detailed articles I thought, which could be good, but were not exactly what I was looking for. Then again, I didn’t read all these articles either.”
P06, 1st year, History. Search terms: headscarf law. Comments: “This search yielded more results related to my topic. I needed other sources for an argument on the French creating law banning religious dress and symbols in school. Using other methods with the same keyword, I had an enormous amount of trouble finding articles that pertained to my essay.”
P07, 3rd year, English. Search terms: Jung. Comments: “I like the fact that it can be so defined to help me get exactly what I need.”
P08, 4th year, Spanish. Search terms: restaurant industry. Comments: “This is about the same as the last time that I researched this topic.”
P09, 2nd year, Hospitality. Search terms: aphasia. Comments: “There are many good sources, however there are also completely irrelevant sources.”
P10, 2nd year, Management. Search terms: Rogers five types of feedback. Comments: “There is not many documents on the topic I searched for. This may be because the topic is not popular or my search is not specific/too specific.”
Figure 5. Participants’ Search Terms and Comments

The second task started on the library homepage and asked participants to find a video related to early childhood cognitive development. This task was chosen because JMU Libraries have significant video collections and because the research team hypothesized users might have trouble, as there was no explicit way to limit to videos at the time. The average time on this task was two minutes, with one person experiencing an arbitrary timeout by the software. Participants were judged to be successful on this task by the researchers if they found any video related to the topic. All participants were successful on this task, but four entered and then left the discovery tool interface to complete the task. Five participants looked for a video search option in the drop-down menu, and of these, three immediately used something other than Quick Search when they saw that there was no video search option. Of those who tried Quick Search, six opened the source type facet in EDS search results and four selected a source type limit, but only two selected a source type that led directly to success (“non-print resources”). Task 3 started participants in EDS (see figure 6) and asked them to search on speech pathology, find a way to limit search results to audiology, and limit their search results to peer-reviewed sources.
Participants spent an average of 1 minute, 40 seconds on this task, with five participants being artificially timed out by the software. Participants’ success on this task was determined by the researchers’ examination of the number of subtasks they completed. The three subtasks consisted of successfully searching for the given topic (speech language pathology), limiting the search results to audiology, and further limiting the results to peer-reviewed sources. Four participants were able to complete all three subtasks, including two who were timed out. (The times for those who were timed out were not included in time on task averages, but they were given credit for success.) Five completed just two of the subtasks, failing to limit to peer-reviewed sources; one of these failed because of a timeout. It was unclear why the remaining participants did not attempt to alter the search results to “peer reviewed.” Looking at the performed actions, six of the ten typed “AND audiology” into search keywords to narrow the search results, while one found and used “audiology” in the Subject facet on the search results page. Six participants found and used the “Scholarly (Peer Reviewed) Journals” checkbox limiter.
Figure 6. EBSCO Discovery Service Interface
Beginning with the results they had from task 3, task 4 asked participants to find more recent sources and to select the most recent source available. Task success was measured by correct completion of two subtasks: limiting the search results to the last five years and finding the most recent source. The average time on task was 1 minute, 14 seconds, with three artificial timeouts. Of those who did not time out, all seven were able to limit their sources to be more recent in some way, but only three were able to select the most recent source. In addition to this being a common research task, the team was interested to see how users accomplished it. Three typed in the limiter in the left-hand column, two typed in the limiter on the advanced search screen, and two used the date slider. Two participants used the “sort” drop-down menu to change the sort order to “Date Descending,” which helped them complete this task. Other participants changed the dates and then selected the first result, which was not the most recent. Task 5, which started within EDS, asked participants to find a way to ask a JMU librarian for help. The success of this task was measured by whether they reached the correct URL for the Ask-a-Librarian page; eight of the ten participants were successful. This task took an average of only 31 seconds to complete, and eight of the ten used the Ask-a-Librarian link at the top of the page. Of the two unsuccessful participants, one was timed out, while another clicked “search modes” for no apparent reason, then clicked back and decided to finish the task. Task 6 started in the EDS interface and asked participants to locate the journal Yachting and Boating World and select the correct coverage dates and online status from a list of four options; participants were deemed successful at two subtasks if they selected the correct option and successful at one subtask if they chose an option that was partially correct. Participants took an average of two minutes on this task; only five answered correctly.
During this task, three participants used the EBSCO search option “SO Journal Title/Source,” four used quotation marks, and four searched or re-searched with the “Title” drop-down menu option. Three chose the correct dates of coverage but were unable to correctly identify the online availability. It is important to note that only searching and locating the journal title were accomplished with the discovery tool; to see dates of coverage and online availability, users clicked JMU’s link resolver button, and the resulting screen was served from Serials Solutions’ Article Linker product. Although some users spent more time than perhaps was necessary using the EDS search options to locate the journal, the real barriers to this task were encountered when trying to interpret the Serials Solutions screen. Task 7, where participants started in EDS, was designed to determine whether users could navigate to a research database outside of EDS. Users were asked to look up the sculpture Genius of Mirth and were told the library database Camio would be the best place to search. They were instructed to “locate this database and find the sculpture.” The researcher observed the recordings to determine success on this task, which was defined as using Camio to find the sculpture. Participants took an average of 1 minute, 32 seconds on this task; seven were observed to complete the task successfully, while three chose to skip the task. To accomplish this task, seven participants used the JMU Research Databases link in the header navigation at some point, but only four began the task by doing this. Six participants began by searching within EDS. The final two tasks started on the library homepage and were a pair: participants were asked to find two books and two recent, peer-reviewed articles (from the last five years) on rheumatoid arthritis. Task 8 asked them to use the library’s EDS widget, Quick Search, to accomplish this, and task 9 asked them to accomplish the same task without using Quick Search. When they found sources, they were asked to enter the four relevant titles in a text-entry box. The average time spent on these tasks was similar: about four minutes per task. Comparing these tasks was somewhat confusing because some participants did not follow instructions. User success was determined by the researchers’ observation of how many of the four subtasks the user was able to complete successfully: find two books, find two articles, limit to peer reviewed, and select articles from the last five years (with or without using a limiter); figure 4 shows their success. Looking at the seven users who used Quick Search on the Quick Search task, six limited to “Scholarly (Peer Reviewed) Journals”; six limited to the last five years; and seven narrowed results using the source type facet. The average number of subtasks completed on task 8 was 3.14 out of 4. Looking at the seven users who followed instructions and did not use Quick Search on task 9, all began with the library catalog and tried to locate articles within the library catalog. The average number of subtasks completed on task 9 was 2.29 out of 4 (both averages can be recomputed from the figure 4 data, as sketched below). Some users tried to locate articles by setting the catalog’s material type drop-down menu to “Periodicals” and others used the catalog’s “Periodical” tab, which performed a title keyword search of the e-journal portal. For task 9, only two users eventually chose a research database to find articles.
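The two subtask averages just reported can be recomputed from the figure 4 data. The short Python sketch below does so; which participants to exclude (immediate timeouts and users marked as not following directions) is an inference from the table notes rather than a method the authors state explicitly.

# Hedged sketch: recompute the task 8 and task 9 subtask averages from figure 4.
# None marks participants excluded from the averages (immediate timeouts and users
# who did not follow directions); this exclusion rule is an assumption inferred
# from the table notes, not something stated by the authors.

task8_subtasks = {"P01": None, "P02": None, "P03": 4, "P04": 3, "P05": 4,
                  "P06": 3, "P07": 0, "P08": None, "P09": 4, "P10": 4}
task9_subtasks = {"P01": None, "P02": 3, "P03": 3, "P04": 2, "P05": 2,
                  "P06": None, "P07": 2, "P08": None, "P09": 2, "P10": 2}

def average_subtasks(scores):
    # Average only the participants who have a numeric subtask count.
    counted = [s for s in scores.values() if s is not None]
    return sum(counted) / len(counted)

print(f"Task 8: {average_subtasks(task8_subtasks):.2f} of 4 subtasks")  # 3.14
print(f"Task 9: {average_subtasks(task9_subtasks):.2f} of 4 subtasks")  # 2.29

Run as written, the sketch reproduces the 3.14 and 2.29 figures reported in the text, which suggests this reading of the table is at least consistent with the authors' calculation.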
User behavior can only be compared for the six users (all students) who followed instructions on both tasks; a summary is provided in figure 4. After completing all nine tasks, participants were presented with the System Usability Scale. EDS scored 56 out of 100. Following the SUS, participants were asked a series of post–test questions. Only one of the faculty members chose to answer the post–test questions. When asked how they would use Quick Search, all eight students explicitly mentioned class assignments, and the participating faculty member replied “to search for books.” Two students mentioned books specifically, while the rest used the more generic term “sources” to describe items for which they would search. When asked “when would you not use this search tool?” the faculty member said “I would just have to get used to using it. I mainly go to [the library catalog] and then research databases.” Responses from the six students who answered this question were vague and hard to categorize: • “Not really sure for more general question/learning” • “When just browsing” • “For quick answers” • “If I could look up the information on the internet” • “When the material I need is broad” • “Basic searching when you do not need to say where you got the info from” When asked for the advantages of Quick Search, four specifically mentioned the ability to narrow results, three respondents mentioned “speed,” three mentioned ease of use, and three mentioned relevance in some way (e.g., “it does a pretty good job associating keywords with sources”). Two mentioned the broad coverage and one compared it to Google, “which is what students are looking for.” When asked to list disadvantages, the faculty member mentioned he/she was not sure what part of the library home page was actually “Quick Search,” and was not sure how to get to his/her library account. Three students talked about Quick Search being “overwhelming” or “confusing” because of the many features, although one of these also stated, “like anything you need to learn in order to use it efficiently.” One student mentioned the lack of an audio recording limit and another said “when the search results come up it is hard to tell if they are usable results.” INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 99 Knowing that Quick Search may not always provide the best results, the research team also asked users what they would do if they were unable to find an item using Quick Search. A faculty participant said he or she would log into the library catalog and start from there. Five students mentioned consulting a library staff member in some fashion. Three mentioned moving on from library resources, although not necessarily as their first step. One said “find out more information on it to help narrow down my search.” Only one student mentioned the library catalog or any other specific library resource. When participants were asked if “Quick Search” was an appropriate name, seven agreed that it was. Of those who did not agree, one participant’s comment was “not really, though I don’t think it matters.” And another’s was “I think it represents the idea of the search, but not the action. It could be quicker.” The only alternative name suggestion was “Search Tool.” Web Traffic Analysis Web traffic through Quick Search and in EDS provides additional context for this study’s results. During August–December 2010, Quick Search was searched 81,841 times from the library homepage. 
This is an increase from traffic into the previous widget in this location, which searched the catalog and received 41,740 searches during the same period in 2009. Even adjusting for an approximately 22 percent increase in website traffic from 2009 to 2010, this is an increase of 75 percent. Interestingly, the traffic to the most popular link on the library homepage, Research Databases, went from 55,891 in 2009 to 30,616 in 2010, a decrease of 55 percent when adjusting for the change in website traffic. During fall 2010, 28 percent of Quick Search searches from the homepage were executed using at least one drop-down menu. Twelve percent changed Quick Search’s first drop-down menu to something other than the keyword default, with “title” being the most popular option (7 percent of searches), followed by author (4 percent of searches). Twenty percent of users changed the second drop-down option; “Just Articles” and “Just Books” were the most popular options, garnering 7 percent and 6 percent of searches, respectively, followed by “Just Scholarly Articles,” which accounted for 4 percent of searches. Looking at EBSCO’s statistical reports for JMU’s implementation of EDS, there were 85,835 sessions and approximately 195,400 searches during August–December 2010. This means about 95 percent of EDS sessions were launched using Quick Search from the homepage. There was an average of 2.3 searches per session, which is comparable to past behavior in JMU’s other EBSCOhost databases. DISCUSSION The goal of this study was to gather initial data about user behavior, usability issues, and user satisfaction with discovery tools. The task design and technical limitations of the study mean that comparing time on task between participants or tasks would not be particularly illuminating; and, while the success rates on tasks are interesting, they are not generalizable to the larger JMU population. Instead, this study provided observations of user behavior that librarians can use to improve services, it suggested some “quick fixes” to usability issues, and it pointed to several research questions. When possible, these observations are supplemented by comparisons between this study and the only other published usability study of EDS.36 This study confirmed a previous finding of user studies of federated search software and discovery tools: students have trouble determining what is searched by various systems.37 On the tasks in which they were asked to not use Quick Search to find articles, participants tried to search for articles in the library catalog. Although all but one of this study’s participants correctly answered that Quick Search did not search “all” library resources, seven thought it searched “most.” Either “most” or “some” would be considered correct; however, it is interesting that answering this question more specifically is challenging even for librarians. Many journals in subject article indexes and abstracts are included in the EDS Foundation Index; furthermore, JMU’s implementation of EDS includes all of JMU’s EBSCO subscription resources as well, making it impractical to assemble a master list of indexed titles. Of course, there are numerous online resources with contents that may never be included in a discovery tool, such as political voting records, ethnographic files, and financial data. Users often have access to these resources through their library.
However, if they do not know the library has a database of financial data, they will certainly not consider this content in their response to a question about how many of the library’s resources are included in the discovery tool. As discovery tools begin to fulfill users’ expectations for a “single search,” libraries will need to share best practices for showcasing valuable, useful collections that fall outside the discovery tool’s scope or abilities. This is especially critical when reviewing the 72 percent increase in homepage traffic to the homepage search widget compared with the 55 percent decrease in homepage traffic to the research databases page (the adjustment arithmetic behind these figures is sketched in the example below). It is important to note these trends do not mean the library’s other research databases have fallen in usage by 55 percent. Though there was not a comprehensive examination of usage statistics, spot-checking suggested EBSCO and non-EBSCO subject databases had both increases and decreases in usage from previous years. Another issue libraries should consider, especially when preparing for instruction classes, is that users do not seem to understand which information needs are suited to a discovery tool versus the catalog or subject-specific databases. Several tasks provided additional information about users’ mental models of the tool, which may help libraries make better decisions about navigation customizations in discovery tool interfaces and on library websites. Task 7 was designed to discover whether users could find their way to a database outside of EDS if they knew they needed to use a specific database. Six participants, including one of the faculty members, began by searching EDS for the name of the sculpture and/or the database name. On task 1, a graduate student who searched on “Death of a Salesman” and was asked to comment on how Quick Search results compared to his or her previous experience said, “I would still have liked to see more critical sources on the play but I could probably have found more results of that nature with a better search term, such as ‘death of a salesman criticism.’” While true, most librarians would suggest using a literary criticism database, which would target this information need. Librarians may have differing opinions regarding the best research starting point, but their rationale would be much different from that of the students in this study. This study’s participants said they would use Quick Search/EDS when they were doing class work or research, but would not use it for general inquiries. If librarians were to list which user information needs are best met by a discovery tool versus a subject-specific database, the types of information needs listed would be much more numerous and diverse, regardless of differences over how to classify them. In addition to helping users choose between a discovery tool or a subject-specific database, libraries will need to conceptualize how users will move in and out of the discovery tool to other library resources, services, and user accounts. While users had no trouble finding the Ask-a-Librarian link in the header, it might have been more informative if users started from a search-results page to see whether they would find the right-hand column’s Ask-a-Librarian link or links to library subject guides and database lists. Discovery tools vary in their abilities to connect users with their online library accounts and are changing quickly in this area.
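For readers who want to see the traffic-adjustment arithmetic referenced above spelled out, the following Python sketch reproduces two of the reported figures from the raw counts given in the web traffic analysis (the adjusted decline for the Research Databases link and the searches-per-session average). Treating the approximately 22 percent site-wide growth as a simple scaling of the 2009 baseline is an assumption about the method, not something the article states explicitly.

# Hedged sketch: one plausible reading of the traffic adjustment described above.
# Raw counts are taken from the article; the adjustment method is an assumption.

SITE_TRAFFIC_GROWTH = 0.22  # approximate overall website traffic increase, 2009 to 2010

def adjusted_change(count_2009, count_2010, growth=SITE_TRAFFIC_GROWTH):
    """Percent change in 2010 relative to a 2009 baseline scaled by overall site growth."""
    expected_2010 = count_2009 * (1 + growth)
    return (count_2010 - expected_2010) / expected_2010 * 100

# Research Databases link: 55,891 clicks (2009) versus 30,616 (2010)
print(f"Research Databases, adjusted change: {adjusted_change(55891, 30616):.0f}%")  # about -55%

# EDS usage, August-December 2010
searches, sessions = 195400, 85835
print(f"Searches per EDS session: {searches / sessions:.1f}")  # about 2.3

# Share of EDS sessions attributable to the homepage widget,
# computed as the article does (homepage Quick Search searches / EDS sessions)
quick_search_searches = 81841
print(f"Sessions launched via Quick Search: {quick_search_searches / sessions:.0%}")  # about 95%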
This study also provided some interesting observations about discovery tool interfaces. The default setting for EBSCO Discovery Service is a single search box. However, this study suggests that while users desire a single search, they are willing to use multiple interface options. This was supported by log analysis of the library’s locally developed entry widget, Quick Search, in which 28 percent of searches included the use of a drop-down menu. On the first usability task, users left Quick Search’s options set to the default. On other tasks, participants frequently used the drop-down menus and limiters in both Quick Search and EDS. For example, on task 2, which asked them to look for videos, five users looked in the Quick Search format drop-down menu. On the same task within EDS, six users attempted to use the source type facet. Use of limiters was similarly observed by Williams and Foster in their EDS usability study.38 One EDS interface option that was not obvious to participants was the link to change the sort order. When asked to find the most recent article, only two participants changed the sort option. Most others used the date input boxes to limit their search, then selected the first result even though it was not the most recent one. It is unclear whether these participants assumed the first result was the most recent or whether they could not figure out how to display the most recent sources. Finding a journal title from library homepages has long been a difficult task,39 and this study provided no exception, even with the addition of a discovery tool. It is important to note that the standard EDS implementation would include a “Publications” or “Journals A–Z” link in the header; in EDS, libraries can customize the text of this link. JMU did not have this type of link enabled in our test, since the hope was that users could find journal titles within the EDS results. However, neither EDS nor the Quick Search widget’s search interfaces offered a way to limit the search to a journal title at the time of this study. During the usability test, four participants changed the field search drop-down menu to “Title” in EDS, and three participants changed the EDS field search drop-down menu to “SO Journal Title/Source,” which limits the search to articles within that journal title. While both of these ideas were good, neither one resulted in a precise results set in EDS for this task unless the user also limited to “JMU Catalog Only,” a nonintuitive option. Since the test, JMU has added a “Journal Titles” option to Quick Search that launches the user’s search into the journal A–Z list (provided by Serials Solutions). In the two months after the change (February and March 2011), only 391 searches were performed with this option. This was less than 1 percent of all searches, indicating that while it may be an important task, it is not a popular one. Like many libraries with discovery tools, JMU added federated search capabilities to EDS using EBSCOhost Integrated Search software in an attempt to draw some traffic to databases not included in EDS (or not subscribed to through EBSCO by JMU), such as MLA International Bibliography, Scopus, and Credo Reference. Links to these databases appeared in the upper-right-hand column of EDS during the usability study (see figure 6). Usage data from EBSCO showed that less than 1 percent of all JMU’s EDS sessions for fall 2010 included any interaction with this area.
Likewise, Williams and Foster observed their participants did not use their federated search until explicitly asked to do so.40 Perhaps users faced with discovery tool results simply have no motivation to click on additional database results. Since the usability test, JMU has replaced the right-hand column with static links to Ask-a-Librarian, subject guides, and research database lists. Readers may wonder why one of the most common tasks, finding a specific book title, was not included in this usability study; this was because JMU Libraries posed this task in a concurrent homepage usability study. In that study, twenty of the twenty-five participants used Quick Search to find the title “Pigs in Heaven” and chose the correct call number. Eleven of the twenty used the Quick Search drop-down menu to choose a title search option, further confirming users’ willingness to limit up-front. The average time on this task was just under a minute, and all participants completed this task successfully, so this task was not repeated in the EDS usability test. Other studies have reported trouble with this type of task;41 much could depend on the item chosen as well as the tool’s relevance ranking. User satisfaction with EDS can be summarized from the open-ended post–study questions, from the responses to task 1 (figure 5), and from the SUS scale. Answers to the post–study questions indicated participants liked the ability to narrow results, the speed and ease of use, and the relevance of the system. A few participants did describe the system as being “overwhelming” or “confusing” because of the many features, which was also supported by the SUS scores. JMU has been using the SUS to understand the relative usability of library systems. The SUS offers a benchmark for system improvement; for example, EBSCO Discovery Service received an SUS of only 37 in spring 2010 (N = 7) but a 56 in this study in fall 2010 (N = 10). This suggests the interface has become more usable. In 2009, JMU Libraries also used the SUS to test the library catalog’s classic interface as well as a VuFind interface to the library catalog, which received scores of 68 (N = 15) and 80 (N = 14), respectively. The differences between the catalog scores and EDS indicate an important distinction between usability and usefulness, with the latter concept encompassing a system’s content and capabilities. The library catalog is, perhaps, a more straightforward tool than a discovery tool and attempts to provide access to a smaller set of information. It has none of the complexity involved in finding article-level or book chapter information. All else being equal, simpler tools will be more usable. In an experimental study, Tsakonas and Papatheodorou found that while users did not distinguish between the concepts of usability and usefulness, they prefer attributes composing a useful system in contrast to those supporting usability.42 Discovery tools, which support more tasks, must make compromises in usability that simpler systems can avoid. In their study of EDS, Williams and Foster also found overall user satisfaction with the system. Their participants made positive comments about the interface as well as the usefulness and relevance of the results.43 JMU passed on several suggestions to EBSCO related to EDS based on the test results. EBSCO subsequently added “Audio” and “Video” to the source types, which enabled JMU to add a “Just Videos at JMU” option to Quick Search.
While it is confusing that “Audio” and “Video” source types currently behave differently than the others in EDS, in that they limit to JMU’s catalog as well as to the source type, this behavior produces what most local users expect. A previous usability study of WorldCat Local showed users have trouble discriminating between source types in results lists, so the source types facet is important.44 Another piece of feedback provided to EBSCO was that on the task where users needed to choose the most recent result, only two of our participants sorted by date descending. Perhaps the textual appearance of the sort option (instead of a drop-down menu) was not obvious to participants (see figure 6); however, Williams and Foster did not observe this to be an issue in their study.45 FUTURE RESEARCH The findings of this study suggest many avenues for future research. Libraries will need to revisit the scope of their catalogs and other systems to keep up with users’ mental models and information needs. Catalogs and subject-specific databases still perform some tasks much better than discovery tools, but libraries will need to investigate how to situate the discovery tool and specialized tools within their web presence in a way that will make sense to users. When should a user be directed to the catalog versus a discovery tool? What items should libraries continue to include in their catalogs? What role do institutional repositories play in the suite of library tools, and how does the discovery tool connect to them (or include them?) How do library websites begin to make sense of the current state of library search systems? Above all, are users able to find the best resources for their research needs? Although research on searchers’ mental models has been extensive,46 librarians’ mental models have not been studied as such. Yet placing the USABILITY TEST RESULTS FOR A DISCOVERY TOOL IN AN ACADEMIC LIBRARY | FAGAN ET AL 104 discovery tool among the library’s suite of services will involve compromises between these two models. Another area needing research is how to instruct users to work with the large numbers of results returned by discovery tools. In subject-specific databases, librarians often help users measure the success of their strategy—or even their topic—by the number of results returned: in Criminal Justice Abstracts, 5,000 results means a topic is too broad or the search strategy needs refinement. In a discovery tool, a result set this large will likely have some good results on the first couple of pages if sorted by relevance; however, users will still need to know how to grow or reduce their results sets. Participants in this study showed a willingness to use limiters and other interface features, but not always the most helpful ones. When asked to narrow a broad subject on task 3 of this study, only one participant chose to use the “Subject” facet even when the subtopic, audiology, was clearly available. Most added search terms. It will be important for future studies to investigate the best way for users to narrow large results set in a discovery tool. This study also suggested possible areas of investigation for future user studies. 
One interesting finding related to this study’s users’ information contexts was that when users were asked to search on their last research topic, it did not always match up with their major: a voice performance student searched on “current issues in Russia,” and the hospitality major searched on “aphasia.” To what extent does a discovery tool help or hinder students who are searching outside their major area of study? One of JMU’s reference librarians noted that while he would usually teach a student majoring in a subject how to use that subject’s specific indexes, as opposed to a discovery tool, a student outside the major might not need to learn the subject-specific indexes for that subject and could be well served by the discovery tool. Future studies could also investigate the usage and usability of discovery tool features in order to continue informing library customizations and advice to vendors. For example, this study did not have a task related to logging into a patron account or requesting items, but that would be good to investigate in a follow-up study. Another area ripe for further investigation is discovery tool limiters. This study’s participants frequently attempted to use limiters, but didn’t always choose the correct ones for the task. What are the ideal design choices for making limiters intuitive? This study found almost no use of the embedded federated search add-on: is this true at other institutions? Finally, this study and others reveal difficulty in distinguishing source types. Development and testing of interface enhancements to support this ability would be helpful to many libraries’ systems. CONCLUSION This usability test of a discovery tool at James Madison University did not reveal as many interface-specific findings as it did questions about the role of discovery tools in libraries. Users were generally able to navigate through the Quick Search and EDS interfaces and complete tasks successfully. Tasks that are challenging in other interfaces, such as locating journal articles and discriminating between source types, continued to be challenging in a discovery tool interface. INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 105 This usability test suggested that while some interface features were heavily used, such as drop - down limits and facets, other features were not used, such as federated search results. As discovery tools continue to grow and evolve, libraries should continue to conduct usability tests, both to find usability issues and to understand user behavior and satisfaction. Although discovery tools challenge libraries to think not only about access but also about the best research pathways for users, they provide users with a search that more closely matches their expectations. ACKNOWLEDGEMENT The authors would like to thank Patrick Ragland for his editorial assistance in preparing this manuscript. CORRECTION April 12, 2018: At the request of the author, this article was revised to remove a link to a website. REFERENCES 1. Emily Alling and Rachael Naismith, “Protocol Analysis of a Federated Search Tool: Designing for Users,” Internet Reference Services Quarterly 12, no. 1 (2007): 195, http://scholarworks.umass.edu/librarian_pubs/1/ (accessed Jan. 11, 2012); Frank Cervone, “What We've Learned From Doing Usability Testing on OpenURL Resolvers and Federated Search Engines,” Computers in Libraries 25, no. 9 (2005): 10 ; Sara Randall, “Federated Searching and Usability Testing: Building the Perfect Beast,” Serials Review 32, no. 
3 (2006): 181–82, doi:10.1016/j.serrev.2006.06.003; Ed Tallent, “Metasearching in Boston College Libraries —A Case Study of User Reactions,” New Library World 105, no. 1 (2004): 69-75, DOI: 10.1108/03074800410515282. 2. S. C. Williams and A. K. Foster, “Promise Fulfilled? An EBSCO Discovery Service Usability Study,” Journal of Web Librarianship 5, no. 3 (2011), http://www.tandfonline.com/doi/pdf/10.1080/19322909.2011.597590 (accessed Jan. 11, 2012). 3. Janet K. Chisman, Karen R. Diller, and Sharon L. Walbridge, “Usability Testing: A Case Study,” College & Research Libraries 60, no. 6 (November 1999): 552–69, http://crl.acrl.org/content/60/6/552.short (accessed Jan. 11, 2012); Frances C. Johnson and Jenny Craven, “Beyond Usability: The Study of Functionality of the 2.0 Online Catalogue,” New Review of Academic Librarianship 16, no. 2 (2010): 228–50, DOI: 10.1108/00012531011015217 (accessed Jan, 11, 2012); Jennifer E. Knievel, Jina Choi Wakimoto, and Sara Holladay, “Does Interface Design Influence Catalog Use? A Case Study,” College & Research Libraries 70, no. 5 (September 2009): 446–58, http://crl.acrl.org/content/70/5/446.short (accessed Jan. 11, 2012); Jia Mi and Cathy Weng, “Revitalizing the Library OPAC: Interface, Searching, and Display Challenges,” Information Technology & Libraries 27, no. 1 (March 2008): 5–22, http://0- http://scholarworks.umass.edu/librarian_pubs/1/ http://www.tandfonline.com/doi/pdf/10.1080/19322909.2011.597590 http://crl.acrl.org/content/60/6/552.short http://crl.acrl.org/content/70/5/446.short http://0-www.ala.org.sapl.sat.lib.tx.us/ala/mgrps/divs/lita/publications/ital/27/1/mi.pdf USABILITY TEST RESULTS FOR A DISCOVERY TOOL IN AN ACADEMIC LIBRARY | FAGAN ET AL 106 www.ala.org.sapl.sat.lib.tx.us/ala/mgrps/divs/lita/publications/ital/27/1/mi.pdf (accessed Jan. 11, 2012). 4. Karen Calhoun, “The Changing Nature of the Catalog and its Integration with Other Discovery Tools,” http://www.loc.gov/catdir/calhoun-report-final.pdf (accessed Mar. 11, 2011). 5. Dee Ann Allison, “Information Portals: The Next Generation Catalog,” Journal of Web Librarianship 4, no. 1 (2010): 375–89, http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1240&context=libraryscience (accessed January 11, 2012); Marshall Breeding, “The State of the Art in Library Discovery,” Computers in Libraries 30, no. 1 (2010): 31–34; C. P Diedrichs, “Discovery and Delivery: Making it Work for Users . . . Taking the Sting out of Serials!” (lecture, North American Serials Interest Group, Inc. 23rd Annual Conference, Phoenix, Arizona, June 5–8, 2008), DOI: 10.1080/03615260802679127; Ian Hargraves, “Controversies of Information Discovery,” Knowledge, Technology & Policy 20, no. 2 (Summer 2007): 83, http://www.springerlink.com/content/au20jr6226252272/fulltext.html (accessed Jan. 11, 2012); Jane Hutton, “Academic Libraries as Digital Gateways: Linking Students to the Burgeoning Wealth of Open Online Collections,” Journal of Library Administration 48, no. 3 (2008): 495–507, DOI: 10.1080/01930820802289615; OCLC, “Online Catalogs: What Users and Librarians Want: An OCLC Report,” http://www.oclc.org/reports/onlinecatalogs/default.htm (accessed Mar. 11 2011). 6. C. J. Belliston, Jared L. Howland, and Brian C. Roberts, “Undergraduate Use of Federated Searching: A Survey of Preferences and Perceptions of Value-Added Functionality,” College & Research Libraries 68, no. 6 (November 2007): 472–86, http://crl.acrl.org/content/68/6/472.full.pdf+html (accessed Jan. 11, 2012); Judith Z. Emde, Sara E. 
Morris, and Monica Claassen‐Wilson, “Testing an Academic Library Website for Usability with Faculty and Graduate students,” Evidence Based Library & Information Practice 4, no. 4 (2009): 24– 36, http://kuscholarworks.ku.edu/dspace/bitstream/1808/5887/1/emdee_morris_CW.pdf (accessed Jan. 11,2012); Karla Saari Kitalong, Athena Hoeppner, and Meg Scharf, “Making Sense of an Academic Library Web Site: Toward a More Usable Interface for University Researchers,” Journal of Web Librarianship 2, no. 2/3 (2008): 177–204, http://www.tandfonline.com/doi/abs/10.1080/19322900802205742 (accessed Jan. 11, 2012); Ed Tallent, “Metasearching in Boston College Libraries—A Case Study of User Reactions,” New Library World 105, no. 1 (2004): 69–75, DOI: 10.1108/03074800410515282; Rong Tang, Ingrid Hsieh-Yee, and Shanyun Zhang, “User Perceptions of MetaLib Combined Search: An Investigation of How Users Make Sense of Federated Searching,” Internet Reference Services Quarterly 12, no. 1 (2007): 211–36, http://www.tandfonline.com/doi/abs/10.1300/J136v12n01_11 (accessed Jan. 11, 2012). 7. Jody Condit Fagan, “Usability Studies of Faceted Browsing: A Literature Review,” Information Technology & Libraries 29, no. 2 (2010): 58–66, http://0-www.ala.org.sapl.sat.lib.tx.us/ala/mgrps/divs/lita/publications/ital/27/1/mi.pdf http://www.loc.gov/catdir/calhoun-report-final.pdf http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1240&context=libraryscience http://www.springerlink.com/content/au20jr6226252272/fulltext.html http://www.oclc.org/reports/onlinecatalogs/default.htm http://crl.acrl.org/content/68/6/472.full.pdf+html http://kuscholarworks.ku.edu/dspace/bitstream/1808/5887/1/emdee_morris_CW.pdf http://www.tandfonline.com/doi/abs/10.1080/19322900802205742 http://www.tandfonline.com/doi/abs/10.1300/J136v12n01_11 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 107 http://web2.ala.org/ala/mgrps/divs/lita/publications/ital/29/2/fagan.pdf (accessed Jan. 11, 2012). 8. Birong Ho, Keith Kelley, and Scott Garrison, “Implementing VuFind as an Alternative to Voyager’s Web Voyage Interface: One Library’s Experience,” Library Hi Tech 27, no. 1 (2009): 8292, DOI: 10.1108/07378830910942946 (accessed Jan. 11, 2012). 9. Tamar Sadeh, “User Experience in the Library: A Case Study,” New Library World 109, no. 1 (2008): 7–24, DOI: 10.1108/03074800810845976 (accessed Jan. 11, 2012). 10. Tod A. Olson, “Utility of a Faceted Catalog for Scholarly Research,” Library Hi Tech 25, no. 4 (2007): 550–61, DOI: 10.1108/07378830710840509 (accessed Jan. 11, 2012). 11. Allison, “Information Portals,” 375–89. 12. Marshall Breeding, “Plotting a New Course for Metasearch,” Computers in Libraries 25, no. 2 (2005): 27. 13. Ibid. 14. Dennis Brunning and George Machovec, “Interview About Summon with Jane Burke, Vice President of Serials Solutions,” Charleston Advisor 11, no. 4 (2010): 60–62; Dennis Brunning and George Machovec, “An Interview with Sam Brooks and Michael Gorrell on the EBSCOhost Integrated Search and EBSCO Discovery Service,” Charleston Advisor 11, no. 3 (2010): 62–65, http://www.ebscohost.com/uploads/discovery/pdfs/topicFile-121.pdf (accessed Jan. 11, 2012). 15. Ronda Rowe, “Web-Scale Discovery: A Review of Summon, EBSCO Discovery Service, and WorldCat Local,” Charleston Advisor 12, no. 1 (2010): 5–10; K. Stevenson et al., “Next-Generation Library Catalogues: Reviews of Encore, Primo, Summon and Summa,” SERIALS 22, no. 1 (2009): 68–78. 16. Jason Vaughan, “Chapter 7: Questions to Consider,” Library Technology Reports 47, no. 
1 (2011): 54; Paula L. Webb and Muriel D. Nero, “OPACs in the Clouds,” Computers in Libraries 29, no. 9 (2009): 18. 17. Jason Vaughan, “Investigations into Library Web Scale Discovery Services,” Articles (Libraries), paper 44 (2011), http://digitalcommons.library.unlv.edu/lib_articles/44. 18. Marshall Breeding, “The State of the Art in Library Discovery,” 31–34; Sharon Q. Yang and Kurt Wagner, “Evaluating and Comparing Discovery Tools: How Close are We Towards Next Generation Catalog?” Library Hi Tech 28, no. 4 (2010): 690–709. 19. Allison, “Information Portals,” 375–89. 20. Breeding, “The State of the Art in Library Discovery,” 31–34. 21. Galina Letnikova, “Usability Testing of Academic Library Websites: A Selective Bibliography,” Internet Reference Services Quarterly 8, no. 4 (2003): 53–68. http://web2.ala.org/ala/mgrps/divs/lita/publications/ital/29/2/fagan.pdf http://www.ebscohost.com/uploads/discovery/pdfs/topicFile-121.pdf http://digitalcommons.library.unlv.edu/lib_articles/44 USABILITY TEST RESULTS FOR A DISCOVERY TOOL IN AN ACADEMIC LIBRARY | FAGAN ET AL 108 22. Jeffrey Rubin and Dana Chisnell, Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests, 2nd ed. (Indianapolis, IN: Wiley, 2008); Joseph S. Dumas and Janice Redish, A Practical Guide to Usability Testing, rev. ed. (Portland, OR: Intellect, 1999). 23. Nicole Campbell, ed., Usability Assessment of Library-Related Web Sites: Methods and Case Studies (Chicago: Library & Information Technology Association, 2001); Elaina Norlin and C. M. Winters, Usability Testing for Library Web Sites: A Hands-On Guide (Chicago: American Library Association, 2002). 24. Jennifer L. Ward, Steve Shadle, and Pam Mofield, “User Experience, Feedback, and Testing,” Library Technology Reports 44, no. 6 (2008): 17. 25. Ibid. 26. Michael Boock, Faye Chadwell, and Terry Reese, “WorldCat Local Task Force Report to LAMP,” http://hdl.handle.net/1957/11167 (accessed Mar. 11 2011). 27. Bob Thomas and Stefanie Buck, “OCLC’s WorldCat Local Versus III’s WebPAC: Which Interface is Better at Supporting Common User Tasks?” Library Hi Tech 28, no. 4 (2010): 648–71. 28. OCLC, “Some Findings from WorldCat Local Usability Tests Prepared for ALA Annual,” http://www.oclc.org/worldcatlocal/about/213941usf_some_findings_about_worldcat_local.pdf (accessed Mar. 11, 2011). 29. Ibid., 2. 30. Doug Way, “The Impact of Web-Scale Discovery on the Use of a Library Collection,” Serials Review 36, no. 4 (2010): 21420. 31. North Carolina State University Libraries, “Final Summon User Research Report,” http://www.lib.ncsu.edu/userstudies/studies/2010_summon/ (accessed Mar. 28, 2011). 32. Alesia McManus, “The Discovery Sandbox: Aleph and Encore Playing Together,” http://www.nercomp.org/data/media/Discovery%20Sandbox%20McManus.pdf (accessed Mar. 28, 2011); PRWeb, “Deakin University in Australia Chooses EBSCO Discovery Service,” http://www.prweb.com/releases/Deakin/ChoosesEDS/prweb8059318.htm (accessed Mar. 28, 2011); University of Manitoba, “Summon Usability: Partnering with the Vendor,” http://prezi.com/icxawthckyhp/summon-usability-partnering-with-the-vendor (accessed Mar. 28, 2011). 33. Williams and Foster, “Promise Fulfilled?” 34. Jakob Nielsen, “Why You Only Need to Test with 5 Users,” http://www.useit.com/alertbox/20000319.html (accessed Aug. 20, 2011). 35. John Brooke, “SUS: A ‘Quick and Dirty’ Usability Scale,” in Usability Evaluation in Industry, ed. P. W. Jordanet al. 
(London: Taylor & Francis, 1996), http://www.usabilitynet.org/trump/documents/Suschapt.doc (accessed Apr. 6, 2011). 36. Williams and Foster, “Promise Fulfilled?” http://hdl.handle.net/1957/11167 http://www.oclc.org/worldcatlocal/about/213941usf_some_findings_about_worldcat_local.pdf http://www.lib.ncsu.edu/userstudies/studies/2010_summon/ http://www.nercomp.org/data/media/Discovery%20Sandbox%20McManus.pdf http://www.prweb.com/releases/Deakin/ChoosesEDS/prweb8059318.htm http://prezi.com/icxawthckyhp/summon-usability-partnering-with-the-vendor/ http://www.useit.com/alertbox/20000319.html http://www.usabilitynet.org/trump/documents/Suschapt.doc INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 109 37. Seikyung Jung et al., “LibraryFind: System Design and Usability Testing of Academic Metasearch System,” Journal of the American Society for Information Science & Technology 59, no. 3 (2008): 375–89; Williams and Foster, “Promise Fulfilled?”; Laura Wrubel and Kari Schmidt, “Usability Testing of a Metasearch Interface: A Case Study,” College & Research Libraries 68, no. 4 (2007): 292–311. 38. Williams and Foster, “Promise Fulfilled?” 39. Letnikova, “Usability Testing of Academic Library Websites,” 53–68; Tom Ipri, Michael Yunkin, and Jeanne M. Brown, “Usability as a Method for Assessing Discovery,” Information Technology & Libraries 28, no. 4 (2009): 181–86; Susan H. Mvungi, Karin de Jager, and Peter G. Underwood, “An Evaluation of the Information Architecture of the UCT Library Web Site,” South African Journal of Library & Information Science 74, no. 2 (2008): 171–82. 40. Williams and Foster, “Promise Fulfilled?” 41. Ward et al., “User Experience, Feedback, and Testing,” 17. 42. Giannis Tsakonas and Christos Papatheodorou, “Analysing and Evaluating Usefulness and Usability in Electronic Information Services,” Journal of Information Science 32, no. 5 (2006): 400– 419. 43. Williams and Foster, “Promise Fulfilled?” 44. Bob Thomas and Stefanie Buck, “OCLC’s WorldCat Local Versus III’s WebPAC: Which Interface is Better at Supporting Common User Tasks?” Library Hi Tech 28, no. 4 (2010): 648–71. 45. Williams and Foster, “Promise Fulfilled?” 46. Tracy Gabridge, Millicent Gaskell, and Amy Stout, “Information Seeking through Students’ Eyes: The MIT Photo Diary Study,” College & Research Libraries 69, no. 6 (2008): 510–22; Yan Zhang, “Undergraduate Students’ Mental Models of the Web as an Information Retrieval System,” Journal of the American Society for Information Science & Technology 59, no. 13 (2008): 2087–98; Brenda Reeb and Susan Gibbons, “Students, Librarians, and Subject Guides: Improving a Poor Rate of Return,” Portal: Libraries and the Academy 4, no. 1 (2004): 123–30; Alexandra Dimitroff, “Mental Models Theory and Search Outcome in a Bibliographic Retrieval System,” Library & Information Science Research 14, no. 2 (1992): 141–56. USABILITY TEST RESULTS FOR A DISCOVERY TOOL IN AN ACADEMIC LIBRARY | FAGAN ET AL 110 APPENDIX A Task Pre–Test 1: Please indicate your JMU status (1st Year, 2nd Year, 3rd Year, 4th Year, Graduate Student, Faculty, Other) Pre–Test 2: Please list your major(s) or area of teaching (open ended) Pre–Test 3: How often do you use the library website? (Less than once a month, 1–3 visits per month, 4–6 visits per month, more than 7 visits per month) Pre–Test 4: What are some of the most common things you currently do on the library website? (open ended) Pre–Test 5: How much of the library’s resources do you think the Quick Search will search? 
(Less than a third, Less than half, Half, Most, All) Pre–Test 6: Have you used LEO? (show screenshot on printout) (Yes, No, Not Sure) Pre–Test 7: Have you used EBSCO? (show screenshot on printout) (Yes, No, Not Sure) Pre–Test 8 (Student participants only): How often have you used library web resources for course assignments in your major? (Rarely/Never, Sometimes, Often, Very Often) Pre–Test 9 (Student participants only): How often have you used library resources for course assignments outside of your major? (Rarely/Never, Sometimes, Often, Very Often) Pre–Test 10 (Student participants only): Has a librarian spoken to a class you've attended about library research? (Yes, No, Not Sure) Pre–Test 11 (Faculty participants only): How often do you give assignments that require the use of library resources? (Rarely/Never, Sometimes, Often, Very Often) Pre–Test 12 (Faculty participants only): How often have you had a librarian visit one of your classes to teach your students about library research? (Rarely/Never, Sometimes, Often, Very Often) Post–Test 1: When would you use this search tool? Post–Test 2: When would you not use this search tool? Post–Test 3: What would you say are the major advantages of Quick Search? INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 111 Post–Test 4: What would you say are the major problems with Quick Search? Post–Test 5: If you were unable to find an item using Quick Search/EBSCO Discovery Service what would your next steps be? Post–Test 6: Do you think the name “Quick Search” is fitting for this search tool? If not, what would you call it? Post–Test 7 (Faculty participants only): If you knew students would use this tool to complete assignments would you alter how you structure assignments and how? APPENDIX B Task Purpose • Practice Task: Use Quick Search to search a topic relating to your major / discipline or another topic of interest to you. If you were writing a paper on this topic how satisfied would you be with these results? Help users get comfortable with the usability testing software. Also, since the first time someone uses a piece of software involves behaviors unique to that case, we wanted participants’ first use of EDS to be with a practice task. 1. What was the last thing you searched for when doing a research assignment for class? Use Quick Search to re-search for this. Tell us how this compared to your previous experience. Having participants re-search a topic with which they had some experience and interest would motivate them to engage with results and provide a comparison point for their answer. We hoped to learn about their satisfaction with relevance, quality, and quantity of results. (user behavior, user satisfaction) 2. Using Quick Search find a video related to early childhood cognitive development. When you’ve found a suitable video recording, click ANSWER and copy and paste the title. This task aimed to determine whether participants could complete the task, as well as show us which features they used in their attempts. (usability, user behavior) 3. Search on speech pathology and find a way to limit your search results to audiology. Then, limit your search results to peer reviewed sources. How satisfied are you with the results? Since there are several ways to limit results in EDS, we designed this task to show us which limiters participants tried to use, and which limiters resulted in success. We also hoped to learn about whether they thought the limiters provided satisfactory results. 
(usability, user behavior, user satisfaction) USABILITY TEST RESULTS FOR A DISCOVERY TOOL IN AN ACADEMIC LIBRARY | FAGAN ET AL 112 4. You need more recent sources. Please limit these search results to the last 5 years, then select the most recent source available. Click Finished when you are done. Since there are several ways to limit by date in EDS, we designed this task to show us which limiters participants tried to use, and which limiters resulted in success. (usability, user behavior) 5. Find a way to ask a JMU librarian for help using this search tool. After you’ve found the correct web page, click FINISHED. We wanted to determine whether the user could complete this task, and which pathway they chose to do it. (usability, user behavior) 6. Locate the journal Yachting and Boating World. What are the coverage dates? Is this journal available in online full text? We wanted to determine whether the user could locate a journal by title. (usability) 7. You need to look up the sculpture Genius of Mirth. You have been told that the library database, Camio, would be the best place to search for this. Locate this database and find the sculpture. We wanted to know whether users who knew they needed to use a specific database could find that database from within the discovery tool. (usability, user behavior). 8. Use Quick Search to find 2 books and 2 recent peer reviewed articles (from the last 5 years) on rheumatoid arthritis. When you have found suitable source click ANSWER and copy and paste the titles. Click BACK TO WEBPAGE if you need to return to your search results. These two tasks were intended to show us how users completed a common, broad task with and without a discovery tool, whether they would be more successful with or without the tool, and what barriers existed with and without the tool (usability, user behavior) 9. Without using Quick Search, find 2 books and 2 recent peer reviewed articles (from the last 5 years) on rheumatoid arthritis. When you have found suitable sources click ANSWER and copy and paste the titles. Click BACK TO WEBPAGE if you need to return to your search results. 1859 ---- Copyright: Regulation Out of Line with Our Digital Reality? Abigail J. McDermott INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 7 ABSTRACT This paper provides a brief overview of the current state of copyright law in the United States, focusing on the negative impacts of these policies on libraries and patrons. The article discusses four challenges current copyright law presents to libraries and the public in general, highlighting three concrete ways intellectual property law interferes with digital library services and systems. Finally, the author suggests that a greater emphasis on copyright literacy and a commitment among the library community to advocate for fairer policies is vital to correcting the imbalance between the interests of the public and those of copyright holders. INTRODUCTION In July 2010, the library community applauded when Librarian of Congress James H. Billington announced new exemptions to the Digital Millennium Copyright Act (DMCA). Those with visual disabilities and the librarians who serve them can now circumvent digital rights management (DRM) software on e-books to activate a read-aloud function.1 In addition, higher education faculty in departments other than film and media studies can now break through DRM software to include high-resolution film clips in class materials and lectures. 
However, their students cannot, since only those who are pursuing a degree in film can legally do the same.2 That means that English students who want to legally include high-resolution clips from the critically acclaimed film Sense and Sensibility in their final projects on Jane Austen’s novel will have to wait another three years, when the Librarian of Congress will again review the DMCA. The fact that these new exemptions to the DMCA were a cause for celebration is one indicator of the imbalanced state of the copyright regulations that control creative intellectual property in this country. As the consumer-advocacy group Public Knowledge asserted, “We continue to be disappointed that the Copyright Office under the Digital Millennium Copyright Act can grant extremely limited exemptions and only every three years. This state of affairs is an indication that the law needs to be changed.”3 This paper provides a brief overview of the current state of U.S. copyright law, especially developments during the past fifteen years, with a focus on the negative impact these policies have had and will continue to have on libraries, librarians, and the patrons they serve. This paper does not provide a comprehensive and impartial primer on copyright law, a complex and convoluted topic, instead identifying concerns about the effects an out-of-balance intellectual property system is having on the library profession, library services, and creative expression in our digital age. As with any area of public policy, the battles over intellectual property issues create an ever-fluctuating copyright environment, and therefore, this article is written to be current with policy developments as of October 2011. Finally, this paper recommends that librarians seek to better educate themselves about copyright law, and some innovative responses to an overly restrictive system, so that we can effectively advocate on our own behalf, and better serve our patrons. Abigail J. McDermott (ajmcderm@umd.edu) is Graduate Research Associate, The Information Policy and Access Center (iPAC), and Masters Candidate in Library Science, University of Maryland, College Park. THE STATE OF U.S. COPYRIGHT LAW Copyright law is a response to what is known as the “progress clause” of the Constitution, which charges Congress with the responsibility “to promote the Progress of Science and the useful Arts . . . to this end, copyright assures authors the right to their original expression, but encourages others to build freely upon the ideas and information conveyed by a work.”4 Fair use, a statutory exception to U.S. copyright law, is a complex subject, but a brief examination of the principle gets to the heart of copyright law itself. When determining fair use, courts consider 1. the purpose and character of the use; 2. the nature of the copyrighted work; 3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and 4. the effect of the use upon the potential market for the copyrighted work.5 While fair use is an “affirmative defense” to copyright infringement,6 invoking fair use is not the same as admitting to copyright infringement.
Teaching, scholarship, and research, as well as instances in which the use is not-for-profit and noncommercial, are all legitimate examples of fair use, even if fair use is determined on a case-by-case basis.7 Despite the byzantine nature of copyright law, there are four key issues that present the greatest challenges and obstacles to librarians and people in general: the effect of the DMCA on the principle of fair use; the dramatic extension of copyright terms codified by the Sonny Bono Copyright Term Extension Act; the disappearance of the registration requirement for copyright holders; and the problem of orphan works. The Digital Millennium Copyright Act (DMCA) The DMCA has been controversial since its passage in 1998. Title I of the DMCA implements two 1996 World Intellectual Property Organization (WIPO) treaties that obligate member states to enforce laws that make tampering with DRM software illegal. The DMCA added chapter 12 to the U.S. Copyright Act (17 U.S.C. §§ 1201–1205), and it criminalized the trafficking of “technologies designed to circumvent access control devices protecting copyrighted material from unauthorized copying or use.”8 While film studios, e-book publishers, and record producers have the right to protect their intellectual property from illegal pirating, the DMCA struck a serious blow to the principle of fair use, placing librarians and others who could likely claim fair use when copying a DVD or PDF file in a Catch-22 scenario. While the act of copying the file may be legal according to fair use, breaking through any DRM technology that prevents that copying is now illegal.9 The Sonny Bono Copyright Term Extension Act While the Copyright Act of 1790 only provided authors and publishers with twenty-eight years of copyright protection, the Sonny Bono Copyright Term Extension Act of 1998 increased the copyright terms of all copyrighted works that were eligible for renewal in 1998 to ninety-five years after the year of the creator’s death. In addition, all works copyrighted on or after January 1, 1978, now receive copyright protection for the life of the creator plus seventy years (or ninety-five years from the date of publication for works produced by multiple creators).10 Jack Valenti, former president of the Motion Picture Association of America, was not successful in pushing copyright law past the bounds of the Constitution, which mandates that copyright be limited, although he did try to circumvent this Constitutional requirement by suggesting that copyright terms last forever less one day.11 The Era of Automatic Copyright Registration Perhaps the most problematic facet of modern U.S. copyright law appears at first glance to be the most innocuous. The Copyright Act of 1976 did away with the registration requirement established by the Copyright Act of 1790.12 That means that any creative work “fixed in any tangible medium of expression” is automatically copyrighted at the moment of its creation.13 That includes family vacation photos stored on a computer hard drive; they are copyrighted and your permission is required to use them.
The previous requirement of registration meant authors and creators had to actively register their works, so anything that was not registered entered the public domain, replenishing that important cultural realm.14 Now that copyright attaches at the moment an idea is expressed through a cocktail napkin doodle or an outline, virtually nothing new enters the public domain until its copyright term expires—at least seventy years later. In fact, nothing new will enter the public domain through copyright expiration until 2019. Until then, the public domain is essentially frozen in the year 1922.15 The Problem of Orphan Works In addition, the incredibly long copyright terms that apply to all books, photographs, and sound recordings have created the problem of orphan works. Orphan works are those works that are under copyright protection, but whose owners are difficult or impossible to locate, often due to death.16 These publications are problematic for researchers, librarians, and the public in general: Orphan works are perceived to be inaccessible because of the risk of infringement liability that a user might incur if and when a copyright owner subsequently appears. Consequently, many works that are, COPYRIGHT: REGULATION OUT OF LINE WITH OUR DIGITAL REALITY | MCDERMOTT 10 in fact, abandoned by owners are withheld from public view and circulation because of uncertainty about the owner and the risk of liability.17 If copyright expired with the death of the author, or if there were a clause that would allow these works to pass into the public domain if the copyright holder’s heirs did not actively renew copyright for another term, then these materials would be far less likely to fall into legal limbo. Currently, many are protected despite the fact that acquiring permission to use them is all but impossible. A study of orphan works in the collections of United Kingdom public sector institutions found that these works are likely to have little commercial value, but high “academic and cultural significance,” and when contacted, these difficult-to-trace rights holders often grant permission for reproduction without asking for compensation.18 Put another way, orphan works are essentially “locking up culture and other public sector content and preventing organizations from serving the public interest.”19 The row that arose in September 2011 between the HathiTrust institutions and the Authors Guild over the University of Michigan’s orphan works digitization project, with J. R. Salamanca’s long- out-of-print 1958 novel The Lost Country serving as the pivot point in the dispute, is an example of the orphan works problem. The fact that University of Michigan Associate University Librarian John Price Wilkin was forced to assure the public that “no copyrighted books were made accessible to any students” illustrates the absurdity in arguing over whether it’s right to digitize books that are no longer even accessible in their printed form.20 LIBRARIES, DIGITIZATION, AND COPYRIGHT LAW: THE QUIET CRISIS While one can debate if U.S. copyright law is still oriented toward the public good, the more relevant question in this context is the effect copyright law has on the library profession. DRM technology can get in the way of serving library patrons with visual disabilities and every library needs to place a copyright disclaimer on the photocopiers, but how much more of a stumbling block is intellectual property law to librarians in general, and the advance of library systems and technology in particular? 
The answer is undeniably that current U.S. copyright legislation places obstacles in the way of librarians working in all types of libraries. While there are many ways that copyright law affects library services and collections in this digital era, three challenges are particularly pressing: the problem of ownership and licensing of digital content or collections; the librarian as de facto copyright expert; and copyright law as it relates to library digitization programs generally, and the Google Book settlement in particular. Digital Collections: Licenses Replace Ownership In the past, people bought a book, and they owned that copy. There was little they could accidentally or unknowingly do to infringe on the copyright holder’s rights. Likewise, when physical collections were their only concern, librarians could rely on Sections 108 and 109 of the copyright law to protect them from liability when they copied a book or other work and when they loaned materials in their collections to patrons.21 Today, we live partly in the physical world and partly in the digital world, reaching out and connecting to each other across fiber optic lines in the same way we once did around the water cooler. Likewise, the digital means of production are widely distributed. In a multimedia world, where sharing an informative or entertaining video clip is as easy as embedding a link onto someone’s Facebook wall, the temptation to infringe on rights by distributing, reproducing, or displaying a creative work is all too common, and all too easy.22 Many librarians believe that disclaimers on public-access computer terminals will protect them from lawsuits, but they do not often consider placing such disclaimers on their CD or DVD collections. Yet a copyright holder would not have to prove the library is aware of piracy to accuse the library of vicarious infringement of copyright. The copyright holder may even be able to argue that the library sees some financial gain from this piracy if the existence of the material that is being pirated serves as the primary reason a patron visits the library.23 Even the physical CD collection in the public library can place the institution in danger of copyright infringement; yet the copyright challenges raised by cutting-edge digital resources, like e-books, are undoubtedly more complicated. E-books are replacing traditional books in many contexts. Like most digital works today, e-books are licensed, not purchased outright. The problem licensing presents to libraries is that licensed works are not sold, they are granted through contracts, and contracts can change suddenly and negate fair-use provisions of U.S.
copyright law.24 While libraries are now adept at negotiating contracts with subscription database providers, e-books are in many ways even more difficult to manage, with many vendors requiring that patrons delete or destroy the licensed content on their personal e-readers at the end of the lending period.25 The entire library community was rocked by HarperCollins’s February 2011 decision to limit licenses on e-books offered through library e- book vendors like OverDrive to twenty-six circulations, with many librarians questioning the publisher’s assertion that this seemingly arbitrary limitation is related to the average lifespan of a single print copy.26 License holders have an easy time arguing that any use of their content without paying fees is a violation of their copyright. That is not the case when a fair use argument is justified, and while many in the library community may acquiesce to these arguments, “in recent cases, courts have found the use of a work to be fair despite the existence of a licensing market.”27 When license agreements are paired with DRM technology, libraries may find themselves managing thousands of micropayments to allow their users to view, copy, move, print, or embed, for example, the PDF of a scholarly journal article.28 In the current climate of reduced staff and shrinking budgets, managing these complex licensing agreements has the potential to cripple many libraries. The Librarian as Accidental Copyright Czar During a Special Libraries Association (SLA) Q&A session on copyright law in the digital age, the questions submitted to the panel came from librarians working in hospitals, public libraries, academic libraries, and even law libraries. Librarians are being thrust into the position of de facto copyright expert. One of the speakers mentioned that she must constantly remind the lawyers at COPYRIGHT: REGULATION OUT OF LINE WITH OUR DIGITAL REALITY | MCDERMOTT 12 the firm she works for that they should not copy and paste the full text of news or law review journal articles into their e-mails, and instead, they should send a link. The basis of her argument is the third factor of fair use mentioned earlier: the amount or substantiality of the portion of the copyrighted work being used.29 Since fair use is not a “bright line” principle, the more factors you have on your side the better when you are using a copyrighted work without the owners express permission.30 Librarians working in any institution must seek express permission from copyright holders for any video they wish to post, or embed, on library-managed websites. E-reserves and streaming video, mainstays of many educators and librarians seeking to capture the attention of this digital generation, have become bright red targets for litigious copyright holders who want to shrink the territory claimed under the fair-use banner even further. Many in the library community are aware of the Georgia State University e-reserves lawsuit, Cambridge University Press et al. v. Patton, in which a group of academic publishers have accused the school of turning its e-reserves system into a vehicle for intentional piracy.31 University librarians are implicated for not providing sufficient oversight. It has come to light that the Association of American Publishers (AAP) approached other schools, including Cornell, Hofstra, Syracuse, and Marquette, before filing a suit against Georgia State. 
Generally, the letters come from AAP’s outside counsel and are accompanied by “the draft of a federal court legal complaint that alleges copyright infringement.”32 The AAP believes that e-reserves are by nature an infringement of copyright law, so they demand these universities work with their association to draft guidelines for electronic content that support AAP’s “cost-per-click theory of contemporary copyright: no pay equals no click.”33 It seems that Georgia State was not willing to quietly concede to AAP’s view on the matter, and they are now facing the association in court.34 A decision in this case was pending at the time this article went to press. The case brought by the Association for Information and Media Equipment (AIME) against UCLA is similar, except it focuses on the posting of videos so they can be streamed by students on password-protected university websites that do not allow the copying or retention of the videos.35 UCLA argued that the video streaming services for students are protected by the Technology Education and Copyright Harmonization (TEACH) Act of 2002, which is the same act that allows all libraries to offer patrons online access to electronic subscription databases off-site through a user-authentication system.36 In addition, UCLA argued that it is simply allowing its students to “time shift” these videos, a practice deemed not to infringe on copyright law by the Supreme Court in its landmark Sony Corp. v. Universal City Studios, Inc. decision of 1984.37 The American Library Association (ALA), Association of Research Libraries (ARL), and the Association of College and Research Libraries (ACRL) jointly published an opinion supporting UCLA in this case. Many in the wider library community sympathized with UCLA’s library administrators, who cite budget cuts that reduced hours at the school’s media laboratory as one reason they must now offer students a video-streaming option.38 In the end, the case was dismissed, mostly due to the lack of standing AIME had to bring the suit against UCLA, a state agency, in federal court. While the judge did not expressly rule on the fair-use argument UCLA made, the ruling did confirm that streaming is not a form of video distribution and that the public-performance argument UCLA made regarding the videos was not invalidated by the fact that they made copies of the videos in question.39 Digitization Programs and the Google Book Settlement Librarians looking to digitize print collections, either for preservation or to facilitate online access, are also grappling with the copyright monopoly. Librarians who do not have the time or resources to seek permission from publishers and authors before scanning a book in their collection cannot touch anything published after 1922. LibraryLaw.com provides a helpful chart directed at librarians considering digitization projects, but the overwhelming fine print below the chart speaks to the labyrinthine nature of copyright.40 The Google Book settlement continues to loom large over both the library profession and the publishing industry.
At the heart of debate is Google’s Library Project, which is part of Google Book Search, originally named Google Print.41 The Library Project allows users to search for books using Google’s algorithms to provide at its most basic a “snippet view” of the text from a relevant publication. Authors and publishers could also grant their permission to allow a view of select sample pages, and of course if the book is in the public domain, then Google can make the entire work visible online.42 In all cases, the user will see a “buy this book” link so that he or she could purchase the publication from online vendors on unrelated sites.43 Google hoped to sidestep the copyright permission quandary for a digitization project of this scale, announcing that it would proceed with the digitization of cooperative library collections and that it would be the responsibility of publishers and authors to actively opt out or vocalize their objection to seeing their works digitized and posted online.44 Google attempted to turn the copyright permissions process on its head, which was the basis of the class action lawsuit Authors Guild v. Google Inc.45 Before the settlement was reached, Google pointed to Kelly v. Arriba Soft Corp as proof that the indexing functions of an Internet search engine constitute fair use. In that 2002 case, the Ninth Circuit Court of Appeals found that a website’s posting of thumbnail images, or “imprecise copies of low resolution, scaled down images,” constitutes fair use, and Google argued its “snippet view” function is equivalent to a thumbnail image.46 However, Judge Denny Chin rejected the Google Book settlement in March 2011, citing the fact that Google would in essence be “exploiting books without the permission of copyright owners” and could also establish a monopoly over the digitized books market. The decision did in the end hinge on the fact that Google wanted to follow an opt-out program for copyright holders rather than an affirmative opt-in system.47 The Google Book settlement was dismissed without prejudice, leaving the door open to further negotiations between the parties concerned. Going forward, the library community should be concerned with how Google will handle orphan works and how its index of digitized works will be made available to libraries and the public. The 2008 settlement granted Google the nonexclusive right to digitize all books published before January 5, 2009, and in exchange, Google would have https://exch.mail.umd.edu/owa/WebReadyViewBody.aspx?t=att&id=RgAAAADXsLSgBeEwTJ9q0yHNKIt2BwBoUJgPO3tVSoU0x%2bkwIYfQALrqJtSLAABoUJgPO3tVSoU0x%2bkwIYfQAPIULedYAAAJ&attid0=EACjse6ZzPHuQ6QbFqVhBhu8&attcnt=1&pn=1#footnote40#footnote40 COPYRIGHT: REGULATION OUT OF LINE WITH OUR DIGITAL REALITY | MCDERMOTT 14 “paid 70% of the net revenue earned from uses of Google Book Search in the United States to rights holders.”48 In addition, Google would have established the Book Rights Registry to negotiate with Google and others seeking to “digitize, index or display” those works on behalf of the rights holders.49 Approval of the settlement would have allowed Google to move forward with plans to expand Google Book Search and “to sell subscriptions to institutions and electronic versions of books to individuals.”50 The concern that Judge Denny Chin expressed over a potential Google Book monopoly was widespread among the library community. 
While the settlement would not have given Google exclusive rights to digitize and display these copyrighted works, Google planned to ensure via the settlement that it would have received the same terms the Book Rights Registry negotiated with any third-party digital library, while also inoculating itself against the risk of any copyright infringement lawsuits that could be filed against a competitor.51 That would have left libraries vulnerable to any subscription price increases for the Google Books service.52 Libraries should carefully watch the negotiations around any future Google Books settlement, paying attention to a few key issues.53 There was considerable concern that under the terms of the 2008 settlement, even libraries participating in the Google Books Library Project would need to subscribe to the service to have access to digitized copies of the books in their own collections.53 Many librarians also vocalized their disappointment in Google’s abandonment of its fair-use argument when it agreed to the 2008 settlement, which, if it succeeded, would have been a boon to nonprofit, library-driven digitization programs.54 Finally, many librarians were concerned that Google’s Book Rights Registry was likely to become the default rights holder for the orphan works in the Google Books library, and that claims that Google Books is an altruistic effort to establish a world library conceals the less admirable aim of the project—to monetize out-of-print and orphan works.55 Librarians as Free Culture Advocates: Implications and Recommendations Our digital nation has turned copyright law into a minefield for both librarians and the public at large. Intellectual property scholar Lawrence Lessig failed in his attempt to argue before the Supreme Court that the Sonny Bono Copyright Term Extension Act was an attempt to regulate free speech and therefore violated the First Amendment.56 But many believe that our restrictive copyright laws at least violate the intent of the progress clause of the Constitution, if not the First Amendment: “unconstrained access to past works helps determine the richness of future works. Inversely, when past works are inaccessible except to a privileged minority, future works are impoverished.”57 While technological advances have placed the digital means of production into the hands of the masses, intellectual property law is leading us down a path to self-censorship.58 As the profession “at the heart of both the knowledge economy and a healthy democracy,”59 it is in our best interest as librarians to recognize the important role we have to play in restoring the balance to copyright law. To engage in the debate over copyright law in the digital age, the library community needs to educate itself and advocate for our own self-interests, focusing on three key areas: https://exch.mail.umd.edu/owa/WebReadyViewBody.aspx?t=att&id=RgAAAADXsLSgBeEwTJ9q0yHNKIt2BwBoUJgPO3tVSoU0x%2bkwIYfQALrqJtSLAABoUJgPO3tVSoU0x%2bkwIYfQAPIULedYAAAJ&attid0=EACjse6ZzPHuQ6QbFqVhBhu8&attcnt=1&pn=1#footnote50#footnote50 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 15 1. Copyright law in the classroom and at the conference. We must educate new and seasoned librarians on the nature of copyright law, and the impact it has on library practice and systems. Library schools must step up to the plate and include a thorough overview of copyright law in their library science curriculum. While including copyright law in a larger legal-issues class is acceptable, the complexity of current U.S. 
copyright law demonstrates that this is not a subject that can be glossed over in a single lecture. Furthermore, there needs to be a stronger emphasis on continuing education and training on copyright law within the library profession. The SLA offers a copyright certificate program, but the reach of such programs is not wide enough. Copyright law, and the impacts current policy has on the library profession, must be prominently featured at library conferences. The University of Maryland University College’s Center for Intellectual Property offers an online community forum for discussing copyright issues and policies, but it is unclear how many librarians are members.60 2. Librarians as standard-bearers for the free culture movement. While the Library Copyright Alliance, to which the ALA, ARL, and ACRL all belong, files amicus briefs in support of balanced copyright law and submits comments to WIPO, the wider library community must also advocate for copyright reform, since this is an issue that affects all librarians, everywhere. As a profession, we need to throw our collective weight behind legislative measures that address the copyright monopoly. There have been a number of unfortunate failures in recent years. S. 1621, or the Consumers, Schools, and Libraries Digital Management Awareness Act of 2003, attempted to address a number of DRM issues, including a requirement that access controlled digital media and electronics include disclosures on the nature of the DRM technology in use.61 H.R. 107, the Digital Media Consumers Rights Act of 2003, would have amended the DMCA to allow those researching the technology to circumvent DRM software while also eliminating the Catch-22 that makes circumventing DRM software for fair-use purposes illegal. The BALANCE Act of 2003 (H.R. 1066) included provisions to expand fair use to the act of transmitting, accepting, and saving a copyrighted digital work for personal use. All of this legislation died in committee, as did H.R. 5889 (Orphan Works Act of 2008) and S. 2913 (Shawn Bentley Orphan Works Act of 2008). Both bills would have addressed the orphan works dilemma, clearly spelling out the steps one must take to use an orphan work with no express permission from the copyright holder, without fear of a future lawsuit. Could a show of support from the library community have saved these bills? It is impossible to know, but it is in our best interest to follow these legislative battles in the future and make sure our voice is heard. 3. Libraries and the Creative Commons phenomenon. In addition, librarians need to take part in the Creative Commons (CC) movement by actively directing patrons towards this world of digital works that have clear, simple use and attribution requirements. Creative Commons was founded in 2001 with the support of the Center for the Study of the Public Domain at Duke University School of Law.62 The movement is essentially about free culture, and the idea that many people want to share their creative works and allow others to use or build off of their efforts easily and without seeking their permission. 
It is not intended to supplant copyright law, and Lawrence Lessig, one of the founders of Creative Commons, has said many times that he believes intellectual property law is necessary and that piracy is inexcusable.63 Instead, a CC license states in clear terms exactly what rights the creator reserves, and conversely, what rights are granted to everyone else.64 As Lawrence Lessig explains, “You go to the Creative Commons Website (http://creativecomms.org); you pick the opportunity to select a license: do you want to permit commercial uses or not? Do you want to allow modifications or not? If you allow modifications, do you want to require a kind of copyleft idea that other people release the modifications under a similarly free license? That is the core, and that produces a license.”65 There are currently six CC licenses, and they include some combination of the four license conditions defined by Creative Commons: attribution (by), share alike (sa), noncommercial (nc), and no derivatives (nd).66 Each of the four conditions is designated by a clever symbol, and the six licenses display these symbols after the Creative Commons trademark itself, two small c’s inside a circle.67 There are “hundreds of millions of CC licensed works” that can be searched through Google and Yahoo, and some notable organizations that rely on CC licenses include Flickr, the Public Library of Science, Wikipedia, and now Whitehouse.gov.68 All librarians not already familiar with this approach need to educate themselves on CC licenses and how to find CC licensed works.69 While librarians must still inform their patrons about the realities of copyright law, it is just as important to direct patrons, students, and colleagues to CC licensed materials, so that they can create the mash-ups, videos, and podcasts that are the creative products of our Web 2.0 world.70 The Creative Commons system is not perfect, and “Creative Commons gives the unskilled an opportunity to fail at many junctures.”71 Yet that only speaks to the necessity of educating the library community about the “some rights reserved” movement, so that librarians, who are already called upon to understand traditional copyright law, are also educating our society about how individuals can protect their intellectual property while preserving and strengthening the public domain. CONCLUSION The library community can no longer afford to consider intellectual property law as a foreign topic appropriate for law schools but not library schools. Those who are behind the slow extermination of the public domain rely on the complexity of copyright law, and the misunderstanding of the principle of fair use, to make their arguments easier and to browbeat libraries and the public into handing over the rights the Constitution bestows on everyone. Librarians need to engage in the debate over copyright law to retain control over their collections, and to better serve their patrons. In the past, the library community has not hesitated to stand up for the freedom of speech and self-expression, whether it means taking a stand against banning books from school libraries or fighting to repeal clauses of the USA PATRIOT Act. Today’s library patrons are not just information consumers—they are also information producers. Therefore it is just as critical for librarians to advocate for their creative rights as it is for them to defend their freedom to read.
The Internet has become such a strong incubator of creative expression and innovation that the innovators are looking for a way to shirk the very laws that were designed to protect their interests. In the end, the desire to create and innovate seems to be more innate than those writing our intellectual property laws expected. Perhaps financial gain is less of a motivator than the pleasure of sharing a piece of ourselves and our worldview with the rest of society. Whether that’s the case or not, what is clear is that if we do not roll back legislation like the Sonny Bono Copyright Term Extension Act and the DMCA so as to save the public domain, the pressure to create outside the bounds of the law is going to turn more inventors and artists into anarchists, threatening the interests of reasonable copyright holders. As librarians, we must curate and defend the creative property of the established, while fostering the innovative spirit of the next generation. As information, literature, and other creative works move out of the physical world, and off the shelves, into the digital realm, librarians need to do their part to ensure legislation is aligned with this new reality. If we do not, our profession may suffer first, but it will not be the last casualty of the copyright wars.
REFERENCES
1. Beverly Goldberg, “LG Unlocks Doors for Creators, Consumers with DMCA Exceptions,” American Libraries 41, no. 9 (Summer 2010): 14.
2. Ibid.
3. Goldberg, “LG Unlocks Doors.”
4. Christopher Alan Jennings, Fair Use on the Internet, prepared by the Congressional Research Service (Washington, DC: Library of Congress, 2002), 2.
5. Ibid., 1.
6. Ibid.
7. Brandon Butler, “Urban Copyright Legends,” Research Library Issues 270 (June 2010): 18.
8. Robin Jeweler, “Digital Rights” and Fair Use in Copyright Law, prepared by the Congressional Research Service (Washington, DC: Library of Congress, 2003), 5.
9. Rachel Bridgewater, “Tipping the Scales: How Free Culture Helps Restore Balance in the Age of Copyright Maximalism,” Oregon Library Association Quarterly 16, no. 3 (Fall 2010): 19.
10. Charles W. Bailey Jr., “Strong Copyright + DRM + Weak Net Neutrality = Digital Dystopia?” Information Technology & Libraries 25, no. 3 (Summer 2006): 117; U.S. Copyright Office, “Copyright Law of the United States,” under “Chapter 3: Duration of Copyright,” http://www.copyright.gov/title17 (accessed December 8, 2010).
11. Dan Hunter, “Culture War,” Texas Law Review 83, no. 4 (2005): 1130.
12. Bailey, “Strong Copyright,” 118.
13. U.S. Copyright Office, “Copyright Law of the United States,” under “Chapter 1: Subject Matter and Scope of Copyright,” http://www.copyright.gov/title17 (accessed December 8, 2010).
14. Bailey, “Strong Copyright,” 118.
15. Mary Minnow, “Library Digitization Table,” http://www.librarylaw.com/DigitizationTable.htm (accessed December 8, 2010).
16. Brian T. Yeh, “Orphan Works” in Copyright Law, prepared by the Congressional Research Service (Washington, DC: Library of Congress, 2002), summary.
17. Ibid.
18. JISC, In from the Cold: An Assessment of the Scope of “Orphan Works” and its Impact on the Delivery of Services to the Public (Cambridge, UK: JISC, 2009), 6.
19. Ibid.
20. Andrew Albanese, “HathiTrust Suspends its Orphan Works Release,” Publishers Weekly, September 16, 2011, http://www.publishersweekly.com/pw/by-topic/digital/copyright/article/48722-hathitrust-suspends-its-orphan-works-release-.html (accessed October 13, 2011).
21. U.S. Copyright Office, “Copyright Law of the United States,” under “Chapter 1.”
22. U.S. Copyright Office, Copyright Basics (Washington, DC: U.S. Copyright Office, 2000), www.copyright.gov/circs/circl/html (accessed December 8, 2010).
23. Mary Minnow, California Library Association, “Library Copyright Liability and Pirating Patrons,” http://www.cla-net.org/resources/articles/minow_pirating.php (accessed December 10, 2010).
24. Bailey, “Strong Copyright,” 118.
25. Overdrive, “Copyright,” http://www.overdrive.com/copyright.asp (accessed December 13, 2010).
26. Josh Hadro, “HarperCollins Puts 26 Loan Cap on EBook Circulations,” Library Journal (February 25, 2011), http://www.libraryjournal.com/lj/home/889452-264/harpercollins_puts_26_loan_cap.html.csp (accessed October 13, 2011).
27. Butler, “Urban Copyright Legends,” 18.
28. Bailey, “Strong Copyright,” 118.
29. Library of Congress, Fair Use on the Internet, 3.
30. Ibid., summary.
31. Matthew K. Dames, “Education Use in the Digital Age,” Information Today 27, no. 4 (April 2010): 18.
32. Ibid.
33. Dames, “Education Use in the Digital Age,” 18.
34. Matthew K. Dames, “Making a Case for Copyright Officers,” Information Today 25, no. 7 (July 2010): 16.
35. William C. Dougherty, “The Copyright Quagmire,” Journal of Academic Librarianship 36, no. 4 (July 2010): 351.
36. Ibid.
37. Library of Congress, “Digital Rights” and Fair Use in Copyright Law, 9.
38. Dougherty, “The Copyright Quagmire,” 351.
39. Kevin Smith, “Streaming Video Case Dismissed,” Scholarly Communications @ Duke, October 4, 2011, http://blogs.library.duke.edu/scholcomm/2011/10/04/streaming-video-case-dismissed/ (accessed October 13, 2011).
40. Dougherty, “The Copyright Quagmire,” 351.
41. LibraryLaw.com, “Library Digitization Table.”
42. Kate M. Manuel, The Google Library Project: Is Digitization for Purposes of Online Indexing Fair Use Under Copyright Law, prepared by the Congressional Research Service (Washington, DC: Library of Congress, 2009), 1–2.
43. Jeweler, “Digital Rights” and Fair Use in Copyright Law, 2.
44. Ibid.
45. Ibid.
46. Manuel, The Google Library Project, 2.
47. Amir Efrati and Jeffrey A. Trachtenberg, “Judge Rejects Google Books Settlement,” Wall Street Journal, March 23, 2011, http://online.wsj.com/article/SB10001424052748704461304576216923562033348.html (accessed October 13, 2011).
48. Jennings, Fair Use on the Internet, 7.
49. Manuel, The Google Library Project, 2.
50. Ibid., 9–10.
51. Ibid.
52. Ibid.
53. Pamela Samuelson, “Google Books is Not a Library,” Huffington Post, October 13, 2009, http://www.huffingtonpost.com/pamela-samuelson/google-books-is-not-a-lib_b_317518.html (accessed December 10, 2009).
54. Ivy Anderson, “Hurtling Toward the Finish Line: Should the Google Book Settlement be Approved?” Against the Grain 22, no. 3 (June 2010): 18.
55. Samuelson, “Google Books is Not a Library.”
56. Jeweler, “Digital Rights” and Fair Use in Copyright Law, 3.
57. Bailey, “Strong Copyright,” 116.
58. Cushla Kapitzke, “Rethinking Copyrights for the Library through Creative Commons Licensing,” Library Trends 58, no. 1 (Summer 2009): 106.
59. Ibid.
60. University of Maryland University College, “Member Community,” Center for Intellectual Property, http://cipcommunity.org/s/1039/start.aspx (accessed February 21, 2011).
61. Robin Jeweler, Copyright Law: Digital Rights Management Legislation, prepared by the Congressional Research Service (Washington, DC: Library of Congress, 2004), summary.
62. Creative Commons, “History,” http://creativecommons.org/about/history/ (accessed December 8, 2010).
63. Lawrence Lessig, “The Vision for the Creative Commons? What are We and Where are We Headed? Free Culture,” in Open Content Licensing: Cultivating the Creative Commons, ed. Brian Fitzgerald (Sydney: Sydney University Press, 2007), 42.
64. Steven J. Melamut, “Free Creativity: Understanding the Creative Commons Licenses,” American Association of Law Libraries 14, no. 6 (April 2010): 22.
65. Lessig, “The Vision for the Creative Commons?” 45.
66. Creative Commons, “About,” http://creativecommons.org/about/ (accessed December 8, 2010).
67. Ibid.
68. Ibid.
69. Bridgewater, “Tipping the Scales,” 21.
70. Ibid.
71. Woody Evans, “Commons and Creativity,” Searcher 17, no. 9 (October 2009): 34.
1861 ---- Batch Ingesting into EPrints Digital Repository Software Tomasz Neugebauer and Bin Han INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 113 ABSTRACT This paper describes the batch importing strategy and workflow used for the import of theses metadata and PDF documents into the EPrints digital repository software. A two-step strategy of importing metadata in MARC format followed by attachment of PDF documents is described in detail, including Perl source code for scripts used. The processes described were used in the ingestion of metadata and PDFs for 6,000 theses into an EPrints institutional repository. INTRODUCTION Tutorials have been published about batch ingestion of ProQuest metadata and electronic theses and dissertations (ETDs),1 as well as an EndNote library,2 into the Digital Commons platform. The procedures for bulk importing of ETDs using DSpace have also been reported.3 However, bulk importing into the EPrints digital repository software has not been exhaustively addressed in the literature.4 A recent article by Walsh provides a literature review of batch importing into institutional repositories.5 The only published report on batch importing into the EPrints platform describes Perl scripts for metadata-only records import from Thomson Reuters Reference Manager.6 Bulk importing is often one of the first tasks after launching a repository, so it is unsurprising that requests for reports and documentation on EPrints-specific workflow have been a recurring topic on the EPrints Tech List.7 A recently published review of EPrints identifies “the absence of a bulk uploading feature” as its most significant weakness.8 Although EPrints’ graphical user interface for bulk importing is limited to the use of the installed import plugins, the software does have a versatile infrastructure for this purpose. Leveraging EPrints’ import functionality requires some Perl scripting, structuring the data for import, and using the command line interface. In 2009, when Concordia University launched Spectrum,9 its research repository, the first task was a batch ingest of approximately 6,000 theses dated from 1967 to 2003. The source of the metadata for this import consisted of MARC records from an integrated library system powered by Innovative Interfaces and ProQuest PDF documents. This paper is a report on the strategy and workflow adopted for batch ingestion of this content into the EPrints digital repository software.
Import Strategy EPrints has a documented import command line utility located in the /bin folder.10 Documents can also be imported through EPrints’ graphical interface. Using the command line utility for importing is recommended because it is easier to monitor the operation in real time by adding progress information output to the import plugin code. Tomasz Neugebauer (tomasz.neugebauer@concordia.ca) is Digital Projects and Systems Development Librarian and Bin Han (bin.han@concordia.ca) is Digital Repository Developer, Concordia University Libraries, Montreal, Quebec, Canada. The task of batch importing can be split into the following subtasks: (1) import of the metadata of each item, and (2) import of associated documents, such as full-text PDF files. The strategy adopted was to first import the metadata for all of the new items into the inbox of an editor’s account. After this first step was completed, a script was used to loop through the newly imported eprints and attach the corresponding full-text documents. Although documents can be imported from the local file system or via HTTP, import of the files from the local file system was used. The batch import procedure varies depending on the format of the metadata and documents to be imported. Metadata import requires a mapping of the source schema fields to the default or custom fields in EPrints. The source metadata must also be converted into one of the formats supported by EPrints’ import plugins, or a custom plugin must be created. Import plugins are available for many popular formats, including BibTeX, DOI, EndNote, and PubMedXML. In addition, community-contributed import plugins such as MARC and ArXiv are available at EPrints Files.11 Since most repositories use custom metadata fields, some customization of the import plugins is usually necessary. MARC Plugin for EPrints In EPrints, the import and export plugins ensure interoperability of the repository with other systems. Import plugins read metadata from one schema and load it into the EPrints system through a mapping of the fields into the EPrints schema. Loading MARC-encoded files into EPrints requires the installation of the import/export plugin developed by Romero and Miguel.12 The installation of this plugin requires the following two CPAN modules: MARC::Record and MARC::File::USMARC. The MARC plugin was then subclassed to create an import plugin named “Concordia Theses,” which is customized for thesis MARC records. Concordia Theses MARC Plugin The MARC plugin features a central configuration file (see appendix A) in which each MARC field is paired with a corresponding mapping to an EPrints field. Most of the fields were configured through this configuration file (see table 1). The source MARC records from the Innovative Interfaces Integrated Library System (ILS) encode the physical description of each item using the Anglo-American Cataloguing Rules, as in the following example: “ix, 133 leaves : ill. ; 29 cm.” Since the default EPrints field for number of pages is of the type integer and does not allow multipart physical descriptions from the MARC 300 field, a custom text field for these physical descriptions (pages_aacr) had to be added. The marc.pl configuration file cannot be used to map compound fields, such as author names—the fields need custom mapping implementation in Perl.
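The flat tag/subfield pairs from the configuration file, by contrast, can be applied mechanically using the same MARC::Record and MARC::File::USMARC modules that the plugin already requires. The short sketch below illustrates that general pattern only; it is not code taken from the plugin, and the file name and the reduced mapping hash are assumptions made for the example. Compound fields such as author names are deliberately left out of this sketch, since they require the custom handling described next.

#!/usr/bin/perl
# Sketch: apply flat MARC-to-EPrints mappings while reading a USMARC file.
# Illustrative only; the mapping subset and file name are assumptions.
use strict;
use warnings;
use MARC::File::USMARC;

my %marc2ep = (    # a few of the flat mappings from the configuration file
    '245a' => 'title',
    '260b' => 'publisher',
    '260c' => 'date',
    '520a' => 'abstract',
);

my $file = MARC::File::USMARC->in('Theses-utf8.mrc')
    or die 'cannot open MARC file';
while ( my $record = $file->next() ) {
    my %epdata;
    foreach my $key ( keys %marc2ep ) {
        my ( $tag, $code ) = ( substr( $key, 0, 3 ), substr( $key, 3, 1 ) );
        my $field = $record->field($tag) or next;    # skip mappings whose tag is absent
        my $value = $field->subfield($code);
        $epdata{ $marc2ep{$key} } = $value if defined $value;
    }
    # a real import plugin would build an EPrints eprint object from %epdata
    print( ( $epdata{title} // '(no title)' ), "\n" );
}
$file->close();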
For instance, the MARC 100 and 700 fields are transferred into the EPrints author compound field (in MARC.pm). Similarly, MARC 599 is mapped into a custom thesis advisor compound field.

MARC field    EPrints field
020a          isbn
020z          isbn
022a          issn
245a          title
250a          edition
260a          place_of_pub
260b          publisher
260c          date
300a          pages_aacr
362a          volume
440a          series
440c          volume
440x          issn
520a          abstract
730a          publication

Table 1. Mapping Table from MARC to EPrints

Helge Knüttel’s refinements to the MARC plugin shared on the EPrints Tech List were employed in the implementation of a new subclass of MARC import for the Concordia Theses MARC records. In the implementation of the Concordia Theses plugin, ConcordiaTheses.pm inherits from MARC.pm. (See figure 1.)13 Knüttel added two methods that make it easier to subclass the general MARC plugin and add unique mappings: handle_marc_specialities and post_process_eprint. The post_process_eprint function was not used to attach the full-text documents to each eprint. Instead, the strategy to import the full-text documents using a separate attach_documents script was used (see “Theses Document File Attachment” below). Import of all of the specialized fields, such as thesis type (mapped from MARC 710t), program, department, and proquest id, was implemented in the function handle_marc_specialities of ConcordiaTheses.pm. For instance, 502a in the MARC record contains the department information, whereas an EPrints system like Spectrum stores department hierarchy as subject objects in a tree. Therefore importing the department information based on the value of 502a required regular expression searches of this MARC field to find the mapping into a corresponding subject id. This was implemented in the handle_marc_specialities function.
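As a rough illustration of what this kind of custom handling can look like, the sketch below builds a compound creators field from the 100 and 700 fields and chooses a department subject id by running regular expressions over the 502 note. It is a sketch under assumptions rather than the plugin’s actual handle_marc_specialities code: the EPrints field names, the wording matched in the note, and the subject ids are hypothetical.

# Sketch of compound and repository-specific mappings; not the actual
# ConcordiaTheses.pm code. Field names and subject ids are hypothetical.
use strict;
use warnings;
use MARC::Record;

sub map_theses_specialities {
    my ( $record, $epdata ) = @_;

    # 100/700 personal names become entries of a compound creators field
    foreach my $field ( $record->field('100'), $record->field('700') ) {
        my $name = $field->subfield('a') // '';
        next unless $name;
        my ( $family, $given ) = split /\s*,\s*/, $name, 2;
        push @{ $epdata->{creators} },
            { name => { family => $family, given => $given } };
    }

    # 502a is searched with regular expressions to pick a subject id
    # for the department tree
    my $f502 = $record->field('502');
    my $note = $f502 ? ( $f502->subfield('a') // '' ) : '';
    if ( $note =~ /History/i ) {
        push @{ $epdata->{divisions} }, 'dept_history';    # hypothetical id
    }
    elsif ( $note =~ /Computer Science/i ) {
        push @{ $epdata->{divisions} }, 'dept_compsci';    # hypothetical id
    }
    return $epdata;
}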
Figure 1. Concordia Theses Class Diagram, created with the Perl module UML::Class::Simple
Execution of the Theses Metadata Import The depositing user’s name is displayed along with the metadata for each eprint. A batchimporter user with the corporate name “Concordia University Libraries” was created to carry out the import. As a result, the public display of the imported items shows the following as a part of the metadata: “Deposited By: Concordia University Libraries.” The MARC plugin requires the encoding of the source MARC files to be UTF-8, whereas the records are exported from the ILS with MARC-8 encoding. Therefore MarcEdit software developed by Reese was used to convert the MARC file to UTF-8.14 To activate the import, the main MARC import plugin and its subclass, ConcordiaTheses.pm, have to be placed in the plugin folder /perl_lib/EPrints/Plugin/Import/MARC/. The configuration file (see appendix A) must also be placed with the rest of the configurable files in /archives/REPOSITORYID/cfg/cfg.d. The plugin can then be activated from the command line using the import script in the /bin folder. A detailed description of this script and its usage is documented on the EPrints Wiki. The following EPrints command from the /bin folder was used to launch the import:
import REPOSITORYID --verbose --user batchimporter eprint MARC::ConcordiaTheses Theses-utf8.mrc
Following the aforementioned steps, all the theses metadata was imported into the EPrints software. The new items were imported with their statuses set to inbox. A status set to inbox means that the imported items are in the work area of the batchimporter user and will need to be moved to live public access by switching their status to archive.
Theses Document File Attachment After the process of importing the metadata of each thesis is complete, the corresponding document files need to be attached. The proquest id was used to link the full-text PDF documents to the metadata records. All of the MARC records contained the proquest id, while the PDF files, received from ProQuest, were delivered with the corresponding proquest id as the filename. The PDFs were uploaded to a folder on the repository web server using FTP. The attach_documents script (see appendix B for source code) was then used to attach the documents to each of the imported eprints in the batchimporter’s inbox and to move the imported eprints to the live archive. Several variables need to be set at the beginning of the attach_documents operation (see table 2).
$root_dir = 'bin/import-data/proquest'
    This is the root folder where all the associated documents are uploaded by FTP.
$depositor = 'batchimporter'
    Only the items deposited by a defined depositor, in this case batchimporter, will be moved from inbox to live archive.
$dataset_id = 'inbox'
    Limit the dataset to those eprints with status set to inbox.
$repositoryid = 'library'
    The internal EPrints identifier of the repository.
Table 2. Variables to be Set in the attach_documents Script
The following command is used to proceed with file attachment, while the output log is redirected and saved in the file ATTACHMENT:
/bin/attach_documents.pl > ./ATTACHMENT 2>&1
The thesis metadata record was made live even if it did not contain a corresponding document file. A list of eprint ids of theses without a corresponding full-text PDF document is included at the end of the log file, along with the count of the number of theses that were made live. After the import operation is complete, all the abstract pages need to be regenerated with the following command:
/bin/generate_abstracts REPOSITORYID
CONCLUSIONS This paper is a detailed report on batch importing into the EPrints system. The authors believe that this paper and its accompanying source code are a useful contribution to the literature on batch importing into digital repository systems. In particular, it should be useful to institutions that are adopting the EPrints digital repository software. Batch importing of content is a basic and fundamental function of a repository system, which is why the topic has come up repeatedly on the EPrints Tech List and in a repository software review. The methods that we describe for carrying out batch importing in EPrints make use of the command line and require Perl scripting. More robust administrative graphical user interface support for batch import functions would be a useful feature to develop in the platform.
ACKNOWLEDGEMENTS The authors would like to thank Mia Massicotte for exporting the metadata records from the integrated library system. We would also like to thank Alexandros Nitsiou, Raquel Horlick, Adam Field, and the reviewers at Information Technology and Libraries for their useful comments and suggestions.
REFERENCES
1. Shawn Averkamp and Joanna Lee, “Repurposing ProQuest Metadata for Batch Ingesting ETDs into an Institutional Repository,” code{4}lib journal 7 (2009), http://journal.code4lib.org/articles/1647 (accessed June 27, 2011).
2. Michael Witt and Mark P. Newton, "Preparing Batch Deposits for Digital Commons Repositories," 2008, http://docs.lib.purdue.edu/lib_research/96/ (accessed June 20, 2011).

3. Randall Floyd, "Automated Electronic Thesis and Dissertations Ingest," 2009, https://wiki.dlib.indiana.edu/display/IUSW/Automated+Electronic+Thesis+and+Dissertations+Ingest (accessed May 26, 2011).

4. EPrints Digital Repository Software, University of Southampton, UK, http://www.eprints.org/ (accessed June 27, 2011).

5. Maureen P. Walsh, "Batch Loading Collections into DSpace: Using Perl Scripts for Automation and Quality Control," Information Technology & Libraries 29, no. 3 (2010): 117–27, http://search.ebscohost.com/login.aspx?direct=true&db=a9h&AN=52871761&site=ehost-live (accessed June 26, 2011).

6. Lesley Drysdale, "Importing Records from Reference Manager into GNU EPrints," 2004, http://hdl.handle.net/1905/175 (accessed June 27, 2011).

7. EPrints Tech List, University of Southampton, UK, http://www.eprints.org/tech.php/ (accessed June 27, 2011).

8. Mike Beazly, "Eprints Institutional Repository Software: A Review," Partnership: the Canadian Journal of Library & Information Practice & Research 5, no. 2 (2010), http://journal.lib.uoguelph.ca/index.php/perj/article/viewArticle/1234 (accessed June 27, 2011).

9. Concordia University Libraries, "Spectrum: Concordia University Research Repository," http://spectrum.library.concordia.ca (accessed June 27, 2011).

10. EPrints Wiki, "API:bin/import," University of Southampton, UK, http://wiki.eprints.org/w/API:bin/import (accessed June 23, 2011).

11. EPrints Files, University of Southampton, UK, http://files.eprints.org/ (accessed June 24, 2011).

12. Parella Romero and Jose Miguel, "MARC Import/Export Plugins for GNU EPrints3," EPrints Files, 2008, http://files.eprints.org/323/ (accessed May 31, 2011).

13. Agent Zhang and Maxim Zenin, "UML::Class::Simple," CPAN, http://search.cpan.org/~agent/UML-Class-Simple-0.18/lib/UML/Class/Simple.pm (accessed September 20, 2011).

14. Terry Reese, "MarcEdit: Downloads," Oregon State University, http://people.oregonstate.edu/~reeset/marcedit/html/downloads.html (accessed June 27, 2011).

Appendix A. marc.pl Configuration File

#
# Plugin EPrints::Plugin::Import::MARC
#
# MARC tofro EPrints Mappings
# Do _not_ add compound mappings here.
$c->{marc}->{marc2ep} = {
    # MARC to EPrints
    '020a' => 'isbn',
    '020z' => 'isbn',
    '022a' => 'issn',
    '245a' => 'title',
    '245b' => 'subtitle',
    '250a' => 'edition',
    '260a' => 'place_of_pub',
    '260b' => 'publisher',
    '260c' => 'date',
    '362a' => 'volume',
    '440a' => 'series',
    '440c' => 'volume',
    '440x' => 'issn',
    '520a' => 'abstract',
    '730a' => 'publication',
};

$c->{marc}->{marc2ep}->{constants} = { };

######################################################################
#
# Plugin-specific settings.
#
# Any non empty hash set for a specific plugin will override the
# general one above!
#
######################################################################
#
# Plugin EPrints::Plugin::Import::MARC::ConcordiaTheses
#
$c->{marc}->{'EPrints::Plugin::Import::MARC::ConcordiaTheses'}->{marc2ep} = {
    '020a' => 'isbn',
    '020z' => 'isbn',
    '022a' => 'issn',
    '250a' => 'edition',
    '260a' => 'place_of_pub',
    '260b' => 'publisher',
    '260c' => 'date',
    '300a' => 'pages_aacr',
    '362a' => 'volume',
    '440a' => 'series',
    '440c' => 'volume',
    '440x' => 'issn',
    '520a' => 'abstract',
    '730a' => 'publication',
};

$c->{marc}->{'EPrints::Plugin::Import::MARC::ConcordiaTheses'}->{constants} = {
    # MARC to EPrints constants
    'type'        => 'thesis',
    'institution' => 'Concordia University',
    'date_type'   => 'submitted',
};

Appendix B. attach_documents.pl

#!/usr/bin/perl -I/opt/eprints3/perl_lib

=head1 DESCRIPTION

This script allows you to attach a file to an eprint object by proquest id.

=head1 COPYRIGHT AND LICENSE

2009 Adam Field, Tomasz Neugebauer
2011 Bin Han

This module is free software under the same terms of Perl.
Compatible with EPrints 3.2.4 (Victoria Sponge).

=cut

use strict;
use warnings;

use EPrints;

my $repositoryid = 'library';
my $root_dir = '/opt/eprints3/bin/import-data/proquest'; # location of PDF files
my $dataset_id = 'inbox';        # change to 'eprint' if you want to run it over everything.
my $depositor = 'batchimporter'; # limit import to $depositor's Inbox

# global variables for log purposes
my $int_live = 0;     # count of eprints moved to live archive with a document
my $int_doc = 0;      # count of eprints that already have document attached
my @array_doc;        # ids of eprints that already have documents
my $int_no_doc = 0;   # count of eprints moved to live with no document attached
my @array_no_doc;     # ids of eprints that have no documents
my $int_no_proid = 0; # count of eprints with no proquest id
my @array_no_proid;   # ids of eprints with no proquest id

my $session = EPrints::Session->new(1, $repositoryid);
die "couldn't create session for $repositoryid\n" unless defined $session;

# the hash contains all the files that need to be uploaded
# the hash contains key-value pairs: (pq_id => filename)
my $filemap = {};
load_filemap($root_dir);

# get all eprints in inbox dataset
my $dataset = $session->get_repository->get_dataset($dataset_id);

# run attach_file on each eprint object
$dataset->map($session, \&attach_file);

# output log for attachment
print "#### $int_doc eprints already have document attached, skip ####\n @array_doc\n";
print "#### $int_no_proid eprints doesn't have proquest id, skip ####\n @array_no_proid\n";
print "#### $int_no_doc eprints doesn't have associated document, moved to live ####\n @array_no_doc\n";

# total number of eprints that were made live: those with and without documents.
my $int_total_live = $int_live + $int_no_doc;
print "#### Intotal: $int_total_live eprints moved to live ####\n";

# attach file to corresponding eprint object
sub attach_file
{
    my ($session, $ds, $eprint) = @_;

    # skip if eprint already has a document attached
    my $full_text_status = $eprint->get_value( "full_text_status" );
    if ($full_text_status ne "none")
    {
        print "EPrint ".$eprint->get_id." already has a document, skipping\n";
        $int_doc ++;
        push ( @array_doc, $eprint->get_id );
        return;
    }

    # retrieve username/userid associated with current eprint
    my $user = new EPrints::DataObj::User( $eprint->{ session }, $eprint->get_value( "userid" ) );
    my $username;

    # exit in case of failure to retrieve associated user, just in case.
    return unless defined $user;
    $username = $user->get_value( "username" );

    # $dataset includes all eprints in Inbox, so we limit to $depositor's items only
    return if( $username ne $depositor );

    # skip if no proquest id is associated with the current eprint
    my $pq_id = $eprint->get_value('pq_id');
    if (not defined $pq_id)
    {
        print "EPrint ".$eprint->get_id." doesn't have a proquest id, skipping\n";
        $int_no_proid ++;
        push ( @array_no_proid, $eprint->get_id );
        return;
    }

    # remove space from proquest id
    $pq_id =~ s/\s//g;

    # attach the PDF to eprint objects and move to live archive
    if ($filemap->{$pq_id} and -e $filemap->{$pq_id} ) # if the file exists
    {
        # create document object, add pdf files to document, attach to eprint object,
        # and move to live archive
        my $doc = EPrints::DataObj::Document::create( $session, $eprint );
        $doc->add_file( $filemap->{$pq_id}, $pq_id . '.pdf' );
        $doc->set_value( "format", "application/pdf" );
        $doc->commit();
        print "Adding Document to EPrint ", $eprint->get_id, "\n";

        $eprint->move_to_archive;
        print "Eprint ".$eprint->get_id." moved to archive.\n";
        $int_live ++;
    }
    else
    {
        # move the metadata-only eprints to live as well
        print "Proquest ID \\$pq_id\\ (EPrint ", $eprint->get_id, ") does not have a file associated with it\n";
        $eprint->move_to_archive;
        print "Eprint ".$eprint->get_id." moved to archive without document attached.\n";
        $int_no_doc ++;
        push ( @array_no_doc, $eprint->get_id );
    }
}

# Recursively traverse the directory, find all PDF files.
sub load_filemap
{
    my ($directory) = @_;

    foreach my $filename (<$directory/*>)
    {
        if (-d $filename)
        {
            load_filemap($filename);
        }
        # catch the file name ending in .pdf
        elsif ($filename =~ m/([^\/]*)\.pdf$/i)
        {
            my $pq_id = $1;
            # add pq_id => filename pair to filemap hash table
            $filemap->{$pq_id} = $filename;
        }
    }
}

1918 ---- Guest Editorial

Clifford Lynch

Congratulations, LITA and Information Technology and Libraries. Since the early days of the Internet, I've been continually struck by the incredible opportunities that it offers organizations concerned with the creation, organization, and dissemination of knowledge to advance their core missions in new and more effective ways. Libraries and librarians were consistently early and aggressive in recognizing, seizing, and advocating for these opportunities, though they've faced—and continue to face—enormous obstacles ranging from copyright laws to the amazing inertia of academic traditions in scholarly communication.
Yet the library profession has been slow to open up access to the publications of its own professional societies, to take advantage of the greater reach and impact that such policies can offer. Making these changes is not easy: there are real financial implications that suddenly seem very serious when you are a member of a board of directors, charged with a fiduciary duty to your association, and you have to push through plans to realign its finances, organizational mission, and goals in the new world of networked information. So, as a long-time LITA member, I find it a great pleasure to see LITA finally reach this milestone with Information Technology and Libraries (ITAL) moving to fully open-access electronic distribution, and I congratulate the LITA leadership for the persistence and courage to make this happen. It’s a decision that will, I believe, make the journal much more visible, and a more attractive venue for authors; it will also make it easier to use in educational settings, and to further the interactions between librarians, information scientists, computer scientists, and members of other disciplines. On a broader ALA-wide level, ITAL now joins ACRL’s College & Research Libraries as part of the American Library Association’s portfolio of open-access journals. Supporting ITAL as an open-access journal is a very good reason indeed to be a member of LITA. Clifford Lynch (clifford@cni.org) is Executive Director, Coalition for Networked Information. mailto:clifford@cni.org 1927 ---- President’s Message: Open Access/Open Data Colleen Cuddy INFORMATION TECHNOLOGIES AND LIBRARIES | MARCH 2012 1 I am very excited to write this column. This issue of Information Technology and Libraries (ITAL) marks the beginning of a new era for the journal. ITAL is now an open-access, electronic-only journal. There are many people to thank for this transition. The LITA Publications Committee led by Kristen Antelman did a thorough analysis of publishing options and presented a thoughtful proposal to the LITA Board; the LITA Board had the foresight to push for an open-access journal even if it might mean a temporary revenue loss for the division; Bob Gerrity, ITAL editor, has enthusiastically supported this transition and did the heavy lifting to make it happen; and the LITA office staff worked tirelessly for the past year to help shepherd this project. I am proud to be leading the organization during this time. To see ITAL go open access in my presidential year is extremely gratifying. As Cliff Lynch notes in his editorial, “the library profession has been slow to open up access to the publications of its own professional societies, to take advantage of the greater reach and impact that such policies can offer.” As librarians challenge publishers to pursue open-access venues, myself included, I am relieved to no longer be a hypocrite. By supporting open access we are sending a strong message to the community that we believe in the benefits of open access and we encourage other library organizations to do the same. ITAL will now reach a much broader and larger audience. This will benefit our authors, the organization, and the scholarship of our profession. I understand that while our members embrace open access, not everyone is pleased with an online-only journal. The number of new journals being offered electronically only is growing and I believe we are beginning to see a decline in the dual publishing model of publishers and societies offering both print and online journals. 
My library has been cutting back consistently on print copies of journals and this year will get only a handful of journals in print. Personally, I have embraced the electronic publishing world. In fact, I held off on subscribing to The New Yorker until it had an iPad subscription model! I estimate that I read 95 percent of my books and all of my professional journals electronically. The revolution has happened for me and for many others. I know that our membership will adapt and transition their ITAL reading habits to our new electronic edition, and I look forward to seeing this column and the entire journal in its new format.

Colleen Cuddy (colleen.cuddy@med.cornell.edu) is LITA President 2011-12 and Director of the Samuel J. Wood Library and C. V. Starr Biomedical Information Center at Weill Cornell Medical College, New York, New York.

Earlier this week, the Research Works Act died. Librarians and researchers across the country celebrated this victory as we preserved an important open-access mandate requiring the deposition of research articles funded by the National Institutes of Health into PubMed Central. This act threatened not just research but the availability of health information to patients and their families. As librarians, we still need to be vigilant about preserving open access and supporting open-access initiatives.

I would like to draw your attention to the Federal Research Public Access Act (FRPAA, HR 4004). This act was recently introduced in the House, with a companion bill in the Senate. As described by the Association of Research Libraries, FRPAA would ensure free, timely, online access to the published results of research funded by eleven U.S. federal agencies. The bill gives individual agencies flexibility in choosing the location of the digital repository to house this content, as long as the repositories meet conditions for interoperability and public accessibility, and have provisions for long-term archiving. The legislation would extend and expand access to federally funded research resources and, importantly, spur and accelerate scientific discovery. Notably, this bill does not take anything away from publishers. No publisher will be forced to publish research under the bill's provisions; any publisher can simply decline to publish the material if it feels the terms are too onerous. I encourage the library community to contact their representatives to support this bill.

Open access and open data are the keystones of e-science and its goals of accelerating scientific discovery. I hope that many of you will join me at the LITA President's Program on June 24, 2012, in Anaheim. Tony Hey, Corporate Vice President of Microsoft Research Connections and former director of the U.K.'s e-Science Initiative, and Clifford Lynch, Executive Director of the Coalition for Networked Information, will discuss data-intensive scientific discovery and its implications for libraries, drawing from the seminal work The Fourth Paradigm. Librarians are beginning to explore our role in this new paradigm of providing access to and helping to manage data in addition to bibliographic resources. It is a timely topic and one in which librarians, due to our skill set, are poised to take a leadership role. Reading The Fourth Paradigm was a real game changer for me. It is still extremely relevant. You might consider reading a chapter or two prior to the program.
It is an open-access e-book available for download from Microsoft Research (http://research.microsoft.com/en-us/collaboration/fourthparadigm/). I keep a copy on my iPad, right there with downloaded ITAL article PDFs. http://www.arl.org/pp/access/frpaa-2012.shtml http://research.microsoft.com/en-us/collaboration/fourthparadigm/ 1916 ---- Investigations into Library Web-Scale Discovery Services Jason Vaughan INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 32 ABSTRACT Web-scale discovery services for libraries provide deep discovery to a library’s local and licensed content and represent an evolution—perhaps a revolution—for end-user information discovery as pertains to library collections. This article frames the topic of web-scale discovery and begins by illuminating web-scale discovery from an academic library’s perspective—that is, the internal perspective seeking widespread staff participation in the discovery conversation. This included the creation of the Discovery Task Force, a group that educated library staff, conducted internal staff surveys, and gathered observations from early adopters. The article next addresses the substantial research conducted with library vendors that have developed these services. Such work included drafting of multiple comprehensive question lists distributed to the vendors, onsite vendor visits, and continual tracking of service enhancements. Together, feedback gained from library staff, insights arrived at by the Discovery Task Force, and information gathered from vendors collectively informed the recommendation of a service for the UNLV Libraries. INTRODUCTION Web-scale discovery services, combining vast repositories of content with accessible, intuitive interfaces, hold the potential to greatly facilitate the research process. While the technologies underlying such services are not new, commercial vendors releasing such services, and their work and agreements with publishers and aggregators to preindex content, is very new. This article in particular frames the topic of web-scale discovery and helps illuminate some of the concerns and commendations related to web-scale discovery from one library’s staff perspective—that is, the internal perspective. The second part focuses on detailed dialog with the commercial vendors, enabling the library to gain a better understanding of these services. In this sense, the second half is focused externally. Given that web-scale discovery is new for the library environment, the author was unable to find any substantive published work detailing identification, research, evaluation, and recommendation related to library web-scale discovery services. It’s hoped that this article will serve as the ideal primer for other libraries exploring or contemplating exploration of these groundbreaking services. Web-scale discovery services are able to index a variety of content, whether hosted locally or remotely. Such content can include library ILS records, digital collections, institutional repository content, and content from locally developed and hosted databases. Such capabilities existed, to varying degrees, in next-generation library catalogs that debuted in the mid 2000s. In addition, web-scale discovery services pre–index remotely hosted content, whether purchased or licensed by the library. 
This latter set of content—hundreds of millions of items—can include items such as e-books, publisher or aggregator content for tens of thousands of full-text journals, content from abstracting and indexing databases, and materials housed in open-access repositories. For purposes of this article, web-scale discovery services are flexible services which Jason Vaughan (jason.vaughan@unlv.edu) is Director, Library Technologies, University of Nevada, Las Vegas. INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 33 provide quick and seamless discovery, delivery, and relevancy-ranking capabilities across a huge repository of content. Commercial web-scale discovery vendors have brokered agreements with content providers (publishers and aggregators), allowing them to pre–index item metadata and full-text content (unlike the traditional federated search model). This approach lends itself to extremely rapid search and return of results ranked by relevancy, which can then be sorted in various ways according to the researcher’s whim (publication date, item type, full text only, etc.). By default, an intuitive, simple, Google-like search box is provided (along with advanced search capabilities for those wishing this approach). The interface includes design cues expected by today’s researchers (such as faceted browsing) and, for libraries wishing to extend and customize the service, embraces an open architecture in comparison to traditional ILS systems. Why Web-scale Discovery? As illustrated by research dating back primarily to the 1990s, library discovery systems within the networked online environment have evolved, yet continue to struggle to serve users. As a result, the library (or systems supported and maintained by the library) is often not the first stop for research—or worse, not a stop at all. Users accustomed to a quick, easy, “must have it now” environment have defected, and research continues to illustrate this fact. Rather than weave these research findings into a paragraph or page, below are some illustrative quotes to convey this challenge. The quotations below were chosen because they succinctly capture findings from research involving dozens, hundreds, and in some cases thousands of participants or respondents: People do not just use information that is easy to find; they even use information that they know to be of poor quality and less reliable—so long as it requires little effort to find—rather than using information they know to be of high quality and reliable, though harder to find.1 * * * Today, there are numerous alternative avenues for discovery, and libraries are challenged to determine what role they should appropriately play. Basic scholarly information use practices have shifted rapidly in recent years, and as a result the academic library is increasingly being disintermediated from the discovery process, risking irrelevance in one of its core functional areas [that of the library serving as a starting point or gateway for locating research information] . . . we have seen faculty members steadily shifting towards reliance on network- level electronic resources, and a corresponding decline in interest in using locally provided tools for discovery.2 * * * A seamless, easy flow from discovery through delivery is critical to end users. 
This point may seem obvious, but it is important to remember that for many end users, without the delivery of something he or she wants or needs, discovery alone is a waste of time.3 * * * End users’ expectations of data quality arise largely from their experiences of how information is organized on popular Web sites. . . 4 * * * [User] expectations are increasingly driven by their experiences with search engines like Google and online bookstores like Amazon. When end users conduct a search in a library INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 34 catalog, they expect their searches to find materials on exactly what they are looking for; they want relevant results.5 * * * Users don’t understand the difference in scope between the catalog and A&I services (or the catalog, databases, digitized collections, and free scholarly content).6 * * * It is our responsibility to assist our users in finding what they need without demanding that they acquire specialized knowledge or select among an array of “silo” systems whose distinctions seem arbitrary . . . the continuing proliferation of formats, tools, services, and technologies has upended how we arrange, retrieve, and present our holdings. Our users expect simplicity and immediate reward and Amazon, Google, and iTunes are the standards against which we are judged. Our current systems pale beside them.7 * * * Q: If you could provide one piece of advice to your library, what would it be? A: Just remember that students are less informed about the resources of the library than ever before because they are competing heavily with the Internet.8 Additional factors sell the idea of web-scale discovery. Obviously, something must be discoverable for it to be used (and of value) to a researcher; ideally, content should be easily discoverable. Since these new services index content that previously was housed in dozens or hundreds of individual silos, they can greatly facilitate the search process for many research purposes. Libraries often spend large sums of money to license and purchase content, sums that often increase annually. Any tool that holds the potential to significantly increase the discovery and use of such content should cause libraries to take notice. At time of writing, early research is beginning to indicate that these tools can increase discovery. Doug Way compared link-resolver-database and full-text statistics prior to and after Grand Valley State University’s implementation of the Summon web- scale discovery service.9 His research suggests that the service was both broadly adopted by the University’s community and that it has led to an increase in their library’s electronic resource discovery and use. Willamette University implemented WorldCat Local, and Bill Kelm presented results that showed an increase in both ILL requests as well as use of the library’s electronic resources.10 From another angle, information-literacy efforts focus on connecting users to “legitimate” content and providing researchers the skills to identify content quality and legitimacy. Given that these web-scale discovery services include or even primarily focus on indexing a large amount of scholarly research, such services can serve as another tool in the library’s arsenal. Results retrieved from these services—largely content licensed or purchased by libraries—is accurate, relevant, and vetted, compared to the questionable or opinionated content that may often be returned through a web search engine query. 
Several of the services currently allow a user to refine results to just categorized as peer-reviewed or scholarly. The Internal Academic Library Perspective: Genesis of the UNLV Libraries Discovery Task Force The following sections of this article begin with a focus on the internal UNLV Library perspective—from early discussions focused on the broad topic of discovery to establishing a task INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 35 force charged to identify, research, evaluate, and recommend a potential service for purchase. Throughout this process, and as detailed below, communication with and feedback from the variety of library staff was essential in ensuring success. Given the increasing vitality of content in electronic format, and the fact that such content was increasingly spread across multiple access points or discovery systems, in late 2008 the University of Nevada Las Vegas (UNLV) Libraries began an effort to engage library staff in information discovery and how such discovery would ideally occur in the future. Related to the exponential growth of content in electronic format, traditional technical-services functions of cataloging and acquisitions were changing or would soon change, not just at UNLV, but throughout the academic library community. Coinciding with this, the Libraries were working on drafting their 2009–11 strategic plan and wanted to have a section highlighting the importance of information discovery and delivery with action items focused on improving this critical responsibility of libraries. In spring 2009, library staff were given the opportunity to share with colleagues a product or idea, related to some aspect of discovery, which they felt was worthy of further consideration. This event, open to UNLV Libraries staff and other Nevada colleagues, was titled the Discovery Mini-Summit, and more than a dozen participants shared their ideas, most in a poster-session format. One of the posters focused on Serial Solutions Summon, an early entrant into the vendor web-scale discovery service landscape. At the time, it was a few months from public release. Other posters included topics such as the Flickr Commons (cultural heritage and academic institutions exposing their digital collections through this popular platform), and a working prototype of a homegrown, open-source federated search approach searching across various subscribed databases. In August 2009, the dean of the UNLV University Libraries charged a ten-person task force to investigate and evaluate web-scale discovery services with the ultimate goal of providing a final recommendation for potential purchase. Representation on the task force included three directors and a broad cross section of staff from across the functional areas of the library, including back-of-the-house and public-service operations. The director of Library Technologies, and author of this article, was tasked with drafting a charge and chairing the committee; once charged, the Discovery Task Force worked over the next fifteen months to research, evaluate, and ultimately provide a recommendation regarding a web-scale discovery service. To help illustrate some of the events described, a graphical timeline of activities is presented as appendix A; the original charge appears as appendix B. In retrospect, the initial target date of early 2010 to make a recommendation was naive, as three of the five products ultimately identified and evaluated by the task force weren’t publicly released until 2010. 
Several boundaries were provided within the charge, including the fact that the task force was not investigating and evaluating traditional federated search products. The Libraries had had a very poor experience with federated search a few years earlier, and the shortcomings of the traditional federated search approach—regardless of vendor—are well known. The remainder of this article discusses the various steps taken by the Discovery Task Force in evaluating and researching web-scale discovery services. While many libraries have begun to implement the web- scale discovery services evaluated by this group, many more are currently at the learning and evaluation stage, or have not yet begun. Many libraries that have already implemented a commercial service likely went through an evaluation process, but perhaps not at the scale conducted by the UNLV Libraries, if for no other reason than the majority of commercial services are extremely new. Even in early 2010, there was less competition, fewer services to evaluate, INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 36 fewer vendors to contact, and fewer early adopters from whom to seek references. Fortunately, the initial target date of early 2010 for a recommendation was a soft target, and the Discovery Task Force was given ample time to evaluate the products. Based on presentations given by the author in 2010, it can’t be presumed that an understanding of web-scale discovery—or the awareness of the commercial services now available—is necessarily widespread. In that sense, it’s the author’s hope and intent that information contained in this article can serve as a primer, or a recipe, for those libraries wishing to learn more about web-scale discovery and perhaps begin an evaluation process of their own. While research exists on federated search technologies within the library environment, the author was unable to find any peer-reviewed published research on the evaluation model and investigations for vendor produced web-scale discovery services as described in this paper. However, some reports are available on the open web, providing some insights into web-scale discovery evaluations led by other libraries, such as two reports provided by Oregon State University. The first, dated March 2009, describes a task force whose activities included “scrutinize WCL [WorldCat Local], investigate other vendors’ products, specifically Serials Solutions’ Summon, the recently announced federated index discovery system; EBSCO’s Integrated Search; and Innovative Interfaces’ Encore product, so that a more detailed comparison can be done,” and “by March 2010, communicate . . . whether WCL or another discovery service is the optimal purchase for OSU Libraries.”11 Note that in 2009, Encore existed as a next-generation discovery layer, and it had an optional add on called “Encore Harvester,” which allows for the harvesting of digital local collections. The report cites the University of Michigan’s evaluation of WCL, and adds their additional observations. The March 2009 report provides a features comparison matrix for WorldCat Local, Encore, Summon, and LibraryFind (an open-source search tool developed at OSU that provides federated searching for selected resources). Feature sets include the areas of search and retrieval, content, and added features (e.g., book covers, user tagging, etc.). The report also describes some usability testing involving WCL and integration with other local library services. 
A second set of investigations followed “in order to provide the task force with an opportunity to more thoroughly investigate other products” and is described in a second report provided at the end of 2009.12 At the time of both phases of this evaluation (and drafted reports) three of the web-scale discovery products had yet to enter public release. The December 2009 report focused on the two released products, Serials Solutions Summon and WorldCat Local, and includes a feature matrix like the earlier report, with the added feature set of “other,” which included the features of “clarity of display,” “icons/images,” and “speed.” The latter report briefly describes how they obtained subject librarian feedback and the pros and cons observed by the librarians in looking at Summon. It also mentions obtaining feedback from two early adopters of the Summon product, as well as obtaining feedback from librarians whose library had implemented WorldCat Local. Apart from the Oregon reports, some other reports on evaluations (or selection) of a particular service, or a set of particular services, are available, such as the University of Michigan’s Article Discovery Working Group, which submitted a final report in January 2010.13 Activity: Understanding Web-scale The first activity of the Discovery Task Force was to educate the members, and later, other library colleagues, on web-scale discovery. Terms such as “federated search,” “metasearch,” “next INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 37 generation catalogs,” and “discovery layers” had all come before, and “web-scale” was a rather new concept that wasn’t widely understood. The Discovery Mini Summit served as a springboard that perhaps more by chance than design introduced to UNLV Library staff what would later become more commonly known as web-scale discovery, though even we weren’t familiar with the term back in Spring 2009. In Fall 2009, the Discovery Task Force identified reports from entities such as OCLC, Ithaka, and reports prepared for the Library of Congress highlighting changing user behavior and expectations; these reports helped form a solid foundation for understanding the “whys” related to web-scale discovery. Additional registration and participation in sponsored web-scale discovery webcasts and meeting with vendors at library conferences helped further the understanding of web-scale discovery. After the Discovery Task Force had a firm understanding of web-scale discovery, the group hosted a forum for all library staff to help explain the concept of web-scale discovery and the role of the Discovery Task Force. Specifically, this first forum outlined some key components of a web-scale discovery service, discussed research the task force had completed to date, and outlined some future research and evaluation steps. A summary of these steps appears in the timeline in appendix A. Time was allowed for questions and answers, and then the task force broadcast several minutes of a (then recent) webcast talking about web-scale discovery. As part of its education role, the Discovery Task Force set up an internal wiki-based webpage in August 2009 upon formation of the group, regularly added content, and notified staff when new content was added. A goal of the task force was to keep the evaluative process transparent, and over time the wiki became quite substantial. Links to “live” services were provided on the wiki. 
Given that some services had yet to be released, some links were to demo sites or sites of the closest approximation available, i.e., some services yet to be released were built on an existing discovery layer already in general release, and thus the look, feel, and functionality of such services was basically available for staff review. The wiki also provided links to published research and webcasts on Web-scale discovery. Such content grew over time as additional web- scale discovery products entered general release. In addition to materials on particular services, links were provided to important background documents and reports on topics related to the user discovery experience and user expectations for search, discovery, and delivery. Discovery Task Force meeting notes and staff survey results were posted to the wiki, as were evaluative materials such as information on the content-overlap analysis conducted for each service. Announcements to relevant vendor programs at the American Library Association’s Annual Conference were also posted to the wiki. Activity: Initial Staff Survey As noted above, when the task force began its work, only two products (out of five ultimately evaluated) were in general release. As more products entered public release, a next step was to invite vendors onsite to show their publicly released product, or a working, developed prototype nearing initial public release. To capture a sense of the library staff ahead of these vendor visits, the Discovery Task Force conducted the first of two staff surveys. The 21-question survey consisted of a mix of “rank on a scale” questions, multiple-choice questions, and free-text response questions. Both the initial and subsequent surveys were administered through the online SurveyMonkey tool. Respondents were allowed to skip any question they wished. The survey was broken into three broad topical areas: “local library customization capabilities,” “end user aspect: INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 38 features and functionality,” and “content.” The survey had an average response rate of 47 staff, or 47% of the library’s 100-strong workforce. The survey questions appear in appendix C. In hindsight, some of the questions could have benefitted from more careful construction. That said, there was a conscious juxtaposition of differing concepts within the same question—the task force did not want to receive a set of responses in which all library staff felt it was important for a service to do everything—in short, to be all things to all people. Forcing staff to rate varied concepts within a question could provide insights into what they felt was really important. A brief summary of some key questions for each section follows. As an introduction, one question in the survey asked staff to rate the relative importance of each overarching aspect related to a discovery service (customization, end user interface, and content). Staff felt content was the most critical aspect of a discovery service, followed by the end-user interface, followed by the ability to heavily customize the service. A snapshot of some of the capabilities library staff thought were important (or not) is provided in table 1. 
Web-scale Capabilities                                          SA      A       N       D       SD
Physical item status information                                81.6%   18.4%   -       -       -
Publication date sort capability                                75.5%   24.5%   -       -       -
Display library-specified links in the interface                69.4%   30.6%   -       -       -
One-click retrieval of full-text items                          61.2%   36.7%   -       -       2%
Ability to place ILL / consortial catalog requests              59.2%   36.7%   4.1%    -       -
Display the library's logo                                      59.2%   36.7%   4.1%    -       -
To be embedded within various library website pages             58%     42%     -       -       -
Full-text items first sort capability                           58.3%   31.3%   8.3%    2.1%    -
Shopping cart for batch printing, emailing, saving              55.1%   44.9%   -       -       -
Faceted searching                                               48.9%   42.6%   8.5%    -       -
Media type sort capability                                      47.9%   43.8%   4.2%    4.2%    -
Author name sort capability                                     41.7%   37.5%   18.8%   2.1%    -
Have a search algorithm that can be tweaked by library staff    38%     36%     20%     4%      2%
User account for saved searches and marked items                36.7%   44.9%   14.3%   4.1%    -
Book cover images                                               25%     39.6%   20.8%   10.4%   4.2%
Have a customizable color scheme                                24%     58%     16%     2%      -
Google Books preview button for book items                      18.4%   53.1%   24.5%   4.1%    -
Tag cloud                                                       12.5%   52.1%   31.3%   4.2%    -
User authored ratings                                           6.4%    27.7%   44.7%   12.8%   8.5%
User authored reviews                                           6.3%    20.8%   50%     12.5%   10.4%
User authored tags                                              4.2%    33.3%   39.6%   10.4%   12.5%

SA = Strongly Agree; A = Agree; N = Neither Agree nor Disagree; D = Disagree; SD = Strongly Disagree

Table 1. Web-scale Discovery Service Capabilities

None of the results was surprising, other than perhaps the low interest in, or indifference to, several Web 2.0 community features, such as the ability for users to provide ratings, reviews, or tags for items, and even a tag cloud. The UNLV Libraries already had a next-generation catalog offering these features, and they have not been heavily used. Even if these features had seen appreciable adoption by end users in the next-generation catalog, they are perhaps less applicable to a web-scale discovery service: users are probably less inclined to post reviews and ratings for an article than for a monograph, and article-level content vastly outnumbers book-level content in web-scale discovery services.

The final survey section focused on content. One question asked about the incorporation of ten different information types (sources) and asked staff to rank how important it was that a service include such content. Results are provided in table 2. A bit surprisingly, inclusion of catalog records was seen as most important. Not surprisingly, full-text and A&I content from subscription resources were ranked very highly. It should also be noted that at the time of the survey, the institutional repository was in its infancy with only a few sample records, and awareness of this resource was low among library staff. Another question listed a dozen existing publishers (e.g., Springer, Elsevier, etc.) deemed important to the libraries and asked staff to rank, on a four-point scale from "essential" to "not important," the importance that a discovery service index items from these publishers. Results showed that all publishers were ranked as either essential or important. Related to content, 83.8 percent of staff felt that it was preferable for a service to de-dupe records such that an item appears only once in the returned list of results; 14.6 percent preferred that the service not de-dupe results.
Information Source                                                                              Rating Average (lower = more important)
ILS catalog records                                                                             1.69
Majority of full-text articles / other research contained in vendor-licensed online resources  2.54
Majority of citation records for non-full-text vendor-licensed A&I databases                   4.95
Consortial catalog records                                                                      5.03
Electronic reserves records                                                                     5.44
Records within locally created and hosted databases                                             5.64
Digital collection records                                                                      5.77
WorldCat records                                                                                6.21
ILS authority control records                                                                   6.5
Institutional repository records                                                                6.68

Table 2. Importance of Content Indexed in Discovery Service

After the first staff survey was concluded, the Discovery Task Force hosted another library forum to introduce and "test drive" the five vendor services in front of library staff. This session was scheduled just a few weeks ahead of the onsite vendor visits to help serve as a primer to engage library staff and get them actively thinking about questions to ask the vendors. The task force distributed notecards at the forum and asked attendees to record any specific questions they had about a particular service. After the forum, 28 product-specific questions were collected; they helped inform future research on those questions for which the task force did not yet have an answer. Questions ran the gamut and collectively touched on all three areas of evaluation.

Activity: Second Staff Survey

Within a month after the five vendor onsite visits, a content analysis of the overlap between UNLV-licensed content and content indexed by the discovery services was conducted. After these steps, a second staff survey was administered. This second staff survey had questions focused on the same three functional areas as the first staff survey: local library customization features, end-user features and functionality, and content. Since the vendor visits had taken place and respondents could now understand the questions in the context of the products, questions were asked from the perspective of each product, e.g., "Please rate on a five point Likert scale whether each discovery service appears to adequately cover a majority of the critical publisher titles (WorldCat Local, Summon, EDS, Encore Synergy, Primo Central)." In addition, there were free-text questions focused on each individual product, allowing colleagues to share additional, detailed thoughts. The second survey totaled 25 questions and had an average response rate of 18 respondents, or about 18 percent of library staff. Several staff conducted a series of sample searches in each of the services and provided feedback on their findings. Though this was a small response rate, two of the five products rose to the top, a third was a strong contender, and two were seen as less desirable. The lower response rate is perhaps indicative of several things. First, not all staff had attended the onsite vendor demonstrations or had taken the time to test drive the services via the links provided on the Discovery Task Force wiki site. Second, some questions were more appropriately answered by a subset of staff. For example, the content questions might best be matched to those with reference, collection development, or curriculum and program liaison duties. Finally, intricate details emerged once a thorough analysis of the vendor services commenced.
The first survey was focused more on the philosophy of what was desirable; the second survey took this a step further and asked how well each product matched such wishes. Discovery services are changing rapidly with respect to interface updates, customization options, and scope of content. As such, and also reflective of the lower response rate, the author is not providing response information nor analysis for this second survey within this article. However, results may be provided upon specific request to the author. The questions themselves for the second staff survey are significant, and they could help serve as a model for other libraries evaluating existing services on the market. As such, questions appear in appendix D. Activity: Early Adopter References One of the latter steps in the evaluation process from the internal academic library perspective was to obtain early adopter references from other academic library customers. A preliminary shortlist was compiled through a straw vote of the Discovery Task Force—and the results of the vote showed a consensus. This vote narrowed down the Discovery Task Force’s list of services still in contention for a potential purchase. This shortlist was based on the growing mass of research conducted by the Discovery Task Force and informed by the staff surveys and feedback to date. Three live customers were identified for each service that had made the shortlist, and the task INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 41 force successfully obtained two references for each service. Reference requests were intensive and involved a set of two dozen questions that references either responded to in writing or answered during scheduled conference calls. To help libraries conducting or interested in conducting their own evaluation and analysis of these services, this list of questions appears in appendix E. The services are so new that the live references weren’t able to comprehensively answer all the questions—they simply hadn’t had sufficient time to fully assess the service they’d chosen to implement. Still, some important insights were gained about the specific products and, at the larger level, discovery services as a whole. As noted earlier, discovery services are changing rapidly in the sense of interface updates, customization options, and scope of content. As such, the author is not providing product specific response information or analysis of responses for each specific product—such investigations and interpretations are the job of each individual library seriously wishing to evaluate the services to help decide which product seems most appropriate for its particular environment. Several broad insights merit notice, and they are shared below. Regarding a question on implementation (though some challenges were mentioned with a few responders), nothing reached the threshold of serious concern. All respondents indicated the new discovery service is already the default or primary search box on their website. One section of the early adopter questions focused on content. The questions in this area seemed a bit challenging for the respondents to provide lots of detail. 
In terms of “adequately covering a majority of the important library titles,” respondents varied from “too early to tell,” “it covers many areas but there are some big names missing,” to two of the respondents answering simply, “yes.” Several respondents also clearly indicated that the web-scale discovery service is not the “beginning and ending” for discovery, a fact that even some of the discovery vendors openly note. For example, one respondent indicated that web-scale discovery doesn’t replace remote federated searching. A majority (not all) of the discovery vendors also have a federated search product that can, to varying degrees, be integrated with their preharvested, centralized, index-based discovery service. This allows additional content to be searched because such databases may include content not indexed within the web-scale discovery service. However, many are familiar with the limitations of federated search technologies: slow speed, poor relevancy ranking of results, and the need to configure and maintain sources and targets. Such problems remain with federated search products integrated with web-scale discovery services. Another respondent indicated they were targeting their discovery service at undergraduate research needs. Another responded, “As a general rule, I would say the discovery service does an excellent job covering all disciplines. If you start really in-depth research in a specific discipline, it starts to break down. General searches are great . . . dive deeper into any discipline and it falls apart. For example, for a computer science person, at some point they will want to go to ACM or IEEE directly for deep searches.” Related to this, “the catalog is still important, if you want to do a very specific search for a book record, the catalog is better. The discovery service does not replace the catalog.” In terms of satisfaction with content type (newspapers, articles, proceedings, etc.), respondents seemed generally happy with the content mix. A range of responses were received, such as “doesn’t appear to be a leaning one way or another, it’s a mix. Some of these things depend on how you set the system up, as there is quite a bit of flexibility; the library has to make a decision on what they want searched.” Another example was that “the vendor has been working very hard to balance content types and I’ve seen a lot of improvement,” “no imbalance, results seem pretty well rounded.” Another responded, “A common complaint is that newspapers and book reviews dominate the search results, but that is much more a function of search algorithms then the amount of content in the index.” INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 42 When asked about positive or critical faculty feedback to the service, several respondents indicated they hadn’t had a lot of feedback yet. One indicated they had anecdotal feedback. Another indicated they’d received backlash from some users who were used to other search services (but also added that it was no greater than backlash from any other service they’d implemented in the past—and so the backlash wasn’t a surprise). One indicated “not a lot of feedback from faculty, the tendency is to go to databases directly, librarians need to instruct them in the discovery service.” For student feedback, one indicated, “We have received a few positive comments and see increased usage.” Another indicated, “Reviews are mixed. We have had a lot of feedback thanking us for providing a search that covers articles and books. 
They like the ability to do one search and get a mix of resources without the search taking a long time. Other feedback usually centers around a bug or a feature not working as it should, or as they understand it should. In general, however, the feedback has been positive.” Another replied, “Comments we receive are generally positive, but we’ve not collected them systematically.” Some respondents indicated they had done some initial usability testing on the initial interface, but not the most recent one now in use. Others indicated they had not yet conducted usability testing, but it was planned for later in 2010 or 2011. In terms of their fellow library staff and their initial satisfaction, one respondent indicated, “Somewhere between satisfied and very satisfied . . . it has been increasing with each interface upgrade . . . our instruction librarians are not planning to use the discovery service this fall [in instruction efforts] because they need more experience with it . . . they have been overall intrigued and impressed by it . . . I would say our organization is grappling more with the implications of a discovery tools as a phenomenon than with our particular discovery service in particular. There seems to be general agreement that it is a good search tool for the unmediated searcher.” Another indicated some concerns with the initial interface provided: “If librarians couldn’t figure it out, users can’t figure it out.” Another responded, it was “a big struggle with librarians getting on board with the system and promoting the service to students. They continually compare it against the catalog. At one point, they weren’t even teaching the discovery service in bib instruction. The only way to improve things it with librarian feedback; it’s getting better, it has been hard. Librarians have a hard time replacing the catalog and changing things that they are used to.” In terms of local customization, responses varied; some libraries had done basically no customization to the out-of-the-box interface, others had done extensive customization. One indicated they had tweaked sort options and added widgets to the interface. Another indicated they had done extensive changes to the CSS. One indicated they had customized the colors, added a logo, tweaked the headers and footers, and created “canned” or preconfigured search boxes searching a subset of the index. Another indicated they couldn’t customize the header and footer to the degree they would have liked, but were able to customize these elements to a degree. One respondent indicated they’d done a lot of customization to an earlier version of the interface, which had been rather painstaking, and that much of this broke when they upgraded to the latest version. That said, they also indicated the latest version was much better than the previous version. One respondent indicated it would be nice if the service could have multiple sources for INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 43 enriched record content so that better coverage could be achieved. One respondent indicated they were working on a complete custom interface from scratch, which would be partially populated with results from the discovery service index (as well as other data sources). A few questions asked about relevancy as a search concept and how well the respondents felt about the quality of returned results for queries. 
A few questions asked about relevancy as a search concept and how satisfied respondents were with the quality of results returned for their queries. One respondent indicated, “we have been able to tweak the ranking and are satisfied at this point.” Another indicated, “overall, the relevance is good – and it has improved a lot.” Another noted, “known item title searching has been a problem . . . the issues here are very predictable – one word titles are more likely to be a problem, as well as titles with stopwords,” and noted the vendor was aware of the issue and was improving this. One noted, “we would like to be able to experiment with the discovery service more,” and pointed to the lack of “relevancy algorithm control.” Another indicated they planned to investigate relevance more closely once usability studies commenced, and noted they had worked with the vendor to make some code changes to the default search mechanism. One noted that they’d like to be able to specify some additional fields that would be part of the algorithm associated with relevancy. Another optimistically noted, “as an early adopter, it has been amazing to see how relevance has improved. It is not perfect, but it is constantly evolving and improving.”
A final question asked simply, “Overall, do you feel your selection of this vendor’s product was a good one? Do you sense that your users – students and faculty – have positively received the product?” Across the majority of responses, there was general agreement from the early adopters that they felt they’d made the right choice. One noted that it was still early and the evaluation was still a work in progress, but felt the service had been positively received. The majority were more certain: “yes, I strongly feel that this was the right decision . . . as more users find it, I believe we will receive additional positive feedback,” “yes, we strongly believe in this product and feel it has been adopted and widely accepted by our users,” “I do feel it was a good selection.”
The External Perspective: Dialog with Web-scale Discovery Vendors
The preceding sections focused on an academic library’s perspective on web-scale discovery services—the thoughts, opinions, preferences, and vetting activities involving library staff. The following sections focus on the extensive dialog and interaction with the vendors themselves, apart from the internal library perspective, and highlight the thorough, meticulous research activities conducted on five vendor services. The Discovery Task Force sought to learn as much about each service as possible, a challenging proposition given that at the start of the investigation only two of the five services had been released and, unsurprisingly, very little research existed. As such, it was critical to work with vendors to understand their services and how each compared to others in the marketplace. Broadly summarized, these efforts included identification of services, drafting of multiple comprehensive question lists distributed to the vendors, onsite vendor visits, and continual tracking of service enhancements.
Activity: Vendor Identification
Over the course of a year’s work, the Discovery Task Force executed several steps to systematically understand the vendor marketplace—the capabilities, content considerations, development cycles, and future roadmaps associated with five vendor offerings. Given that the task force began its work when only two of these services were in public release, there was no manual, recipe, or substantial published research to rely on.
The beginning, for the UNLV Libraries, lay in identification of the services—one must first know the services to be evaluated before evaluation can commence. As mentioned previously, the Discovery Mini-Summit held at the UNLV Libraries highlighted one product—Serials Solutions Summon; the only released product at the time of the Mini-Summit was WorldCat Local. While no published peer-reviewed research highlighting these new web-scale discovery services existed, press and news releases did exist for the three to-be-released services. Such releases shed light on the landscape of services that the task force would review—a total of five services, from the first-to-market, WorldCat Local, to the most recent entrant, Primo Central. OCLC WorldCat Local, released in November 2007, can be considered the first web-scale discovery service as defined in this research; the experience of an early pilot partner (the University of Washington) is profiled in a 2008 issue of Library Technology Reports.14 In the UW pilot, approximately 30 million article-level items were included with the WorldCat database. Another product, Serials Solutions Summon, was released in July 2009, and together these two services were the only ones publicly released when the Discovery Task Force began its work. The task force identified three additional vendors, each working on its own version of a web-scale discovery service; each of these services would enter initial general release as the task force continued its research: EBSCO EDS in January 2010, Innovative Interfaces Encore Synergy around May 2010, and Ex Libris Primo Central in June 2010. While each of these three was new in terms of web-scale discovery capabilities, each was built, at least in part, on earlier systems from the vendors. EDS draws heavily from the EBSCOhost interface (the original version of which dates back to the 1990s), while the base Encore and base Primo systems were next-generation catalog systems that debuted in 2007.
Activity: Vendor Investigations
After identifying existing and under-development discovery services, the next step in UNLV’s detailed vendor investigations was the creation of a uniform, comprehensive question list sent to each of the five vendors. The Discovery Task Force ultimately developed a list of 71 questions divided into nine functional areas, as follows, each with an example question:
Section 1: Background. “When did product development begin (month, year)?”
Section 2: Locally Hosted Systems and Associated Metadata. “With what metadata schemas does your discovery platform work? (e.g., MARC, Dublin Core, EAD, etc.)”
Section 3: Publisher/Aggregator Coverage (Full Text and Citation Content). “With approximately how many publishers/aggregators have you forged content agreements?”
Section 4: Records Maintenance and Rights Management. “How is your system initialized with the correct set of rights management information when a new library customer subscribes to your product?”
Section 5: Seamlessness & Interoperability with Existing Content Repositories. “For ILS records related to physical holdings, is status information provided directly within the discovery service results list?”
Section 6: Usability Philosophy. “Describe how your product incorporates published, established best practices in terms of a customer focused, usable interface.”
Section 7: Local “Look & Feel” Customization Options. “Which of the following can the library control: Color Scheme; Logo / Branding; Facet Categories and placement; etc.”
Section 8: User Experience (Presentation, Search Functionality, and What the User Can Do With the Results). “At what point does a user leave the context and confines of the discovery interface and enter the interface of a different system, whether remote or local?”
Section 9: Administration Module & Statistics. “Describe in detail the statistics reporting capabilities offered by your system. Does your system provide the following sets of statistics . . .”
All vendors were given 2–3 weeks to respond, and all vendors responded. It was evident from the uneven level of responses that the vendors were at different developmental stages with their products. Some vendors were still 6–9 months away from initial public release; some were not even firm on when their service would enter release. It was also observed that some vendors were less explicit in the level of detail provided, sometimes reflecting their development stage and in other cases perhaps in spite of it. A refined subset of the original 71 questions appears as a list of 40 questions in appendix F.
Apart from the detailed question list, various sets of free and licensed information on these discovery services are available online, and the task force sought to identify and digest this information. The Charleston Advisor has conducted interviews with several of the library web-scale discovery vendors on their products, including EBSCO,15 Serials Solutions,16 and Ex Libris.17 These interviews, each around a dozen questions, ask the vendors to describe their product and how it differs from other products in the marketplace, and include important questions on metadata and content. An article by Ronda Rowe reviews Summon, EDS, and WorldCat Local, and provides some analysis of each product on the basis of content, user interface and searchability, pricing, and contract options.18 It also provides a comparison of 24 product features across these three services, such as “search box can be embedded in any webpage,” “local branding possible,” and “supports social networking.” A wide variety of archived webcasts, many provided by Library Journal, are available through free registration, and new webcasts are being offered at the time of writing; these presentations to some degree touch on discussions with the discovery vendors, and are often moderated by or include company representatives as part of the discussion group.19 Several libraries have authored reports and presentations that, at least partially, discuss information on particular services gained through their evaluations, which included dialog with the vendors.20 The vendors themselves each have a section on their corporate website devoted to their service. Information provided on these websites ranges from extremely brief to, in the case of WorldCat Local, very detailed and informative. In addition, much can be gained by “test-driving” live implementations. As such, a listing of vendor website addresses providing more information, as well as a list of sample live implementations, is provided in appendix G.
Activities: Vendor Visits and Content Overlap Analysis
Each of the five vendors visited the UNLV Libraries in spring 2010.
Vendor visits all occurred within a nine-day span; visits were intentionally scheduled close together to keep the products fresh in the minds of library staff and to aid comparison. Each visit lasted approximately half a day and often included the field or regional sales representative as well as a product manager or technical expert. Vendor visits included a demonstration and Q&A for all library staff as well as invited colleagues from other southern Nevada libraries, a meeting with the Discovery Task Force, and a meeting with technical staff at UNLV responsible for website design and application development and customization. Vendors were each given a uniform set of fourteen questions on topics to address during their visit; these appear in appendix H. Questions were divided into the broad topical areas of content coverage, end user interface and functionality, and staff “control” over the end user interface. On average, approximately 30–40 percent of the library staff attended the open vendor demo and Q&A session.
Shortly after the vendor visits, a content-overlap analysis comparing UNLV serials holdings with indexed content in the discovery service was sought from each vendor. Given that the amount of content indexed by each discovery service was growing (and continues to grow) extremely rapidly as new publisher and aggregator content agreements are signed, this content-overlap analysis was intentionally not sought at an earlier date. Some vendors were able to provide detailed coverage information against our existing journal titles (UNLV currently subscribes to approximately 20,000 e-journals and provides access to another 7,000+ open-access titles). For others, this was more difficult. Recognizing this, the head of Collection Development was asked to provide a list of the “top 100” journal titles for UNLV based on such factors as usage statistics and whether the title was a core title for part of the UNLV curriculum. The remaining vendors were able to provide content coverage information against this critical title list. Four of the five products had quite comprehensive coverage (more than 80 percent) of the UNLV Libraries’ titles. While outside the scope of this article, “coverage” can mean different things for different services. Driven by the publisher agreements they are able to secure, some discovery services may have extensive coverage for particular titles (such as the full text, abstracts, author-supplied keywords, subject headings, etc.), whereas other services, while covering the same title, may have “thinner” metadata, such as basic citation information (article title, publication title, author, publication date, etc.). More discussion on this topic is present in the January 2011 Library Technology Reports on library web-scale discovery services.21
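For libraries attempting a similar comparison on their own, the mechanics reduce to matching a local title list against a vendor-supplied coverage list, typically on ISSN. The sketch below is a minimal illustration only, assuming both lists are available as CSV exports with a hypothetical "issn" column; actual knowledge-base exports and vendor coverage reports will use their own formats.

```python
import csv

def load_issns(path, issn_column="issn"):
    """Read a CSV export and return the set of normalized ISSNs it contains."""
    issns = set()
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            issn = row.get(issn_column, "").replace("-", "").strip().upper()
            if issn:
                issns.add(issn)
    return issns

# Hypothetical file names; substitute a real knowledge-base export and a
# vendor-supplied coverage report.
local_titles = load_issns("library_ejournal_holdings.csv")
vendor_index = load_issns("vendor_coverage_report.csv")

covered = local_titles & vendor_index
pct = 100 * len(covered) / len(local_titles) if local_titles else 0.0
print(f"{len(covered)} of {len(local_titles)} local titles covered ({pct:.1f}%)")

# Titles absent from the vendor list usually warrant manual follow-up, since
# they may simply be listed under a different (print vs. online) ISSN.
missing = sorted(local_titles - vendor_index)
```

As noted above, a raw title match says nothing about the depth of metadata behind it, so percentages of this kind are a starting point for discussion rather than a verdict.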
Activity: Product Development Tracking
One aspect of web-scale discovery services, and of the next-generation discovery layers that preceded them, is a rapid enhancement cycle, especially when juxtaposed against the turnkey-style ILS systems that dominated library automation for many years. As an example, Serials Solutions provides minor enhancements to Summon approximately every three to four weeks, EBSCO to EBSCO Discovery Service approximately every three months, and Ex Libris to Primo/Primo Central approximately every three months. Many vendors unveil updates coinciding with annual library conferences, and 2010 was no exception. In late summer/early fall 2010, the Discovery Task Force had conference calls or onsite visits with several of the vendors to focus on new enhancements and changes to the services, and to obtain answers to questions that had arisen since the previous visit several months earlier. Since the vendor visits in spring 2010, each service had changed, and two services had unveiled significantly different and improved interfaces. The Discovery Task Force’s understanding of web-scale discovery services had also expanded greatly since the start of its work. Coordinated with this second series of vendor visits and discussions, an additional list of more than two dozen questions, reflecting this refined understanding, was sent to the majority of the vendors. A portion of these questions is provided as part of the refined list presented in appendix F. This second set of questions dealt with complex discussions of metadata quality, such as what level of content publishers and aggregators were providing for indexing purposes (e.g., full text, abstracts, tables of contents, author-supplied keywords or subject headings, or particular citation and record fields), and also the vendor’s stance on content neutrality (i.e., whether they were entering into exclusive agreements with publishers and aggregators and, if the discovery service vendor is owned by a company involved with content, whether that content is promoted or weighted more heavily in result sets). Other questions dealt with such topics as current install-base counts and technical clarifications about how each service worked. In particular, the questions related to content were tricky for many (not all) of the vendors to address. Still, the Discovery Task Force was able to get a better understanding of how things worked in the evolving discovery environment. Combined with the internal library perspective and the early adopter references, information gathered from the vendors provided the necessary data set to submit a recommendation with confidence.
Activity: Recommendation
By mid-fall 2010, the Discovery Task Force had conducted, and had at its disposal, a tremendous amount of research. Recognizing how quickly these services change and the fact that a cyclical evaluation could occur, the task force members felt they had met their charge. If all things failed during the next phase—implementation—at least no one would be able to question the thoroughness of the task force’s efforts. Unlike the hasty decision that in part led to a less-than-stellar experience with federated search a few years earlier, the evaluation process to recommend a new web-scale discovery service was deliberate, thorough, transparent, and vetted with library stakeholders. Given that the Discovery Task Force was entering its final phase, official price quotes were sought from each vendor. Each task force member was asked to develop a pro/con list for all five identified products based on the knowledge gained. These lists were anonymized and consolidated into a single, extensive pro/con list for each service. Some of the pros and cons were subjective (such as interface aesthetics), and some were objective (such as a particular discovery service not offering a desired feature).
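Consolidating the anonymized lists is a simple bookkeeping exercise that could be done by hand or with a few lines of code. The sketch below shows one possible approach; the product names and comments are invented placeholders for illustration, not actual task force findings.

```python
from collections import defaultdict

# Each entry: (product, "pro" or "con", comment). Member names are omitted
# so the consolidated list stays anonymous. These entries are invented.
raw_entries = [
    ("Product A", "pro", "Clean default interface"),
    ("Product A", "con", "No control over relevancy weighting"),
    ("Product B", "pro", "Strong coverage of critical titles"),
    ("Product B", "con", "Dated out-of-the-box design"),
    ("Product B", "pro", "Strong coverage of critical titles"),  # duplicate note
]

def consolidate(entries):
    """Merge individual notes into one deduplicated pro/con list per product."""
    merged = defaultdict(lambda: {"pro": set(), "con": set()})
    for product, kind, comment in entries:
        merged[product][kind].add(comment)
    return merged

for product, notes in sorted(consolidate(raw_entries).items()):
    print(product)
    for kind in ("pro", "con"):
        for comment in sorted(notes[kind]):
            print(f"  {kind}: {comment}")
```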
At one of the final meetings of the task force, members reaffirmed the three top contenders, indicated that the other two were no longer under consideration, and were then asked to rank their first, second, and third choices among the remaining services. While complete consensus wasn’t achieved, there was a resounding first, second, and third choice. The task force presented a summary of findings at a meeting open to all library staff. This meeting summarized the research and evaluation steps the task force had conducted over the past year, framed each of the three shortlisted services by discussing strengths and weaknesses observed by the task force, and sought to answer questions from the library at large. Prior to drafting the final report and making the recommendation to the dean of Libraries, several task force members led a discussion and a final question-and-answer session at a meeting of the Libraries’ cabinet, one of the high-level administrative groups at the UNLV Libraries. Vetting by this body was the last step in the Discovery Task Force’s investigation, evaluation, and recommendation for purchase of a library web-scale discovery service. The recommendation was broadly accepted by the Libraries’ cabinet, and shortly afterward the Discovery Task Force was officially disbanded, having met its goal of investigating, evaluating, and making a recommendation for purchase of a library web-scale discovery service.
Next Steps
The dialog above describes the research, evaluation, and recommendation model used by the UNLV Libraries to select a web-scale discovery service. Such a model and the associated appendixes could serve, perhaps with some adaptations, as a framework for other libraries considering the evaluation and purchase of a web-scale discovery service. Together, the Discovery Task Force’s internal and external research and evaluation provided a substantive base of knowledge on which to make a recommendation. After its recommendation, the project progressed from a research and recommendation phase to an implementation phase. The Libraries’ cabinet brainstormed a list of more than a dozen concise implementation bullet points—steps that would need to be addressed—including the harvesting and metadata mapping of local library resources, local branding and some level of customization work, and integration of the web-scale discovery search box in the appropriate locations on the Libraries’ website. Project implementation co-managers were assigned (the director of Technical Services and the Web Technical Support manager), as well as key library personnel who would aid in one or more implementation steps. In January 2011, the implementation commenced, with a public launch of the new service planned for mid-2011. The success of a web-scale discovery service at the UNLV Libraries is a story yet to be written, but one full of promise.
Acknowledgements
The author wishes to thank the other members of the UNLV Libraries’ Discovery Task Force for their work in the research and evaluation of library web-scale discovery services: Darcy Del Bosque, Alex Dolski, Tamera Hanken, Cory Lampert, Peter Michel, Vicki Nozero, Kathy Rankin, Michael Yunkin, and Anne Zald.
REFERENCES
1. Marcia J. Bates, Improving User Access to Library Catalog and Portal Information, final report, version 3 (Washington, DC: Library of Congress, 2003), 4, http://www.loc.gov/catdir/bibcontrol/2.3BatesReport6-03.doc.pdf (accessed September 10, 2010).
2. Roger C. Schonfeld and Ross Housewright, Faculty Survey 2009: Key Strategic Insights for Libraries, Publishers, and Societies (New York: Ithaka S+R, 2010), 4, http://www.ithaka.org/ithaka-s-r/research/faculty-surveys-2000-2009/Faculty%20Study%202009.pdf (accessed September 10, 2010).
3. OCLC, Online Catalogs: What Users and Librarians Want (Dublin, OH: OCLC, 2009), 20, http://www.oclc.org/reports/onlinecatalogs/fullreport.pdf (accessed September 10, 2010).
4. Ibid., vi.
5. Ibid., 14.
6. Karen Calhoun, The Changing Nature of the Catalog and Its Integration with Other Discovery Tools: Final Report (Washington, DC: Library of Congress, 2006), 35, http://www.loc.gov/catdir/calhoun-report-final.pdf (accessed September 10, 2010).
7. Bibliographic Services Task Force, Rethinking How We Provide Bibliographic Services for the University of California: Final Report ([Pub location?] University of California Libraries, 2005), 2, http://libraries.universityofcalifornia.edu/sopag/BSTF/Final.pdf (accessed September 10, 2010).
8. OCLC, College Students’ Perceptions of Libraries and Information Resources (Dublin, OH: OCLC, 2006), part 1, page 4, http://www.oclc.org/reports/pdfs/studentperceptions.pdf (accessed September 10, 2010).
9. Doug Way, “The Impact of Web-Scale Discovery on the Use of a Library Collection,” Serials Review, in press.
10. Bill Kelm, “WorldCat Local Effects at Willamette University,” presentation, Prezi, July 21, 2010, http://prezi.com/u84pzunpb0fa/worldcat-local-effects-at-wu/ (accessed September 10, 2010).
11. Michael Boock, Faye Chadwell, and Terry Reese, “WorldCat Local Task Force Report to LAMP,” March 27, 2009, http://hdl.handle.net/1957/11167 (accessed February 12, 2012).
12. Michael Boock et al., “Discovery Services Task Force Recommendation to University Librarian,” http://hdl.handle.net/1957/13817 (accessed February 12, 2012).
13. Ken Varnum et al., “University of Michigan Library Article Discovery Working Group Final Report,” Umich, January 29, 2010, http://www.lib.umich.edu/files/adwg/final-report.pdf. [Access date?]
14. Jennifer Ward, Pam Mofjeld, and Steve Shadle, “WorldCat Local at the University of Washington Libraries,” Library Technology Reports 44, no. 6 (August/September 2008).
15. Dennis Brunning and George Machovec, “An Interview with Sam Brooks and Michael Gorrell on the EBSCOhost Integrated Search and EBSCO Discovery Service,” Charleston Advisor 11, no. 3 (January 2010): 62–65.
16. Dennis Brunning and George Machovec, “Interview About Summon with Jane Burke, Vice President of Serials Solutions,” Charleston Advisor 11, no. 4 (April 2010): 60–62.
17. Dennis Brunning and George Machovec, “An Interview with Nancy Dushkin, VP Discovery and Delivery Solutions at Ex Libris, Regarding Primo Central,” Charleston Advisor 12, no. 2 (October 2010): 58–59.
18. Ronda Rowe, “Web-Scale Discovery: A Review of Summon, EBSCO Discovery Service, and WorldCat Local,” Charleston Advisor 12, no. 1 (October 2010): 5–10.
19. Library Journal archived webcasts are available at http://www.libraryjournal.com/csp/cms/sites/LJ/Tools/Webcast/index.csp (accessed September 10, 2010).
20. Boock, Chadwell, and Reese, “WorldCat Local Task Force Report to LAMP”; Boock et al., “Discovery Services Task Force Recommendation to University Librarian”; Ken Varnum et al., “University of Michigan Library Article Discovery Working Group Final Report.”
21. Jason Vaughan, “Library Web-scale Discovery Services,” Library Technology Reports 47, no. 1 (January 2011).
Note: Appendices A–H available as supplemental files.
Investigations into Library Web-Scale Discovery Services: Appendices A–H
Jason Vaughan
Appendices
Appendix A. Discovery Task Force Timeline
Appendix B. Discovery Task Force Charge
Appendix C. Discovery Task Force: Staff Survey 1 Questions
Appendix D. Discovery Task Force: Staff Survey 2 Questions
Appendix E. Discovery Task Force: Early Adopter Questions
Appendix F. Discovery Task Force: Initial Vendor Investigation Questions
Appendix G. Vendor Websites and Example Implementations
Appendix H. Vendor Visit Questions
Appendix A. Discovery Task Force Timeline
[Timeline graphic not reproduced in text.]
Appendix B. Discovery Task Force Charge
Informed through various efforts and research at the local and broader levels, and as expressed in the Libraries 2010/12 strategic plan, the UNLV Libraries have the desire to enable and maximize the discovery of library resources for our patrons. Specifically, the UNLV Libraries seek a unified solution which ideally could meet these guiding principles:
• Creates a unified search interface for users pulling together information from the library catalog as well as other resources (e.g. journal articles, images, archival materials).
• Enhances discoverability of as broad a spectrum of library resources as possible
• Intuitive: minimizes the skills, time, and effort needed by our users to discover resources
• Supports a high level of local customization (such as accommodation of branding and usability considerations)
• Supports a high level of interoperability (easily connecting and exchanging data with other systems that are part of our information infrastructure)
• Demonstrates commitment to sustainability and future enhancements
• Informed by preferred starting points
As such, the Discovery Task Force advises Libraries Administration on a solution that appears to best meet the goal of enabling and maximizing the discovery of library resources. The bulk of the work will entail a marketplace survey and evaluation of vendor offerings.
Charge
Specific deliverables for this work include: 1.
Identify vendor next generation discovery platforms, whether established and currently on the market, or those publicized and at an advanced stage of development, with an expectation of availability within a year’s time. Identify & create a representative list of other academic libraries which have implemented or purchased currently available products. 2. Create a checklist / criteria of functional requirements / desires for a next generation discovery platform. 3. Create lists of questions to distribute to potential vendors and existing customers of next generation discovery platforms. Questions will focus on broad categories such as the following: a. Seek to understand how content hosted in our current online systems (III catalog, CONTENTdm, locally created databases, vendor databases, etc.) could/would (or not be able INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 54 to) be incorporated or searchable within the discovery platform. Apart from our existing online systems as we know them today, the task force will explore, in general terms, how new information resources could be incorporated into the discovery platform. More explicitly, the task force will seek an understanding of what types of existing records are discoverable within the vendor’s next generation discovery platform, and seek an understanding of what basic metadata must exist for an item to be discoverable. b. Seek to understand whether the solution relies on federated search, the creation of a central site index via metadata harvesting, or both, to enable discovery of items. c. Additional questions, such as pricing, maintenance, install base, etc. 4. Evaluate gathered information and seek feedback from library staff. 5. Provide to the Dean’s Directs a final report which summarizes the task force findings. This report will include a recommended product(s) and a broad, as opposed to detailed, summary of workload implications related to implementation and ongoing maintenance. The final report should be provided to the Dean’s Directs by February 15, 2010. Boundaries The work of the task force does not include: • Detailing the contents of “hidden collections” within the Libraries and seeking to make a concrete determination that such hidden collections, in their current form, would be discoverable via the new system. • Conducting an inventory, recommending, or prioritizing collections or items which should be cataloged or otherwise enriched with metadata to make them discoverable. • Coordination with other southern Nevada NSHE entities. • An ILS marketplace survey. The underlying Innovative Millennium System is not being reviewed for potential replacement. • Implementation of a selected product. [the charge concluded with a list of members for the Task Force] INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 55 Appendix C. Discovery Task Force: Staff Survey 1 Questions “RANK” means the SurveyMonkey question will be set up such that each option can only be chosen once, and will be placed on a scale that corresponds to the number of choices overall. “RATE” means there will be a 5 point Likert scale ranging from strongly disagree to strongly agree. Section 1: Customization. The “Staff Side” of the House 1. Customization. It is important for the Library to be able to control/tweak/influence the following design element [Strongly Disagree / Disagree / Neither Agree or Disagree / Agree / Strongly Agree]  General color scheme  Ability to include a UNLV logo somewhere on the page. 
 Ability to add other branding elements to the page.  Ability to add one or more library specified links prominently in the interface (example: a link to the Libraries’ home page)  Able to customize the name of the product (meaning, the vendor’s name for the product doesn’t need to be used nor appear within the interface)  Ability to embed the search box associated with the discovery platform elsewhere into the library website, such as the homepage (i.e. the user could start a search w/o having to directly go to the discovery platform 2. Customization. Are there any other design customization capabilities that are significantly important? Please list, and please indicate if this is a high, low, or medium priority in terms of importance to you. (freetext box ) 3. Search Algorithms. It is important for the Library to be able to change or tweak the platform’s native search algorithm to be able to promote desired items such that they appear higher in the returned list of [Strongly Disagree / Disagree / Neither Agree or Disagree / Agree / Strongly Agree] [e.g. The Library, at its option, could tweak one or more search algorithms to more heavily weight resources it wants to promote. For example, if a user searches for “Hoover Dam” the library could set a rule that would heavily weight and promote UNLV digital collection images for Hoover Dam – those results would appear on the first page of results]. 4. Statistics. The following statistic is important to have for the discovery platform [Strongly Disagree / Disagree / Neither Agree or Disagree / Agree / Strongly Agree]  Number of searches, by customizable timeframe Number of item or article level records accessed (that is, a user clicks on something in the returned list of results)  Number of searches generating 0 results INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 56  Number of items accessed by type  Number of items accessed by provider of content (that is, number of articles from particular database/fulltext vendor 5. Statistics. What other statistics would you like to see a discovery platform provide and how important is this to you? (freetext box) 6. Staff Summary. Please RANK on a 1-3 scale how important the following elements are, with a “1” being most important, a “2” being 2nd most important, and a 3 being 3rd most important.  Heavy customization capabilities as described in questions 1 & 2 above  Ability to tweak search algorithms as described in question 3  Ability for the system to natively provide detailed search stats such as described in question 4, 5. Section 2. The “End User” Side of the House 7. Searching. Which of the following search options is preferable when a user begins their search [choose one]  The system has a “Google-like” simple search box  The system has a “Google-like” simple search box, but also has an advanced search capability (user can refine the search to certain categories: author, journal, etc.)  No opinion 8. Zero Hit Searches. For a search that retrieves no actual results: [choose one]  The system should suggest something else or ask, “Did you mean?”  Retrieving precise results is more important and the system should not suggest something else or ask “Did you mean?”  no opinion 9. De-duplication of similar items. 
Which of the following is preferable [choose one]  The system automatically de-dupes records (the item only appears once in the returned list)  The system does not de-dupe records (the same item could appear more than once in the returned list, such as when we have overlapping coverage of a particular journal from multiple subscription vendors)  No opinion INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 57 10. Sorting of Returned Results. It is important for the user to be able to sort or reorder a list of returned results by . . [Strongly Disagree / Disagree / Neither Agree or Disagree / Agree / Strongly Agree]  Publication Date  Alphabetical by Author Name  Alphabetical by Title  Full Text Items First  By Media Type (examples: journal, book, image, etc) 11. Web 2.0 Functionality on Returned Results. The following items are important for a discovery platform to have . . [Strongly Disagree / Disagree / Neither Agree or Disagree / Agree / Strongly Agree] (note, if necessary, please conduct a search in the Libraries’ Encore system to help illustrate / remember some of the features/jargon mentioned below. In Encore, “Facets” appear on the left hand side of the screen; the results with book covers, “add to cart,” and “export” features appear in the middle; and a tag cloud to the right. Note: this question is asking about having the particular feature regardless of which vendor, and not how well or how poorly you think the feature works for the Encore system)  A tag cloud  Faceted searching  Ability to add user-generated tags to materials (“folksonomies”)  Ability for users to write and post a review of an item • Other (please specify) 12. Enriched Record Information on Returned Results. The following items are important to have in the discovery system . . . [Strongly Disagree / Disagree / Neither Agree or Disagree / Agree / Strongly Agree]  Book covers for items held by the Libraries  A Google Books preview button for print items held by the Libraries  Displays item status information for print items held by the Libraries (example: available, checked out) 13. What the User Can do With the Results. The following functionality is important to have in the discovery system . . [Strongly Disagree / Disagree / Neither Agree or Disagree / Agree / Strongly Agree]  Retrieve the fulltext of an item with only a single click on the item from the initial list of returned results  Ability to add items to a cart for easy export (print, email, save, export to Refworks) INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 58  Ability to place an InterLibrary Loan / LINK+ Request for an item  System has a login/user account feature which can store user search information for later. In other words, a user could potentially log in to retrieve saved searches, previously stored items, or create alerts when new materials become available. 14. Miscellaneous. The following feature/attribute is important to have in the discovery system . . . [Strongly Disagree / Disagree / Neither Agree or Disagree / Agree / Strongly Agree]  The vendor has an existing mobile version of their discovery tool for use by smartphones or other small internet-enabled devices.  The vendor has designed the product such that it can be incorporated into other sites used by students, such as WebCampus and/or social networking sites. 
Such “designs” may include the use of persistent URLs to embed hyperlinks, the ability to place the search box in another website, or specifically designed widgets developed by the vendor  Indexing and availability of newly published items occurs within a matter of days as opposed to a week or perhaps a month.  Library catalog authority record information is used to help return proper results and/or populate a tag cloud. 15. End User Summary. Please RANK on a 1-8 scale how important the following elements are; a “1” means you think it is the most important, a “2” second most important, etc.  System offers a “Google-like” simple search box only, as detailed in question 7 above  System offers a “did you mean?” or alternate suggestions for all searches retrieving 0 results as detailed in question 8 above (obviously, if you value precision of results over “did you mean” functionality, you would rank this toward the lower end of the spectrum).  System de-dupes similar items as detailed in question 9 above(if you believe the system should not de- dupe similar items, you would rate this toward the lower end of the spectrum)  System provides multiple sort options of returned results as detailed in question 10 above  System offers a variety of Web 2.0 features as detailed in question 11 above  System offer enriched record information as detailed in question 12 above  System offers flexible options for what a user can do with the results, as detailed in question 13 above  System has one or more miscellaneous features as detailed in question 14 above. Section 3: Content 16. Incorporation of Different Information Types. In an ideal world, a discovery platform would incorporate ALL of our electronic resources, whether locally produced or licensed/purchased from vendors. Below is a listing of different information types. Please RANK on a scale of 1-10 how vital it is INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 59 that a discovery platform accommodate these information types (“1” is the most important item in your mind, a “2” is second most important, etc). a. Innopac Millennium records for UNLV print & electronic holdings b. LINK+ records for print holdings held within the LINK+ consortium c. Innopac authority control records d. Records within OCLC WorldCat e. CONTENTdm records for digital collection materials f. bePRESS Digital Commons Institutional Repository materials g. Locally created Web accessible database records (e.g. the Special Collections & Architecture databases) h. Electronic Reserves materials hosted in ERES i. A majority of the citation records from non fulltext, vendor licensed online index/abstract/citation databases (e.g. The “Agricola” database) j. A majority of the fulltext articles or other research contained in many of our vendor licensed online resources (e.g. “Academic Search Premier” which contains a lot of full text content, and the other fulltext resource packages / journal titles we subscribe to) 17. LOCAL Content. Related to item (g) in the question immediately above, please list any locally produced collections that are currently available either on the website, or in electronic format as a word document, excel spreadsheet or access database (and not currently available on the website) that you would like the discovery platform to incorporate. (freetext box) 18. Particular Sets of Licensed Resources, What’s Important? 
Please rank which of the licensed (full text or primarily full text) existing publishers below are most important for a discovery platform to accommodate. Elsevier Sage Wiley Springer American Chemical Society Taylor & Francis (Informaworld) IEEE American Institute of Physics Oxford Ovid Nature Emerald INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 60 Section 4: Survey Summary 19. Overarching Survey Question. The questions above were roughly categorized into three areas. Given that no discovery platform will be everything to everybody, please RANK on a 1-3 scale what the most important aspects of a discovery system are to you (1 is most critical, 2 is second in importance overall, etc.)  The platform is highly customizable by staff (types of things in area 1 of the survey)  The platform is highly flexible from the end-user standpoint (type of things in area 2 of the survey)  The platform encompasses a large variety of our licensed and local resources (type of things in area 3 of the survey) 20. Additional Input. The survey above is roughly drawn from a larger list of 71 questions sent to the Discovery Task Force vendors. What other things do you think are REALLY important when thinking about a next-generation discovery platform? (freetext input, you may write a sentence or a book) 21. Demographic. What Library division do you belong to? Library Administration Library Technologies Research & Education Special Collections Technical Services User Services INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 61 Appendix D. Discovery Task Force: Staff Survey 2 Question For the comparison questions, products are listed by order of vendor presentation. Please mark an answer for each product. PART I. Licensed Publisher CONTENT (e.g. fulltext journal articles; citations / abstracts) SA = Strongly Agree; A = Agree; N= Neither Agree nor Disagree; D = Disagree; SD = Strongly Disagree 1. “The Discovery Platform appears to ADEQUATELY cover a MAJORITY of the CRITICAL publisher titles.” SA A N D SD I don’t know enough about the content coverage for this product to comment Ex Libris Primo Central OCLC WorldCat Local Ebsco Discovery Services Innovative Encore Synergy Serials Solutions Summon 2. “The Discovery Platform appears to ADEQUATELY cover a MAJORITY of the SECOND-TIER or SOMEWHAT LESS CRITICAL publisher titles.” SA A N D SD I don’t know enough about the content coverage for this product to comment Ex Libris Primo Central OCLC WorldCat Local Ebsco Discovery Services Innovative Encore Synergy Serials Solutions Summon 3. Overall, from the CONTENT COVERAGE point of view, please rank each platform from best to worst. Worst 2nd Worst Middle 2nd Best Best Ex Libris Primo Central OCLC WorldCat Local Ebsco Discovery Services Innovative Encore Synergy Serials Solutions Summon 4. Regardless of a best to worst ranking, please indicate if the products were, overall, ACCEPTABLE or UNACCEPTABLE to you from the CONTENT COVERAGE standpoint. Unacceptable Acceptable Ex Libris Primo Central INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 62 OCLC WorldCat Local Ebsco Discovery Services Innovative Encore Synergy Serials Solutions Summon PART II. END-USER FUNCTIONALITY & EASE OF USE 5. From the USER perspective, how functional do you think the discovery platform is? Are the facets and/or other methods that one can use to limit or refine a search appropriate? Were you satisfied with the export options offered by the system (email, export into Refworks, print, etc.)? 
If you think Web 2.0 technologies are important (tag cloud, etc.), were one or more of these present (and well executed) in this product? The platform appears to be SEVERELY limited in major aspects of end user functionality The platform appears to have some level of useful functionality, but perhaps not as much or as well executed as some competing products. Yes, the platform seems quite rich in terms of end user functionality, and such functions are well executed. I can’t comment on this particular product because I didn’t see the vendor demo, haven’t visited any of the live implementations linked on the discovery wiki page, or otherwise don’t have enough information. Ex Libris Primo Central OCLC WorldCat Local Ebsco Discovery Services Innovative Encore Synergy Serials Solutions Summon 6. From the USER perspective, for a full-text pdf journal article, how EASY is it to retrieve the full-text? Does it take many clicks? Are there confusing choices? It’s very cumbersome trying to retrieve the full text of an item, there are many clicks, and/or it’s simply confusing when going through the steps to retrieve the full text. It’s somewhat straightforward to retrieve a full text item, but perhaps it’s not as easy or as well executed as some of the competing products It’s quite easy to retrieve a full text item using this platform, as good as or better than the competition, and I don’t feel it would be a barrier to a majority of our users. I can’t comment on this particular product because I didn’t see the vendor demo, haven’t visited any of the live implementations linked on the discovery wiki page, or otherwise don’t have enough information. Ex Libris Primo Central INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 63 OCLC WorldCat Local Ebsco Discovery Services Innovative Encore Synergy Serials Solutions Summon 7. How satisfied were you with the platform’s handling of “dead end” or “zero hit” searches? Did the platform offer “did you mean” spelling suggestions? Did the platform offer you the option to request the item via doc delivery / LINK+? Is the vendor’s implementation of such features well executed, or were they difficult, confusing, or otherwise lacking? The platform appears to be severely limited in or otherwise poorly executes how it responds to a dead end or zero hit search. The platform handled dead end or zero hit results, but perhaps not as seamlessly or as well executed as some of the competing products. I was happy with how the platform handled “dead end” searches, and such functionality appears to be well executed, as good as or better than the competition. I can’t comment on this particular product because I didn’t see the vendor demo, haven’t visited any of the live implementations linked on the discovery wiki page, otherwise don’t have enough information. Ex Libris Primo Central OCLC WorldCat Local Ebsco Discovery Services Innovative Encore Synergy Serials Solutions Summon 8. How satisfied were you with the platform’s integration with the OPAC? Were important things such as call numbers, item status information, and enriched content immediately available and easily viewable from within the discovery platform interface, or did it require an extra click or two into the OPAC – and did you find this cumbersome or confusing? 
The platform provides minimal OPAC item information, and a user The platform appeared to integrate ok with the OPAC in I was happy with how the platform integrated with the I can’t comment on this particular product because I didn’t see the INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 64 would have to click through to the OPAC to get the information they might really need; and/or it took multiple clicks or was otherwise cumbersome to get the relevant item level information terms of providing some level of relevant item level information, but perhaps not as much or as well executed as competing products. OPAC. A majority of the OPAC information was available in the discovery platform, and/or their connection to the OPAC was quite elegant. vendor demo, haven’t visited any of the live implementations linked on the discovery wiki page, or otherwise don’t have enough information. Ex Libris Primo Central OCLC WorldCat Local Ebsco Discovery Services Innovative Encore Synergy Serials Solutions Summon 9. Overall, from an END USER FUNCTIONALITY / EASE OF USE standpoint – how a user can refine a search, export results, easily retrieve the fulltext, easily see information from the OPAC record – please rank each platform from best to worst. Worst 2nd Worst Middle 2nd Best Best Ex Libris Primo Central OCLC WorldCat Local Ebsco Discovery Services Innovative Encore Synergy Serials Solutions Summon 10. Regardless of a best to worst ranking, please indicate if the products were, overall, ACCEPTABLE or UNACCEPTABLE to you from the USER FUNCTIONALITY / EASE OF USE standpoint. Unacceptable Acceptable Ex Libris Primo Central OCLC WorldCat Local Ebsco Discovery Services Innovative Encore Synergy Serials Solutions Summon PART III. STAFF CUSTOMIZATION INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 65 11. The “out of the box” design demo’ed at the presentation (or linked to the discovery wiki page – whichever particular implementation you liked best for that product) was . . Seriously lacking and I feel would need major design changes and customization by library Web technical staff. Middle of the road – some things I liked, some things I didn’t. The interface design was better than some competing products, worse than others. Appeared very professional, clean, well organized, and usable; the appearance was better than most/all of the others products. I can’t comment on this particular product because I didn’t see the vendor demo, haven’t visited any of the live implementations linked on the discovery wiki page, or otherwise don’t have enough information. Ex Libris Primo Central OCLC WorldCat Local Ebsco Discovery Services Innovative Encore Synergy Serials Solutions Summon 12. All products offer some level of customization options that allow at least SOME changes to the “out of the box” platform. Based on what the vendors indicated about the level of customization possible with the platform (e.g. look and feel, ability to add library links, ability to embed the search box on a homepage) do you feel there is enough flexibility with this platform for our needs? The platform appears to be severely limited in the degree or types of customization that can occur at the local level. We appear “stuck” with what the vendor gives us – for better or worse. The platform appeared to have some level of customization, but perhaps not as much as some competing products. 
Yes, the platform seems quite rich in terms of customization options under our local control; more so than the majority or all of the other products. I can’t comment on this particular product because I didn’t see the vendor demo, don’t have enough information, and/or would prefer to leave this question to technical staff to weigh in on. Ex Libris Primo Central OCLC WorldCat Local Ebsco Discovery Services Innovative Encore INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 66 Synergy Serials Solutions Summon 13. Overall, from a STAFF CUSTOMIZATION standpoint – the ability to change the interface, embed links, define facet categories, define labels, place the searchbox in a different webpage, etc., please rank each platform from best to worst. Worst 2nd Worst Middle 2nd Best Best Ex Libris Primo Central OCLC WorldCat Local Ebsco Discovery Services Innovative Encore Synergy Serials Solutions Summon 14. Regardless of a best to worst ranking, please indicate if the products were, overall, ACCEPTABLE or UNACCEPTABLE to you from the STAFF CUSTOMIZATION standpoint. Unacceptable Acceptable Ex Libris Primo Central OCLC WorldCat Local Ebsco Discovery Services Innovative Encore Synergy Serials Solutions Summon PART IV. SUMMARY QUESTIONS 15. Overall, from a content coverage, user functionality, AND staff customization standpoint, please rank each product from best to worst. Worst 2nd Worst Middle 2nd Best Best Ex Libris Primo Central OCLC WorldCat Local Ebsco Discovery Services Innovative Encore Synergy Serials Solutions Summon INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 67 16. Regardless of a best to worst ranking, please indicate if the products were, overall, ACCEPTABLE or UNACCEPTABLE to you from the overall standpoint of content coverage, user functionality, AND staff customization standpoint. Unacceptable Acceptable Ex Libris Primo Central OCLC WorldCat Local Ebsco Discovery Services Innovative Encore Synergy Serials Solutions Summon PART V. ADDITIONAL THOUGHTS 17. Please share any additional thoughts you have on Ex Libris Primo Central. (freetext box) 18. Please share any additional thoughts you have on OCLC WorldCat Local. (freetext box) 19. Please share any additional thoughts you have on Ebsco Discovery Services. (freetext box) 20. Please share any additional thoughts you have on Innovative Encore Synergy. (freetext box) 21. Please share any additional thoughts you have on Serials Solutions Summon. (freetext box) INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 68 Appendix E. Discovery Task Force: Early Adopter Reference Questions Author’s note: Appendix E originally appeared in the January 2011 Library Technology Reports: Web Scale Discovery Services as chapter 7, “Questions to Consider.” Part 1 BACKGROUND 1. How long have you had your discovery service available to your end users? (what month and year did it become generally available to your primary user population, and linked to your public library website). 2. After you had selected a discovery service, approximately how long was the implementation period – how long did it take to “bring it up” for your end‐users and make it available (even if in ‘beta’ form) on your library website? 3. What have you named your discovery service, and is it the ‘default’ search service on your website at this point? In other words, regardless of other discovery systems (ILS, Digital Collection Management System, IR, etc.), has the new discovery service become the default or primary search box on your website? 
Part 2 CONTENT: Article Level Content Coverage & Scope “Article Level Content” = articles from academic journals, articles from mainstream journals, newspaper content, conference proceedings, open access content 4. In terms of article level content, do you feel the preindexed, preharvested central index of the discovery platform adequately covers a majority of the titles important to your library’s collection and focus? 5. Have you observed any particular strengths in terms of subject content in any of the three major overarching areas -- humanities, social sciences, sciences? 6. Have you observed any big, or appreciable, gaps in any of the three major overarching areas – humanities, social sciences, sciences? 7. Have you observed that the discovery service leans toward one or a few particular content types (e.g. peer reviewed academic journal content; mainstream journal content; newspaper article content; conference proceedings content; academic open access content)? 8. Are there particular publishers whose content is either not incorporated, (or not adequately incorporated), into the central index, that you’d like to see included (e.g. Elsevier journal content)? 9. Have you received any feedback, positive or negative, from your institution’s faculty, related to the content coverage within the discovery service? 10. Taking all of the above questions into consideration, are you happy, satisfied, or dissatisfied with the scope of subject content, and formats covered, in the discovery platform’s central index? 11. In general, are you happy with the level of article level metadata associated with the returned INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 69 citation level results (that is, before one retrieves the complete full text). In other words, the product may incorporate basic citation level metadata (e.g. title, author, publication info), or it may include additional enrichment content, such as abstracts, author supplied keywords, etc. Overall, how happy do you sense your library staff is with the quality and amount of metadata provided for a “majority” of the article level content indexed in the system? Part 3 CONTENT: Your Local Library Resources 12. It’s presumed that your local library ILS bib records have been harvested into the discovery solution. Do you have any other local “homegrown” collections – hosted by other systems at your library or institution – whose content has been harvested into the discovery solution? Examples would include digital collection content, institutional repository content, library subject guide content, or other specialized, homegrown local database content. If so, please briefly describe the content – focus of collection, type of content (images, articles, etc.), and a ballpark number of items. If no local collections other than ILS bib record content have been harvested, please skip to question 15. 13. [For local collections other than ILS Bib Records]. Did you use existing, vendor provided ingestors to harvest the local record content (i.e. ingestors to transfer the record content, apply any transformations and normalizations to migrate the local content to the underlying discovery platform schema)? Or did you develop your own ingestors from scratch, or using a toolkit or application profile template provided by the vendor? 14. [For local collections other than ILS Bib Records]. Did you need extensive assistance from the discovery platform vendor to help harvest any of your local collections into the discovery index? 
If so, regardless of whether the vendor offered this assistance for free or charged a fee, were you happy with the level of service received from the vendor?

15. Do you feel your local content (including ILS bib records) is adequately "exposed" during a majority of searches? In other words, if your local harvested content equaled a million records, and the overall size of the discovery platform index was a hundred million records, do you feel your local content is "lost" for a majority of end-user searches, or adequately exposed?

Part 4 INTERFACE: General Satisfaction Level

16. Overall, how satisfied are you and your local library colleagues with the discovery service's interface?
17. Do you have any sense of how satisfied faculty at your institution are with the discovery service's interface? Have you received any positive or negative comments from faculty related to the interface?
18. Do you have any sense of how satisfied your (non-faculty) end users are with the discovery service's interface? Have you received any positive or negative comments from users related to the interface?
19. Have you conducted any end-user usability testing related to the discovery service? If so, can you provide the results, or otherwise some general comments on the results of these tests?
20. Related to searching, are you happy with the relevance of results returned by the discovery service? Have you noticed any consistent "goofiness" or surprises with the returned results? If you could make a change in the relevancy arena, what would it be, if anything?

Part 5 INTERFACE: Local Customization

21. Has your library performed what you might consider any "major customization" to the product? Or has it primarily been customizations such as naming the service and defining hyperlinks and the color scheme? If you've done more extensive customization, could you please briefly describe it, and was the product architecture flexible enough to allow you to do what you wanted to do? (Also see question 22 below, which is related.)
22. Is there any particular feature or function that is missing or non-configurable within the discovery service that you wish were available?
23. In general, are you happy with the "openness" or "flexibility" of the system in terms of how customizable it is by your library staff?

Part 6: FINAL THOUGHTS

24. Overall, do you feel your selection of this vendor's product was a good one? Do you sense that your users – students and faculty – have positively received the product?
25. Have you conducted any statistics review or analysis (through the discovery service statistics, link resolver statistics, etc.) that would indicate or at least suggest that the discovery service has improved the discoverability of some of your materials (whether local library materials or remotely hosted publisher content)?
26. If you have some sense of the competition in the vendor discovery marketplace, do you feel this product offers something above and beyond the other competitors in the marketplace? If so, what attracted you to this particular product; what made it stand out?

Appendix F. Discovery Task Force: Initial Vendor Investigation Questions

Section 1: General / Background Questions

1. Customer Install Base. How many current customers have implemented the product at their institution?
(the tool is currently available to users / researchers at that institution) How many additional customers have committed to the product? How many of these customers fall within our library type (e.g. higher ed academic, public, K-12)? 2. References Can you provide website addresses for live implementations which you feel serve as a representative model matching our library type? Can you provide references – the name and contact information for the lead individuals you worked with at several representative customer sites which match our library type? 3. Pricing Model, Optional Products Describe your pricing model for a library type such as ours, including initial upfront costs and ongoing costs related to the subscription and technical support. What optional add-on services or modules (federated search, recommender services, enrichment services) do you market which we should be aware of, related to and able to be integrated with your web scale discovery solution? 4. Technical Support and Troubleshooting Briefly describe options customers have, and hours of availability, for reporting mission critical problems; and for reporting observed non mission-critical glitches. Briefly describe any consulting services you may provide above and beyond support services offered as part of the ongoing subscription. (e.g. consulting services related to harvesting of a unique library resource for which an ingest/transform/normalize routine does not already exist). Is there a process for suggesting enhancement requests for potential future incorporation into the product? 5. Size of the Centralized Index. How many periodical titles does your preharvested, centralized index encompass? How many indexed items? 6. Statistics. Please describe what you feel are some of the more significant use, management or content related statistics available out-of-the-box with your system. INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 72 Are the statistics COUNTER compliant? 7. Ongoing Maintenance Activities, Local Library Staff. For instances where the interface and discovery service is hosted on your end, please describe any ongoing local library maintenance activities associated with maintaining the service for the local library’s clientele (e.g. maintenance of the link resolver database; ongoing maintenance associated with periodic local resource harvest updates; etc.) Section 2: Local Library Resources 8. Metadata Requirements and Existing Ingestors. What mandatory record fields for a local resource has to exist for the content to be indexed and discoverable within your platform (title, date)? Please verify that your platform has existing connectors -- ingest/transform/normalize tools and transfer mechanisms and/or application profiles for the following schema used by local systems at our library (e.g. MARC 21 bibliographic records; Unqualified / Qualified Dublin Core, EAD, etc.) Please describe any standard tools your discovery platform may offer to assist local staff in crosswalking between the local library database schema and the underlying schema within your platform. Our Library uses the ABC digital collection management software. Do you have any existing customers who also utilize this platform, whose digital collections have been harvested and are now exposed in their instance of the discovery product? Our Library uses the ABC institutional repository software. 
Do you have any existing customers who also utilize this platform, whose digital collections have been harvested and are now exposed in their instance of the discovery product?

9. Resource Normalization. Is content from both local and remote sources normalized to a single schema? If so, please offer comments on how local and remote (publisher/aggregator) content is normalized to this single underlying schema. To what degree can collections from different sources have their own unique field information which is displayed and/or figures into the relevancy ranking algorithm for retrieval purposes?

10. Schedule. For records hosted in systems at the local library, how often do you harvest information to account for record updates, modifications, and deletions? Can the local library invoke a manual harvest of locally hosted resource records on a per-resource basis (e.g., from a selected resource)? For example, if the library launches a new digital collection and wants the records to be available in the new discovery platform shortly after they are available in our local digital collection management system, is there a mechanism to force a harvest prior to the next regularly scheduled harvest routine? After harvesting, how long does it typically take for such updates, additions, and deletions to be reflected in the searchable central index?

11. Policies / Procedures. Please describe any general policies and procedures not already addressed which the local library should be aware of as they relate to the harvesting of local resources.

12. Consortial Union Catalogs. Can your service harvest or provide access to items within a consortial or otherwise shared catalog (e.g., the INN-REACH catalog)? Please describe.

Section 3: Publisher and Aggregator Indexed Content

13. Publisher/Aggregator Agreements: General. With approximately how many publishers have you forged content agreements? Are these agreements indefinite, or do they have expiration dates? Have you entered into any exclusive agreements with any publishers/aggregators (i.e., the publisher/aggregator is disallowed from forging agreements with competing discovery platform vendors, or disallowed from providing the same deep level of metadata/full text for indexing purposes)?

14. Comments on Metadata Provided. Could you please provide some general comments on the level of data provided to you, for indexing purposes, by the "majority" of major publishers/aggregators with which you have forged agreements? Please describe to what degree the following elements play a role in your discovery service:
a. "Basic" bibliographic information (article title/journal title/author/publication information)
b. Subject descriptors
c. Keywords (author supplied?)
d. Abstracts (author supplied?)
e. Full text

15. Topical Content Strength. Is there a particular content area that you feel the service covers especially well or leans heavily toward (e.g., humanities, social sciences, sciences)? Is there a particular content type that you feel the service covers very well or leans heavily toward (scholarly journal content, mainstream journal content, newspapers, conference proceedings)? In what subject/content areas, if any, do you feel the service may be somewhat weak? Are there current efforts to mitigate these weaknesses (e.g., future publisher agreements on the horizon)?

16. Major Publisher Content Agreements. Are there major publisher agreements that you feel are especially significant for your service?
If so, which publishers, and why (e.g. other discovery platform vendors may not have such agreements with those particular providers; the amount of content was so great that it greatly augmented the size and scope of your service; etc.) INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 74 17. Content Considered Key by Local Library (by publisher). Following is a list of some major publishers whose content the library licenses which is considered “key.” Has your company forged agreements with these publishers to harvest their materials. If so please describe in general the scope of the agreement. How many titles are covered for each publisher? What level of metadata are they providing to you for indexing purposes (e.g. basic citation level metadata – title, author, publication date; abstracts; full text). A. ex. Elsevier B. ex. Sage C. ex. Taylor and Francis D. ex. Wiley / Blackwell 18. Content Considered Key by Local Library (by title). Following is a list of some major journal / newspaper titles whose content the library licenses which is considered “key.” Could you please indicate if your central index includes these titles, and if so, the level of indexing (e.g. basic citation level metadata – title, author, publication date; abstracts; full text). A. ex. Nature B. ex. American Historical Review C. ex. JAMA D. ex. Wall Street Journal 19. Google Books / Google Scholar. Do any agreements exist at this time to harvest the data associated with the Google Books or Google Scholar projects into your central index? If so, could you please describe the level of indexing (e.g. basic citation level metadata – title, author, publication date; abstracts; full text). 20. Worldcat Catalog. Does your service include the OCLC WorldCat catalog records? If so, what level of information is included? The complete record? Holdings information? 21. E-Book Vendors. Does your service include items from major e-book vendors? 22. Record Information. Given the fact that the same content (e.g. metadata for a unique article) can be provided by multiple sources (e.g. the original publisher of the journal itself, an open access repository, a database / aggregator, another database / aggregator, etc.), please provide some general comments on how records are built within your discovery service. For example: A. You have an agreement with a particular publisher/aggregator and they agree to provide you with rich metadata for their content, perhaps even provide you with indexing they’ve already done for their content, and may even provide you with the full text for you to be able to “deep index” their content. B. You’ve got an agreement with a particular publisher who happens to be the ONLY publisher/provider of that content. They may provide you rich info, or they may provide you rather weak info. In any case, you choose to incorporate this into your service, as they are the only provider/publisher of the info. Or, INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 75 alternately, they may not be the only publisher/provider of the info, but they are the only publisher/provider you’ve currently entered into an agreement with for that content. C. For some items appearing within your service, content for those items is provided by multiple different sources whom you’ve made agreements with. In short, there will be in some/many cases of overlap for unique items, such as a particular article title. 
In such cases, do you create a “merged/composite/super record” -- where your service utilizes particular metadata from each of the multiple sources, creating a “strong” single record built from these multiple resources. 23. Deduping. Related to the question immediately above, please describe your services’ approach (or not) to deduplicating items in your central index. If your service incorporates content for a same unique item from more than one content provider, does your index retrieve and display multiple instances of the same title? Or do you create a merged/composite/super record, and only this single record is displayed? Please describe. Section 4: Open Access Content 24. Open Access Content Sources. Does your service automatically include (out of the box, no additional charge) materials from open access repositories? If so, could you please list some of the major repositories included (e.g. arXiv E-prints; Hindawi Publishing Corporation; the Directory of Open Access Journals; Hathi Trust Materials; etc.). 25. Open Access Content Sources: Future Plans. In addition to the current open access repositories that may be included in your service, are there other repositories whose content you are planning to incorporate in the future? 26. Exposure to other Libraries’ Bibliographic / Digital Collection / IR Content. Are ILS bibliographic records from other customers using your discovery platform exposed for discoverability in the searchable discovery instance of another customer? Are digital collection records? Institutional repository records? Section 5: Relevancy Ranking 27. Relevancy Determination. Please describe some of the factors which comprise the determination of relevancy within your service. What elements play a role, and how heavily are they weighted for purposes of determining relevancy? 28. Currency. Please comment on how heavily currency of an item plays in relevancy determination. Does currency weigh more heavily for certain content types (e.g. newspapers)? 29. Local Library Influence. Does the local library have any influence or level of control over the relevancy algorithm? Can they choose to “bump up” particular items for a search? Please describe. 30. Local Collection Visibility. Could you please offer some comments on how local content (e.g. ILS bibliographic records; digital collections) remains visible and discoverable within the larger pool of content indexed by your service? For example, local content may measures a million items, and your centralized index may cover half a billion items. INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 76 31. Exposure of Items with Minimal Metadata. Some items likely have lesser metadata than other items. Could you please offer some comments on how your system ensures discoverability for items with lesser or minimal metadata. 32. Full Text Searching. Does your service offer the capability for the user to search the fulltext of materials in your service (i.e. are they searching a full text keyword index?) If so, approximately what percentage of items within your service are “deep indexed?” 33. Please describe how your system deals when no hits are retrieved for a search. Does your system enable “best-match” retrieval – that is, something will always be returned or recommended? What elements play into this determination; how is the user prevented from having a completely “dead-end” search? Section 6: Authentication and Rights Management 34. Open / Closed Nature of Your Discovery Solution. 
Does your system offer an unauthenticated view / access? Please describe and offer some comments on what materials will not be discoverable/visible for an unauthenticated user.
A. Licensed full text
B. Records specifically or solely sourced from abstracting and indexing databases
C. Full citation information (e.g., an unauthenticated user may see just a title; an authenticated user would see fuller citation information)
D. Enrichment information (such as book cover images, tables of contents, abstracts, etc.)
E. Other

35. Exposure of Non-licensed Resource Metadata. If one weren't to consider and take into account ANY e-journal/publisher package/database subscriptions and licenses the local library pays for, is there a base index of citation information that's exposed and available to all subscribers of your discovery service? This may include open access materials and/or bibliographic information for some publisher/aggregator content (which often requires a local library license to access the full text). Please describe. Would a user need to be authenticated to search (and retrieve results from) this "base index"? Approximately how large is this "base index" which all customers may search, regardless of local library publisher/aggregator subscriptions?

36. Rights Management. Please discuss how rights management is initialized and maintained in your system, for purposes of determining whether a local library user should have access to the full text (or otherwise "full resolution" if a library doesn't license the full text – such as resolution to a detailed citation/abstract). Our library uses the ABC link resolver. Our library uses the ABC A-Z journal listing service. Our library uses the ABC electronic resource management system. Is your discovery solution compatible with one/all of these systems for rights management purposes? Is one approach preferable to the other, or does your approach explicitly depend on one of these particular services?

Section 7: User Interface

37. Openness to Local Library Customization. Please describe how "open" your system is to local library customization. For example, please comment on the local library's ability to:
A. Rename the service
B. Customize the header and footer hyperlinks / color scheme
C. Choose which facet clusters appear
D. Define new facet clusters
E. Embed the search box in other venues
F. Create canned, pre-customized searches for an instance of the search box
G. Define and promote a collection, database, or item such that it appears at the top or on the first page of any search
I. Develop custom "widgets" offering extra functionality or download "widgets" from an existing user community (e.g., image retrieval widgets such as Flickr integration; library subject guide widgets such as LibGuides integration; etc.)
J. Incorporate links to external enriched content (e.g., Google Book Previews; Amazon.com item information)
K. Other

38. Web 2.0 Social Community Features. Please describe some current web 2.0 social features present in your discovery interface (e.g., user tagging, ratings, reviews, etc.). What, if any, plans do you have to offer or expand such functionality in future releases?

39. User Accounts. Does your system offer user accounts? If so, are these mandatory or optional? What services does this user account provide?
A. Save a list of results to return to at a later time?
B. Save canned queries for later searching?
C.
See a list of recently viewed items? D. Perform typical ILS functions such as viewing checked out items / renewals / holds? E. Create customized RSS feeds for a search 40. Mobile Interface. Please describe the mobile interfaces available for your product. Is it a browser based interface optimized for smallscreen devices? Is it a dedicated iPhone, Android, or Blackberry based executable application? 41. Usability Testing. Briefly describe how your product incorporates published, established “best practices” in terms of a customer focused, usable interface. What usability testing have your performed and/or do you conduct on an ongoing basis? Have any other customers that have gone live with your service completed usability testing that you’re aware of? INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 79 Appendix G: Vendor Websites and Example Implementations OCLC WorldCat Local www.oclc.org/us/en/worldcatlocal/default.htm Example Implementations: Lincoln Trails Library System www.lincolntrail.info/linc.html University of Delaware www.lib.udel.edu University of Washington www.lib.washington.edu Willamette University http://library.willamette.edu Serials Solutions Summon www.serialssolutions.com/summon Example Implementations: Dartmouth College www.dartmouth.edu/~library/home/find/summon Drexel University www.library.drexel.edu University of Calgary http://library.ucalgary.ca Western Michigan University http://wmich.summon.serialssolutions.com Ebsco Discovery Services www.ebscohost.com/discovery Example Implementations: James Madison University www.lib.jmu.edu Mississippi State University http://library.msstate.edu Northeastern University www.lib.neu.edu University of Oklahoma http://libraries.ou.edu INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 80 Innovative Interfaces Encore Synergy encoreforlibraries.com/tag/encore-synergy Example Implementations: University of Nebraska-Lincoln http://encore.unl.edu/iii/encore/home?lang=eng University of San Diego http://sallypro.sandiego.edu/iii/encore/home?lang=eng Scottsdale Public Library http://encore.scottsdaleaz.gov/iii/encore/home?lang=eng Sacramento Public Library http://find.saclibrarycatalog.org/iii/encore/home?lang=eng Ex Libris Primo Central www.exlibrisgroup.com/category/PrimoCentral Example Implementations: (Note: Example implementations are listed in alphabetical order. Some implementations are more open to search by an external audience, based on configuration decisions at the local library level.) Brigham Young University ScholarSearch www.lib.byu.edu (Note: Choose All-in-One Search) Northwestern University http://search.library.northwestern.edu Vanderbilt University DiscoverLibrary http://discoverlibrary.vanderbilt.edu (Note: Choose Books, Media, and More) Yonsei University (Korea) WiSearch: Articles + Library Holdings http://library.yonsei.ac.kr/main/main.do (Note: Choose the Articles + Library Holdings link. The interface is available in both Korean and English; to change to English, select English at the top right of the screen after you have conducted a search and are within the Primo Central interface) INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2012 81 Appendix H. Vendor Visit Questions Content 1. Please speak to how well you feel your product stacks up against the competition in terms of the LICENSED full-text / citation content covered by your product. 
Based on whatever marketplace or other competitive analysis you may have done, do you feel the agreements you’ve made with publishers equal, exceed, or trail the agreements other competitors have made? 2. From the perspective of an academic library serving undergraduate and graduate students as well as faculty, do you feel that there are particular licensed content areas your product covers very well (e.g. humanities, social sciences, sciences). Do you feel there are areas which you need to build up? 3. What’s your philosophy going forward in inking future agreements with publishers to cover more licensed content? Are there particular key publishers your index currently doesn’t include, but whom you are in active negotiations with? 4. We have several local content repositories, such as our digital collections in CONTENTdm, our growing IR repository housed in bePress, and locally developed, web-searchable mySQL databases. Given the fact that most discovery platforms are quite new, do you already have existing customers harvesting their local collections, such as the above, into the discovery platform? Have any particular, common problems surfaced in their attempts to get their local collections searchable and exposed in the discovery platform? 5. Let’s say the library subscribes to an ejournal title – Journal of Animal Studies -- that’s from a publisher with whom you don’t have an agreement for their metadata, and thus, supposedly, don’t index. If a student tried to search for an article in this journal – “Giraffe Behavior During the Drought Season,” what would happen? Is this content still somehow indexed in your tool? Would the discovery platform invoke our link resolver? Please describe. 6. Our focus is your next generation discovery platform, and NOT on your “traditional” federated search product which may be able to cover other resources not yet indexed in your next generation discovery platform. That said, please BRIEFLY describe the role of your federated search product vis a vis the next generation discovery platform. Do you see your federated search product “going away” once more and more content is eventually indexed in your next generation discovery platform? End User Interface & Functionality 7. Are there any particular or unique LOOK and FEEL aspects of your interface that you feel elevate your product above your competitors? If so, please describe. 8. Are there any particular or unique FUNCTIONALITY aspects of your product that you feel elevate it above the competition (e.g. presearch or postsearch refinement categories, export options, etc.) 9. Studies show that end users want very quick access to full text materials such as electronic journal articles and ebooks. What is your product’s philosophy in regards to this? Does your platform, in your opinion, provide seamless, quick access to full text materials, with a minimum of confusion? Please describe. INVESTIGATIONS INTO LIBRARY WEB-SCALE DISCOVERY SERVICES | VAUGHAN 82 Related to this, does your platform de-dupe results, or is the user presented with a list of choices for a single, particular journal article they are trying to retrieve? In addition, please describe a bit how your relevancy ranking works for returned results. What makes an item appear first or on the first page of results? 10. Please describe how “well” your product integrates with the library’s OPAC (in our case, Innovative’s Millennium OPAC). 
What information about OPAC holdings can be viewed directly in the discovery platform without clicking into the catalog and opening a new screen (e.g., call number, availability, enriched content such as table of contents or book covers)? In addition, our OPAC uses "scopes" which allow a user – if they choose – to limit at the outset (prior to a search being conducted) what collection they are searching. In other words, these scopes are location based, not media-type based. For our institution, we have a scope for the main library, one for each of our three branch libraries, and a scope for the entire UNLV collection. Would your system be able to incorporate or integrate these pre-existing scopes in an advanced search mode? And/or could these location-based scopes appear as facets which a user could use to drill down a results list?

11. What is your platform's philosophy in terms of "dead-end searches"? Does such a thing exist with your product? Please describe what happens if a user (a) misspells a word, or (b) searches for a book or journal title / article that our library doesn't own/license, but that we could acquire through interlibrary loan.

Staff "Control" over the End User Interface

12. How "open" is your platform to customization or interface design tweaks desired by the library? Are there any particular aspects that the library can customize with your product that you feel elevate it above your competitors (e.g., defining facet categories; completely redesigning the end-user interface with colors, links, logos; etc.)? What are the major things customizable by the library, and why do you think this is something important that your product offers?

13. How "open" is your platform to porting over to other access points? In other words, provided appropriate technical skills exist, can we easily embed the search box for your product into a different webpage? Could we create a "smaller," more streamlined version of your interface for smartphone access?

Overarching Question

14. In summary, what are some of the chief differentiators of your product from the competition? Why is your product the best and most worthy of serious consideration?

Editor's Comments
Bob Gerrity

Welcome to the first issue of Information Technology and Libraries (ITAL) as an open-access, e-only publication. As announced to LITA members in early January, this change in publishing model will help ensure the long-term viability of ITAL by making it more accessible, more current, more relevant, and more environmentally friendly.
ITAL will continue to feature high-quality articles that have undergone a rigorous peer-review process, but it will also begin expanding content to include more case studies, commentary, and information about topics and trends of interest to the LITA community and beyond. Look for a new scope statement for ITAL shortly. We’re pleased to include in this issue the winning paper from the 2011 LITA/Ex Libris Student Writing Award contest, Abigail McDermott’s overview on copyright law. We also have two lengthier-than-usual studies on library discovery services. The first, Jason Vaughan’s overview of his library’s investigations into web-scale discovery options, was accepted for publication more than a year ago, but due to its length did not make it into “print” until now, since we no longer face the constraints associated with the production of a print journal. The second study, by Jody Condit Fagan and colleagues at James Madison University, focuses on discovery-tool usability. Jimmy Ghaphery and Erin White provide a timely overview of the results of their surveys on the use and management of web-based research guides. Tomasz Neugebauer and Bin Han offer a strategy and workflow for batch importing electronic theses and dissertations (ETDs) into an EPrints repository. With the first open-access, e-only issue launched, our attention will be turned to updating and improving the ITAL website and expanding the back content available. Our goal is to have all of the back issues of both ITAL and its predecessor, Journal of Library Automation (JOLA), openly available from the ITAL site. We’ll also be exploring ways to better integrate the ITALica blog and the ITAL preprints site with the main site. Suggestions and feedback are welcome, at the e-mail address below. Bob Gerrity (robert.gerrity@bc.edu) is Associate University Librarian for Information Technology, Boston College Libraries, Chestnut Hill, Massachusetts. 3124 ---- 162 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 Within that goal are two strategies that lend them- selves to the topics including playing a role with the Office for Information Technology Policy (OITP) with regard to technology related public policy and actively participating in the creation and adoption of international standards within the library community. Colby Riggs (University of California–Irvine) rep- resents LITA on the Office for Information Technology Policy Advisory Committee. She also serves on the LITA Technology Access Committee, which addresses similar issues. The committee is chaired by Elena M. Soltau (Nova Southeastern University). The Standards Interest Group is chaired by Anne Liebst (Linda Hall Library of Science, Engineering, and Technology). Yan Han (University of Arizona) chairs the Standards Task Force, which was charged to explore and recommend strategies and initiatives LITA can implement to become more active in the creation and adoption of new technol- ogy standards that align with the library community. The task force will submit their final report before the 2011 ALA Midwinter Meeting. For ongoing information about LITA committees, interest groups, task forces, and activities being imple- mented on these and related topics, be sure to check out ALA Connect (http://connect.ala.org/) and the LITA website (http://www.lita.org). The LITA electronic dis- cussion list is there to pose questions you might have. 
LITA members have an opportunity to advocate and participate in a leadership role as the broadband initia- tive sets the infrastructure for the next ten to fifteen years. LITA members are encouraged to pursue these opportu- nities to ensure a place at the table for LITA, its members, and libraries. B y now, most LITA members have likely heard about the Broadband Technology Opportunities Program (BTOP) and the National Broadband Plan. The federal government is allocating grants to the states to develop their broadband infrastructure, and libraries are receiving funding to implement and expand computing in their local facilities. By September 30, 2010, the National Telecommunications and Information Administration (NTIA) will have made all BTOP awards. Information about these initiatives can be found at www2.ntia.doc.gov (BTOP), www.broadband.gov (National Broadband Plan), and www.ala.org/ala/aboutala/offices/oitp/index.cfm (ALA Office for Information Technology Policy). On September 21, 2010, a public forum was held in Silicon Valley to discuss E-Rate modernization and innovation in education. The conversation addressed the need to prepare schools and public libraries for broad- band. Information about the forum is archived at blog .broadband.gov. Established in 1996, the E-Rate program has provided funding for K–12 schools and public librar- ies for telecommunications and Internet access. The program was successful in a dial-up world. It is time to now address broadband access which is not ubiquitous on a national basis. While the social norm suggests that technology is everywhere and everyone has the skills to use it, there is still plenty of work left to do to ensure that people can use technology and compete in an increasingly digital and global world. How does LITA participate? The new strategic plan includes an advocacy and policy goal that calls for LITA to advocate for and participate in the adoption of legislation, policies, technologies, and standards that promote equitable access to information and technology. Karen J. starr (kstarr@nevadaculture.org) is liTa President 2010–11 and assistant administrator for library and Develop- ment Services, nevada State library and archives, carson city. Karen J. Starr President’s Message: BTOP, Broadband, E-Rate, and LITA 3125 ---- eDitOriAl | truitt 163 ■■ The Space in Between In my opinion, ITAL has an identity crisis. It seems to try in many ways to be scholarly like JASIST, but LITA simply isn’t as formal a group as ASIST. On the other end of the spectrum, Code4Lib is very dynamic, infor- mal and community-driven. ITAL kind of flops around awkwardly in the space in between. —comment by a respondent to ITAL’s reader survey, December 2009 Last December and January, you, the readers of Information Technology and Libraries were invited to participate in a survey aimed at helping us to learn your likes and dis- likes about ITAL, and where you’d like to see this journal go in terms of several important questions. The responses provide rich food for reflection about ITAL, its readers, what we do well and what we don’t, and our future directions. Indeed, we’re still digesting and discussing them, nearly a year after the survey. I’d like to use some of my editorial space in this issue to introduce, provide an overview, and highlight a few of the most interesting results. 
I strongly encourage you to access the full survey results, which I’ve posted to our weblog ITALica (http:// ital-ica.blogspot.com/); I further invite you to post your own thoughts there about the survey results and their meaning. We ran the survey from mid-December to mid-January. A few responses trickled in as late as mid-February. The survey invitation was sent to the 2,614 LITA personal mem- bers; nonmembers and ITAL subscribers (most of whom are institutions) were excluded. We ultimately received 320 responses—including two from individuals who con- fessed that they were not actually LITA members—for a response rate of 12.24 percent. Thus the findings reported below reflect the views of those who chose to respond to the survey. The response rate, while not optimal, is not far from the 15 percent that I understand LITA usually expects for its surveys. As you may guess, not all respondents answered all questions, which accounts for some small discrepancies in the numbers reported. Who are we? In analyzing the survey responses, one of the first things one notices is the range and diversity of ITAL’s reader base, and by extension, of LITA’s membership. The larg- est groups of subscribers identify themselves either as traditional systems librarians (58, or 18.2 percent) or web services/development librarians (31, or 9.7 percent), with a further cohort of 7.2 percent (23) composed of those working with electronic resources or digital projects. But more than 20 percent (71) come from the ranks of library directors and associate directors. Nearly 15 percent (47) identify their focus as being in the areas of reference, cataloguing, acquisitions, or collection development. See figure 1. The bottom line is that more than a third of our read- ers are coming from areas outside of library IT. A couple of other demographic items: ■■ While nearly six in ten respondents (182, or 57.6 percent) work in academic libraries, that still leaves a sizable number (134, or 42.3 percent) who don’t. More than 14 percent (45) of the total 316 respondents come from the public library sector. ■■ Nearly half (152, or 48.3 percent) of our readers indi- cated that they have been with LITA for five years or fewer. Note that this does not necessarily indicate the age or number of years of service of the respondents, but it’s probably a rough indicator. Still, I confess that this was something of a surprise to me, as I expected larger numbers of long-time members. And how do the numbers shake out for us old geezers? The 6–10 and greater-than-15-years cohorts each composed about 20 percent of those responding; interestingly, only 11.4 percent (36) answered that they’d been LITA members for between 11 and 15 years. Assuming that these numbers are an accurate reflection of LITA’s membership, I can’t help but wonder about the expla- nation for this anomaly.” See figure 2. How are we doing? Question 4 on the survey asked readers to respond to several statements: “it is important to me that articles in ITAL are peer- reviewed.” More than 75 percent (241, or 77.2 percent) answered that they either “agreed” or “strongly agreed.” “ITAL is timely.” More than seven in ten respondents (228, or 73.0 percent) either “agreed” or “strongly agreed” that ITAL is timely. Only 27 (8.7 percent) disagreed. 
As a technology-focused journal, where time-to-publication is always a sensitive issue, I expected more dissatisfaction on this question (and no, that doesn’t mean that I don’t worry about the nine percent who believe we’re too slow out of the gate). Marc Truitt Editorial: The Space in Between, or, Why ITAL Matters Marc truitt (marc.truitt@ualberta.ca) is associate university librarian, Bibliographic and information Technology Services, university of alberta libraries, Edmonton, alberta, canada, and Editor of ITAL. 164 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 would likely quit LITA, with narrative explanations that clearly underscore the belief that ITAL—especially a paper ITAL—is viewed by many as an important benefit of membership. The following comments are typical: ■■ “LITA membership would carry no benefits for me.” ■■ “Dues should decrease, though.” [from a respon- dent who indicated he or she would retain LITA “i use information from ITAL in my work and/ or i find it intellectually stimulating.” By a nearly identical margin to that regarding timeliness, ITAL readers (226, or 72.7 percent) either “agreed” or “strongly agreed” that they use ITAL in their work or find its contents stimulating. “ITAL is an important benefit of litA mem- bership.” An overwhelming majority (248, or 79.78 percent) of respondents either “agreed” or “strongly agreed” with this statement.1 This perception clearly emerges again in responses to the questions about whether readers would drop their LITA membership if we produced an electronic-only or open-access ITAL (see below). Where should we be going? Several questions sought your input about different options for ITAL as we move for- ward. Question 7, for example, asked you to rank how frequently you access ITAL content via several channels, with the choices being “print copy received via membership,” “print copy received by your institution/library,” “electronic copy from the ITAL website,” or “electronic copy accessed via an aggrega- tor service to which your institution/library subscribes (e.g., Ebsco).” The choice most fre- quently accessed was the print copy received via membership, at 81.1 percent (228). Question 8 asked about your preferences in terms of ITAL’s publication model. Of the 307 responses, 60.6 percent (186) indicated a preference for continuance of the present arrangement, whereby we publish both paper and electronic versions simultaneously. Four in ten respondents preferred that ITAL move to publication in electronic version only.2 Of those who favored continued availability of paper, the great majority (159, or 83.2 per- cent) indicated in question 9 that they simply preferred reading ITAL in paper. Those who advocate moving to electronic-only do so for more mixed reasons (question 10), the most popular being cost-effectiveness, timeliness, and the environmen- tal friendliness of electronic publication. A final question in this section asked that you respond to the statement “If ITAL were to become an electronic-only publication I would continue as a dues-paying member of LITA.” While a reassuring 89.8 percent (273) of you answered in the affirmative, 9.5 percent (29) indicated that you Figure 2. Years of LITA Membership Figure 1. Professional Position of LITA Members 18.2% (58) 0.3% (1) 0.6% (2) 0.6% (2) 0.9% (3) 2.2% (7) 2.5% (8) 3.1% (10) 4.1% (13) 4.4% (14) 6.3% (20) 7.9% (25) 9.4% (30) 9.7% (31) 12.9 % (41) 16.7% (53) 0% 5% 10% 15% 20% Systems Librarian (includes responsibility for ILS, servers, workstat... 
Other (please specify) Library Director Web Services/Development Librarian Deputy/Associate/Assistant Director Reference Services Librarian Cataloging Librarian Consortium/Network/Vendor Librarian Electronic Resources Librarian Digital Projects/Digitization Librarian Student Teaching Faculty Computing Professional (non-MLS) Resource Sharing Librarian Acquisitions/Collection Development Librarian Other Library Staff (non-MLS) 11.4% (36) 19.7% (62) 20.0% (63) 48.3% (152) 0% 10% 20% 30% 40% 5 years or less 11–15 years 6–10 years more than 15 years eDitOriAl | truitt 165 his lipstick-on-a-pig ILS. Somewhere else there’s a library blogger who fends off bouts of insomnia by reading “wonky” ITAL papers in the wee hours of the morning. And that ain’t the half of it, as they say. In short—in terms of readers, interests, and prefer- ences—“the space in between” is a pretty big niche for ITAL to serve. We celebrate it. And we’ll keep trying our best to serve it well. ■■ Departures As I write these lines in late-September, it’s been a sad few weeks for those of us in the ITAL family. In mid-August, former ITAL editor Jim Kopp passed away following a battle with cancer. Last week, Dan Marmion—Jim’s suc- cessor as editor (1999–2004)—and a dear friend to many of us on the current ITAL editorial board—also left us, the victim of a malignant brain tumor. I never met Jim, but LITA President Karen Starr eulogized him in a posting to LITA-L on August 16, 2010.3 I noted Dan’s retirement due to illness in this space in March.4 I first met Dan in the spring of 2000, when he arrived at Notre Dame as the new associate director for Information Systems and Digital Access (I think the position was dif- ferently titled then) and, incidentally, my new boss. Dan arrived only six weeks after my own start there. Things at Notre Dame were unsettled at the time: the Libraries had only the year before successfully implemented ExLibris’ Aleph500 ILS, the first North American site to do so. While ExLibris moved on to implementations at McGill and the University of Iowa, we at Notre Dame struggled with the challenges of supporting and upgrading a system then new to the North American market. It was not always easy or smooth, but throughout, Dan always maintained an unflappable and collegial manner with ExLibris staff and a quiet but supportive demeanor toward those of us who worked for him. I wish I could say that I understood and appreciated this better at the time, but I can’t. I still had some growing ahead of me—I’m sure that I still do. Dan was there for me again as an enthusiastic refer- ence when I moved on, first to the University of Houston in 2003 and then to the University of Alberta three years later. In these jobs I’d like to think I’ve come to under- stand a bit better the complex challenges faced by senior managers in large research libraries; in the process, I know I’ve come to appreciate Dan’s quiet, knowledge- able, and hands-off style with department managers. It is one I’ve tried (not always successfully) to cultivate. 
While I was still at Notre Dame, Dan invited me to join the editorial board of Information Technology and Libraries, a group which over the years has come to include many “Friends of Dan,” including Judith Carter (quite possibly the world’s finest managing editor), Andy Boze (ITAL’s membership] ■■ “ITAL is the major benefit to me as we don’t have funds for me to attend LITA meetings or training sessions.” ■■ “The paper journal is really the only membership benefit I use regularly.” ■■ “Actually my answer is more, ‘I don’t know.’ I really question the value of my LITA membership. ITAL is at least some tangible benefit I receive. Quite hon- estly, I don’t know that there really are other benefits of LITA membership.” Question 12 asked about whether ITAL should con- tinue with its current delayed open-access model (i.e., the latest two issues embargoed for non-LITA members), or go completely open-access. By a three-to-two margin, readers favored moving to an open-access model for all issues. In the following question that asked whether respondents would continue or terminate LITA mem- bership were ITAL to move to a completely open-access publication model, the results were remarkably similar to those for the question linking print availability to LITA membership, with the narrative comments again suggest- ing much the same underlying reasoning. In sum, the results suggest to me more satisfaction with ITAL than I might have anticipated; at the same time, I’ve only scratched the surface in my comments here. The narrative answers in particular—which I have touched on in only the most cursory fashion—have many things to say about ITAL’s “place,” suggestions for future articles, and a host of other worthy ideas. There is as well the whole area of crosstabbing: some of the questions, when analyzed with reference to the demographic answers in the beginning of the survey, may highlight entirely new aspects of the data. Who, for instance, favors continuance of a paper ITAL, and who prefers electronic-only? But to come back to that reader’s comment about ITAL and “the space in between” that I used to frame this discussion (indeed, this entire column): to me, the demographic responses—which clearly show ITAL has a substantial readership outside of library IT—suggest that that “space in between” is precisely where ITAL should be. We may or may not occupy that space “awkwardly,” and there is always room for improvement, although I hope we do better than “flop around”! The results make clear that ITAL’s readers—who would be you!—encompass the spectrum from the tech-savvy early-career reader of Code4Lib Journal (electronic-only, of course!) to the library administrator who satisfies her need for technol- ogy information by taking her paper copy of ITAL along when traveling. Elsewhere on that continuum, there are reference librarians and catalogers wondering what’s new in library technology, and a traditional systems librarian pondering whether there is an open-source discovery solution out there that might breathe some new life into 166 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 between membership and receiving the journal. Many of them appear to infer that a portion of their LITA dues, then, are ear- marked for the publication and mailing of ITAL. Sadly, this is not the case. In years past, ITAL’s income from advertising paid the bills and even generated additional revenue for LITA coffers. 
Today, the shoe is on the other foot because of declining advertis- ing revenue, but ITAL is still expected to pay its own way, which it has failed to do in recent years. But to those who reasonably believe that some portion of their dues is dedicated to the sup- port of ITAL, well, t’ain’t so. Bothered by this? Complain to the LITA board. 2. As a point of comparison, consider the following results from the 2000 ITAL reader survey. Respondents were asked to rank several publishing options on a scale of 1 to 3 (with 1 = most preferred option and 3 = least preferred option): ITAL should be published simultaneously as a print-on- paper journal and an electronic journal (N = 284): 1 = 169 (59.5%); 2 = 93 (32.7%); 3 = 22 (7.7%) ITAL should be published in an electronic form only (N = 293): 1 = 55 (18.8%); 2 = 61 (20.8%); 3 = 177 (60.4%) In other words, then as now, about 60% of readers preferred paper and electronic to electronic-only. 3. Karen Starr, “FW: [Libs-Or] Jim Kopp: Celebration of Life,” online posting, Aug. 16, 2010, LITA-L, http://lists.ala. org/sympa/arc/lita-l/2010-08/msg00079.html (accessed Sept. 29, 2010). 4. Marc Truitt, “Dan Marmion,” Information Technology & Libraries 29 (Mar. 2010): 4, http://www.ala.org/ala/mgrps/ divs/lita/ital/292010/2901mar/editorial_pdf.cfm (accessed Sept. 29, 2010). webmaster), and Mark Dehmlow. While Dan left ITAL in 2004, I think that he left the journal a wonderful and last- ing legacy in these extremely capable and dedicated folks. My fondest memories of Dan concern our shared pas- sion for model trains. I remember visiting a train show in South Bend with him a couple of times, and our last time together (at the ALA Midwinter Meeting in Denver two years ago) was capped by a snowy trek with ExLibris’ Carl Grant, another model train enthusiast, to the Mecca of model railroading, Caboose Hobbies. Three boys off to see their toys—oh, exquisite bliss! I don’t know whether ITAL or its predecessor JOLA have ever reprinted an editorial, but while searching the archives to find something that would honor both Jim and Dan, I found a piece that I hope speaks eloquently of their contributions and to ITAL’s reason for being. Dan’s edito- rial, “Why Is ITAL Important?” originally published in our June 2002 issue, appears again immediately following this column. I think its message and the views expressed therein by Jim and Dan remain as valid today as they were in 2002. They also may help to frame my comments concerning our reader survey in the previous section. Farewell, Jim and Dan. You will both be sorely missed. Notes and References 1. A number of narrative answers to the survey make it clear that ITAL readers who are LITA members perceive a link 3126 ---- eDitOriAl | MArMiON 167 Dan MarmionEditorial: Why Is ITAL Important? Editor’s Note: What follows is a reprint of Dan Marmion’s editorial from ITAL 20, no. 2 (2001), http://www.ala.org/ ala/mgrps/divs/lita/ital/2002editorial.cfm. After reading, we ask you to consider: Why does ITAL matter to you? Post your thoughts on ITALica (http://ital-ica.blogspot .com/). S ome time ago I received an e-mail from a library school student, who asked me “Why is [ITAL] important in the library profession?” I answered the question in this way: ITAL is important to the library profession for at least four reasons. 
First, while it is no longer the only publication that addresses the use of technology in the library profession, it is the oldest (dating back to 1968, when it was founded as the Journal of Library Automation) and, we like to think, most distinguished. Second, not only do we publish on a myriad of topics that are pertinent to technology in libraries, we publish at least three kinds of articles on those subjects: pure scholarly articles that give the results of empirical research done on topics of importance to the profes- sion, communications from practitioners in the field that present real-world experiences from which other librarians can profit, and tutorials on specific subjects that teach our readers how to do useful things that will help them in their everyday jobs. The book and software reviews that are in most issues are added bonuses. Third, it is the “official” publication of LITA, the only professional organization devoted to the use of information technology in the library profession. Fourth, it is a scholarly, peer-reviewed journal, and as such is an important avenue for many academic librar- ians whose career advancement depends in part on their ability to publish in this type of journal. In a sen- tence, then, ITAL is important to the library profession because it contributes to the growth of the profession and its professionals. After sending my response, I thought it would be interesting to see what some other people with close asso- ciations to the journal would add. Thus I posed the same question to the editorial board and to the person who preceded me as editor. Here are some of their comments: One of the many things that was not traditionally taught in library school was a systematic approach to problem solving—for somebody who needs to acquire this skill and doesn’t have a mentor handy, ITAL is a wonderful resource. Over and over again, ITAL describes how a problem was identified and defined, explains the techniques used to investigate it, and details the conclusions that might fairly be drawn from the results of the investigation. Few other journals so effectively model this approach. Regardless of the specific subject of the article, the opportunity to see practical problem solving techniques demonstrated is always valuable. (Joan Frye Williams) The one thing I would add to your points, and it ties into a couple of them, is that by some definitions a “profession” is one that does have a major publica- tion. As such, it is not only the “official” publication of LITA but an identity focus for those professionals in this particular area of librarianship. In fact, ideally, I would like to think that’s more of a reason why ITAL is important than just the fact that it’s a perk of LITA membership. (Jim Kopp) Real world experiences from which other librarians would profit—to use your own words. That is my primary reason for reading it, although I take note of tutorials as well. And the occasional book review here may catch my eye as it is likely more detailed that what might appear in LJ or Booklist, and [I would] be more likely to purchase it for either my office or for the gen- eral collection. (Donna Cranmer) ITAL begins as the oldest and best-established journal for refereed scholarly work in library automation and information technology, a role that by itself is impor- tant to libraries and the library profession. ITAL goes beyond that role to add high-quality work that does not fit in the refereed-paper mold, helping librarians to work more effectively. 
As the official publication of America's largest professional association for library and information technology, ITAL assures a broad audience for important work—and, thanks to its cost-recovery subscription pricing, ITAL makes that work available to nonmembers at prices far below the norm for scholarly publishing. (Walt Crawford)

The journal serves as an historical record/documentation and joins its place with many other items that together record the history of mankind. A professional/scholarly journal has a presumed life that lasts indefinitely. (Ken Bierman)

ITAL is a formal, traditional, and standardized way of sharing ideas within a specific segment of the library community. Librarianship is an institutional profession. As an institution it is an organic organization requiring communication between its members. An advantage of written communication, especially paper-based written communication, is its ability to transcend space and time. A written document can communicate an idea long after the author has died and half way around the world. Yes, electronic communication can do the same thing, but electronic communication is much more fragile than ideas committed to paper. ITAL provides one means of fostering this communication in a format that is easily usable and recognizable. It is not the only communications format, but it fills a particular niche. (Eric Lease Morgan)

In a sentence, ITAL is important to the profession because "Communication is the key to our success." So there you have the thoughts of the editor and a few other folks as to why this journal is important.

* * *

Why does ITAL matter to you? Post your thoughts on ITALica (http://ital-ica.blogspot.com/).

Dan Marmion was editor of ITAL, 1999–2004. This editorial was first published in the June 2002 issue of ITAL.

Sharon Farnel
Editorial Board Thoughts: System Requirements

Sharon Farnel (sharon.farnel@ualberta.ca) is Metadata & Cataloguing Librarian at the University of Alberta in Edmonton, Alberta, Canada.

This past Spring, my alma mater, the School of Library and Information Studies (SLIS) at the University of Alberta, restructured the IT component of its MLIS program. As a result, as of September 2010, incoming students are expected to possess certain basic IT skills before beginning their program.1 These skills include the following:

■■ Comprehension of the components and operations of a personal computer
■■ Microsoft Windows file management
■■ Proficiency with Microsoft Office (or similar) products, including word processing and presentation software
■■ Use of e-mail
■■ Basic Web browsing and searching

This new requirement got me thinking: Is this common practice among ALA-accredited Library Schools? If other schools are also requiring basic IT skills prior to entry, how do those required by SLIS compare? So I thought I'd do a little investigating to see what others in "Library School land" are doing. Before I continue, a word of warning: this was by no means a rigorous scientific investigation, but rather an informal survey of the landscape.

I started my investigation with ALA's directory of institutions offering accredited master's programs.2 There are fifty-seven institutions listed in the directory. I visited each institution's website and looked for pages describing technology requirements, computer-competency requirements, and the like. If I wasn't able to find the desired information after fifteen or twenty minutes, I would note "nothing found" and move on to the next.
In the end I found some sort of list of technology or computer-competency requirements on thirty-three (approximately 58 percent) of the websites. It may be the case that such a list exists on other sites and I didn't find it. I should also note that five of the lists I found focus more on software and hardware than on skills in using said software and hardware. Even considering these conditions, however, I was somewhat surprised at the low numbers. Is it simply assumed that today's students already have these skills? Or is it expected that they will be picked up along the way? I don't claim to know the answers, and discovering them would require a much more detailed and thorough investigation, but they are interesting questions nonetheless.

Once I had found the requirements, I examined them in some detail to get a sense of the kinds of skills listed. While I won't enumerate them all, I did find the most common ones to be similar to those required by SLIS—basic comfort with a personal computer and proficiency with word processing and presentation software, e-mail, file management, and the Internet. A few (5) schools also list comfort with local systems (e-mail accounts, online courseware, etc.). Several (7) schools mention familiarity with basic database design and functionality, while a few (5) list basic Web design. Very few (3) mention competency with security tools (firewalls, virus checkers, etc.), and just slightly more (4) mention familiarity with Web 2.0 tools like blogs, wikis, etc. While many (14) specifically mention searching under basic Internet skills, few (7) mention proficiency with OPACs or other common information tools such as full-text databases. Interestingly, one school has a computer programming requirement, with mentions of specific acceptable languages, including C++, PASCAL, Java, and Perl. But this is certainly the exception rather than the rule.

I was encouraged that there seems to be a certain agreement on the basics. But I was a little surprised at the relative rarity of competency with wikis and blogs and all those Web 2.0 tools that are so often used and talked about in today's libraries. Is this because there is still some uncertainty as to the utility of such tools in libraries? Or is it because of a belief that the members of the Millennial or "digital" generation are already expert in using them? I don't know the reasons, but it is interesting to ponder nonetheless. I was also surprised that a level of information literacy isn't listed more often, particularly given that we're talking about SLIS programs. I do know, of course, that many of these skills will be developed or enhanced as students work their way through their programs, but it also seems to me that there is so much other material to learn that the more that can be taken care of beforehand, the better.

Librarians work in a highly technical and technological environment, and this is only going to become even more the case for future generations of librarians. Certainly, basic familiarity with a variety of applications and tools and comfort with rapidly changing technologies are major assets for librarians. In fact, ALA recognizes the importance of "technological knowledge and skills" as core competencies of librarianship. Specifically mentioned are the following:

■■ Information, communication, assistive, and related technologies as they affect the resources, service delivery, and uses of libraries and other information agencies.
■■ The application of information, communication, assistive, and related technology and tools consistent with professional ethics and prevailing service norms and applications.
■■ The methods of assessing and evaluating the specifications, efficacy, and cost efficiency of technology-based products and services.
■■ The principles and techniques necessary to identify and analyze emerging technologies and innovations in order to recognize and implement relevant technological improvements.3

Given what we know about the importance of technology to librarians and librarianship, my investigation has left me with two questions: (1) why aren't more library schools requiring certain IT skills prior to entry into their programs? and (2) are those who do require them asking enough of their prospective students? I hope you, our readers, might ask yourselves these questions and join us on ITALica for what could turn out to be a lively discussion.

References

1. University of Alberta School of Library and Information Studies, "Degree Requirements: Master of Library & Information Studies," www.slis.ualberta.ca/mlis_degree_requirements.cfm (accessed Aug. 5, 2010).
2. American Library Association Office for Accreditation, "Library & Information Studies Directory of Institutions Offering Accredited Master's Programs 2008–2009," 2008, http://ala.org/ala/educationcareers/education/accreditedprograms/directory/pdf/lis_dir_20082009.pdf (accessed Aug. 5, 2010).
3. American Library Association, "ALA's Core Competences of Librarianship," January 2009, www.ala.org/ala/educationcareers/careers/corecomp/corecompetences/finalcorecompstat09.pdf (accessed Aug. 5, 2010).

GENERATING COLLABORATIVE SYSTEMS FOR DIGITAL LIBRARIES | MALIZIA, BOTTONI, AND LEVIALDI 171

from previous experience and from research in software engineering. Wasted effort and poor interoperability can therefore ensue, raising the costs of DLs and jeopardizing the fluidity of information assets in the future. In addition, there is a need for modeling services and data structures as highlighted in the "Digital Library Reference Model" proposed by the DELOS EU network of excellence (also called the "DELOS Manifesto");2 in fact, the distribution of DL services over digital networks, typically accessed through Web browsers or dedicated clients, makes the whole theme of interaction between users important, for both individual usage and remote collaboration. Designing and modeling such interactions call for considerations pertaining to the fields of human–computer interaction (HCI) and computer-supported cooperative work (CSCW). As an example, scenario-based or activity-based approaches developed in the HCI area can be exploited in DL design.

To meet these needs we developed CRADLE (Cooperative-Relational Approach to Digital Library Environments),3 a metamodel-based Digital Library Management System (DLMS) supporting collaboration in the design, development, and use of DLs, exploiting patterns emerging from previous projects. The entities of the CRADLE metamodel allow the specification of collections, structures, services, and communities of users (called "societies" in CRADLE) and partially reflect the DELOS Manifesto.
The metamodel entities are based on existing DL taxonomies, such as those proposed by Fox and Marchionini,4 Gonçalves et al.,5 or in the DELOS Manifesto, so as to leverage available tools and knowl- edge. Designers of DLs can exploit the domain-specific visual language (DVSL) available in the CRADLE envi- ronment—where familiar entities extracted from the referred taxonomies are represented graphically—to model data structures, interfaces and services offered to the final users. The visual model is then processed and transformed, exploiting suitable templates, toward a set of specific languages for describing interfaces and services. The results are finally transformed into platform- independent (Java) code for specific DL applications. CRADLE supports the basic functionalities of a DL through interfaces and service templates for managing, browsing, searching, and updating. These can be further specialized to deploy advanced functionalities as defined by designers through the entities of the proposed visual The design and development of a digital library involves different stakeholders, such as: information architects, librarians, and domain experts, who need to agree on a common language to describe, discuss, and negoti- ate the services the library has to offer. To this end, high-level, language-neutral models have to be devised. Metamodeling techniques favor the definition of domain- specific visual languages through which stakeholders can share their views and directly manipulate representations of the domain entities. This paper describes CRADLE (Cooperative-Relational Approach to Digital Library Environments), a metamodel-based framework and visual language for the definition of notions and services related to the development of digital libraries. A collection of tools allows the automatic generation of several services, defined with the CRADLE visual language, and of the graphical user interfaces providing access to them for the final user. The effectiveness of the approach is illustrated by presenting digital libraries generated with CRADLE, while the CRADLE environment has been evaluated by using the cognitive dimensions framework. D igital libraries (DLs) are rapidly becoming a pre- ferred source for information and documentation. Both at research and industry levels, DLs are the most referenced sources, as testified by the popularity of Google Books, Google Video, IEEE Explore, and the ACM Portal. Nevertheless, no general model is uni- formly accepted for such systems. Only few examples of modeling languages for developing DLs are available,1 and there is a general lack of systems for designing and developing DLs. This is even more unfortunate because different stakeholders are interested in the design and development of a DL, such as information architects, to librarians, to software engineers, to experts of the spe- cific domain served by the DL. These categories may have contrasting objectives and views when deploying a DL: librarians are able to deal with faceted categories of documents, taxonomies, and document classification; software engineers usually concentrate on services and code development; information architects favor effective- ness of retrieval; and domain experts are interested in directly referring to the content of interest without going through technical jargon. 
Designers of DLs are most often library technical staff with little to no formal training in software engineering, or computer scientists with little background in the research findings of hypertext infor- mation retrieval. Thus DL systems are usually built from scratch using specialized architectures that do not benefit Alessio Malizia (alessio.malizia@uc3m.es) is associate Profes- sor, universidad carlos iii, Department of informatics, Madrid, Spain; Paolo Bottoni (bottoni@di.uniroma1.it) is associate Pro- fessor and s. levialdi (levialdi@di.uniroma1.it) is Professor, “Sa- pienza” university of rome, Department of computer Science, rome, italy. Alessio Malizia, Paolo Bottoni, and S. Levialdi Generating Collaborative Systems for Digital Libraries: a Model-Driven Approach 172 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 a formal foundation for digital libraries, called 5S, based on the concepts of streams, (data) structures, (resource) spaces, scenarios, and societies. While being evidence of a good modeling endeavor, the approach does not specify formally how to derive a system implementation from the model. The new generation of DL systems will be highly dis- tributed, providing adaptive and interoperable behaviour by adjusting their structure dynamically, in order to act in dynamic environments (e.g., interfacing with the physical world).13 To manage such large and complex systems, a systematic engineering approach is required, typically one that includes modeling as an essential design activity where the availability of such domain-specific concepts as first-class elements in DL models will make application specification easier.14 While most of the disciplines related to DLs—e.g., databases,15 information retrieval,16 and hypertext and multimedia17—have underlying formal models that have properly steered them, little is available to formalize DLs per se. Wang described the structure of a DL system as a domain-specific database together with a user interface for querying the records stored in the database.18 Castelli et al. present an approach involving multidimensional query languages for searching information in DL systems that is based on first-order logic.19 These works model metadata specifications and thus are the main examples of system formalization in DL environments. Cognitive models for information retrieval, as used for example by Oddy et al.,20 focus on users’ information-seeking behav- ior (i.e., formation, nature, and properties of a users’ information need) and on how information retrieval sys- tems are used in operational environments. Other approaches based on models and languages for describing the entities involved in a DL are the Digital Library Definition Language,21 the DSpace data model22 (with the definitions of communities and workflow mod- els), the Metis Workflow framework,23 and the Fedora structoid approach.24 E/R approaches are frequently used for modeling database management system (DBMS) applications,25 but as E/R diagrams only model the static structure of a DBMS, they generally do not deal deeply with dynamic aspects. Temporal extensions add dynamic aspects to the E/R approach, but most of them are not object-oriented.26 The advent of object-oriented technol- ogy calls for approaches and tools to information system design resulting in object-oriented systems. 
These consid- erations drove research toward modeling approaches as supported by UML.27 However, since the UML metamodel is not yet wide- spread in the DL community, we adopted the E/R formalism and complemented it with the specification of the dynamics made available through the user interface, as described by Malizia et al.28 Using the metamodel, we have defined a DSVL, including basic entities and language. CRADLE is based on the entity-relationship (E/R) formalism, which is powerful and general enough to describe DL models and is supported by many tools as a metamodeling language. Moreover, we observed that users and designers involved in the DL environment, but not coming from a software engineering background, may not be familiar with advanced formalism like unified modeling language (UML), but they usually have basic notions on database management systems, where E/R is largely employed. ■■ Literature Review DLs are complex information systems involving technolo- gies and features from different areas, such as library and information systems, information retrieval, and HCI. This interdisciplinary nature is well reflected in the various definitions of DLs present in the literature. As far back as 1965, Licklider envisaged collections of digital versions of scanned documents accessible via interconnected com- puters.6 More recently, Levy and Marshall described DLs as sets of collections of documents, together with digital resources, accessible by users in a distributed context.7 To manage the amount of information stored in such systems, they proposed some sort of user-assisting software agent. Other definitions include not only printed documents, but multimedia resources in general.8 However differ- ent the definitions may be, they all include the presence of collections of resources, their organization in struc- tured repositories, and their availability to remote users through networks (as discussed by Morgan).9 Recent efforts toward standardization have been taken by public and private organizations. For example, a Delphi study identified four main ingredients: an organized collection of resources, mechanisms for browsing and searching, a distributed networked environment, and a set of objec- tified services.10 The President’s Information Technology Advisory Committee (PITAC) Panel on Digital Libraries sees DLs as the networked collections of digital text, doc- uments, images, sounds, scientific data, and software that make up the core of today’s Internet and of tomorrow’s universally accessible digital repositories of all human knowledge.11 When considering DLs in the context of distributed DL environments, only few papers have been produced, contrasting with the huge bibliography on DLs in gen- eral. The DL Group at the Universidad de las Américas Puebla in Mexico introduced the concept of personal and group spaces, relevant to the CSCW domain, in the DL system context.12 Users can share information stored in their personal spaces or share agents, thus allowing other users to perform the same search on the document collec- tions in the DL. The cited text by Gonçalves et al. gives GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | MAliziA, BOttONi, AND leviAlDi 173 education as discussed by Wattenberg or Zia.33 In the NSDL program, a new generation of services has been developed that includes support for teaching and learn- ing; this means also considering users’ activities or scenarios and not only information access. 
Services for implementing personal content delivery and sharing, or managing digital resources and modeling collaboration, are examples of tools introduced during this program. The virtual reference desk (VRD) is emerging as an interactive service based on DLs. With VRD, users can take advantage of domain experts’ knowledge and librar- ians’ experience to locate information. For example, the U.S. Library of Congress Ask a Librarian service acts as a VRD for users who want help in searching information categories or to interact with expert librarians to search for a specific topic.34 The interactive and collaborative aspects of activities taking place within DLs facilitate the development of user communities. Social networking, work practices, and content sharing are all features that influence the technol- ogy and its use. Following Borgmann,35 Lynch sees the future of DLs not in broad services but in supporting and facilitating “customization by community,” i.e., services tailored for domain-specific work practices.36 We also examined the research agenda on system- oriented issues in DLs and the DELOS manifesto.37 The agenda abstracts the DL life cycle, identifying five main areas, and proposes key research problems. In particular we tackle activities such as formal modeling of DLs and their communities and developing frameworks coherent with such models. At the architectural level, one point of interest is to support heterogeneous and distributed systems, in par- ticular networked DLs and services.38 For interoperability, one of the issues is how to support and interoperate with different metadata models and standards to allow distrib- uted cataloguing and indexing, as in the Open Archive Initiative (OAI).39 Finally, we are interested in the service level of the research agenda and more precisely in Web services and workflow management as crucial features when including communities and designing DLs for use over networks and for sharing content. As a result of this analysis, the CRADLE framework features the following: ■■ a visual language to help users and designers when visual modeling their specific DL (without knowing any technical detail apart from learning how to use a visual environment providing diagrams representa- tions of domain specific elements) ■■ an environment integrating visual modeling and code generation instead of simply providing an integrated architecture that does not hide technical details ■■ interface generation for dealing with different users relationships for modeling DL-related scenarios and activities. The need for the integration of multiple lan- guages has also been indicated as a key aspect of the DSVL approach.29 In fact, complex domains like DLs typi- cally consist of multiple subdomains, each of which may require its own particular language. In the current implementation, the definition of DSVLs exploits the metamodeling facilities of AToM3, based on graph-grammars.30 AToM3 has been typically used for simulation and model transformation, but we adopt it here as a tool for system generation. ■■ Requirements for Modeling Digital Libraries We follow the DELOS Manifesto by considering a DL as an organization (possibly virtual and distributed) for managing collections of digital documents (digital con- tents in general) and preserving their images on storage. A DL offers contextual services to communities of users, a certain quality of service, and the ability to apply specific policies. 
In CRADLE we leave the definition of quality of service to the service-oriented architecture standards we employ and partially model the applicable policy, but we focus here on crucial interactivity aspects needed to make DLs usable by different communities of users. In particular, we model interactive activities and services based on librarians’ experiences in face-to-face communication with users, or designing exchange and integration procedures for communicating between insti- tutions and managing shared resources. While librarians are usually interested in modeling metadata across DLs, software engineers aim at provid- ing multiple tools for implementing services,31 such as indexing, querying, semantics,32 etc. Therefore we pro- vide a visual model useful for librarians and information architects to mimic the design phases they usually per- form. Moreover, by supporting component services, we help software engineers to specify and add services on demand to DL environments. To this end, we use a service component model. By sharing a common language, users from different categories can communicate to design a DL system while concentrating on their own tasks (services development and design for software engineers and DL design for librarians and information architects). Users are modeled according to the Delos Manifesto as DL End-users (subdivided into content creators, content consumers, and librarians), DL Designers (librarians and information archi- tects), DL System Administrators (typically librarians), and DL Application Developers (software engineers). Several activities have been started on modeling domain specific DLs. As an example, the U.S. National Science Digital Library (NSDL) program promotes edu- cational DLs and services for basic and advanced science 174 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 ■■ how that information is structured and organized (Structural Model) ■■ the behavior of the DL (Service Model) and the differ- ent societies of actors ■■ groups of services acting together to carry out the DL behavior (Societal Model) Figure 1 depicts the design approach supported by CRADLE architecture, namely, modeling the society of actors and services interacting in the domain-specific scenarios and describing the documents and metadata structure included with the library by defining a visual model for all these entities. The DL is built using a col- lection of stock parts and configurable components that provide the infrastructure for the new DL. This infrastruc- ture includes the classes of objects and relationships that make up the DL, and processing tools to create and load the actual library collection from raw documents, as well as services for searching, browsing, and collection main- tenance. Finally, the code generation module generates tailored DL services code stubs by composing and special- izing components from the component pool. Initially, a DL designer is responsible for formalizing (starting from an analysis of the DL requirements and characteristics) a conceptual description of the DL using metamodel concepts. Model specifications are then fed into a DL generator (written in Python for AToM3), to produce a DL tailored suitable for specific platforms and requirements. After these design phases, CRADLE gener- ates the code for the user interface and the parts of code corresponding to services and actors interacting in the described society. 
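To make the flow just described more concrete (visual model in, interface and service code out), the following sketch walks through the same stages in miniature. It is only an illustration: the actual CRADLE generator is written in Python inside AToM3, and every name below (Entity, GenerationPipelineSketch, the output paths) is hypothetical rather than part of CRADLE.

import java.nio.file.*;
import java.util.*;

public class GenerationPipelineSketch {

    // A minimal stand-in for one entity of the designer's visual model.
    record Entity(String kind, String name, Map<String, String> attributes) {}

    public static void main(String[] args) throws Exception {
        // 1. The conceptual description produced by the designer (here hard-coded).
        List<Entity> model = List.of(
            new Entity("service", "Do_Search", Map.of("sync", "wait")),
            new Entity("actor", "Librarian", Map.of("role", "staff")),
            new Entity("collection", "Library", Map.of("documents", "thesis.pdf")));

        Path outDir = Files.createDirectories(Path.of("generated"));

        // 2. Each entity is matched to a template: XUL for the interface side,
        //    Java stubs for services and actors.
        for (Entity e : model) {
            switch (e.kind()) {
                case "collection", "struct" -> Files.writeString(
                    outDir.resolve(e.name() + ".xul"),
                    "<!-- XUL layout specialized for " + e.name() + " -->");
                case "service", "actor" -> Files.writeString(
                    outDir.resolve(e.name() + "Impl.java"),
                    "// stub generated for " + e.name() + " with attributes " + e.attributes());
                default -> throw new IllegalStateException("unknown entity kind: " + e.kind());
            }
        }
    }
}

In the real system this step is not a hand-written loop: it is driven by the graph-grammar actions attached to the metamodel, as described later in the paper.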
A set of templates for code generation and designers ■■ flexible metadata definitions ■■ a set of interactive integrated tools for user activities with the generated DL system To sum up, CRADLE is a DLMS aimed at supporting all the users involved in the development of a DL system and providing interfaces, data modeling, and services for user-driven generation of spe- cific DLs. Although CRADLE does not yet satisfy all requirements for a generic DL system, it addresses issues focused on developing interactive DL systems, stressing interfaces and communication between users. Nevertheless, we employed standards when possible to leave it open for further specification or enhancements from the DL user community. Extensive use of XML-based languages allows us to change document information depending on implemented recognition algorithms so that expert users can easily model their DL by selecting the best recognition and indexing algorithms. CRADLE evolves from the JDAN (Java-based environ- ment for Document Applications on Networks) platform, which managed both document images and forms on the basis of a component architecture.40 JDAN was based on XML technologies, and its modularity allowed its integra- tion in service-based and grid-based scenarios. It supported template code generation and modeling, but it required the designer to write XML specifications and edit XML schema files in order to model the DL document types and services, thus requiring technical knowledge that should be avoided to let users concentrate on their specific domains. ■■ Modeling Digital Library Systems The CRADLE framework shows a unique combination of features: it is based on a formal model, exploits a set of domain-specific languages, and provides automatic code generation. Moreover, fundamental roles are played by the concepts of society and collaboration.41 CRADLE generates code from tools built after modeling a DL (according to the rules defined by the proposed metamodel) and performs automatic transformation and mapping from model to code to generate software tools for a given DL model. The specification of a DL in CRADLE encompasses four complementary dimensions: ■■ multimedia information supported by the DL (Collection Model) Figure 1. CRADLE architecture GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | MAliziA, BOttONi, AND leviAlDi 175 socioeconomic, and environment dimen- sions. We now show in detail the entities and relations in the derived metamodel, shown in figure 2. Actor entities Actors are the users of DLs. Actors interact with the DL through services (interfaces) that are (or can be) affected by the actors preferences and messages (raised events). In the CRADLE metamodel, an actor is an entity with a behavior that may concurrently generate events. Communications with other actors may occur synchronously or asynchronously. Actors can relate through services to shape a digital community, i.e., the basis of a DL society. In fact, communities of students, readers, or librarians interact with and through DLs, generally follow- ing predefined scenarios. As an example, societies can behave as query generator services (from the point of view of the library) and as teaching, learning, and working services (from the point of view of other humans and organiza- tions). Communication between actors within the same or different societies occur through message exchange. 
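A minimal sketch of this event/response exchange is given below. The class names are hypothetical (they are not part of CRADLE); the event names (borrow, reserve) and responses (found, not found) are taken from the examples used in this section, and the two call styles mirror the synchronous and asynchronous communication just mentioned (the wait/nowait values that the service entity's sync attribute takes later in this section).

import java.util.concurrent.CompletableFuture;

public class MessageExchangeSketch {

    // A service receives an event and answers with a response message.
    interface Service {
        String handle(String event);
    }

    // An actor raises events through a service, either waiting for the reply or not.
    static class Actor {
        private final String role;
        Actor(String role) { this.role = role; }

        String raiseSync(Service s, String event) {                      // "wait" style
            return s.handle(event);
        }

        CompletableFuture<String> raiseAsync(Service s, String event) {  // "nowait" style
            return CompletableFuture.supplyAsync(() -> s.handle(event));
        }
    }

    public static void main(String[] args) {
        Service frontDesk = event -> event.equals("borrow") ? "found" : "not found";
        Actor student = new Actor("student");

        System.out.println(student.raiseSync(frontDesk, "borrow"));      // prints "found"
        System.out.println(student.raiseAsync(frontDesk, "reserve").join());
    }
}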
To operate, societies need shared data structures and message protocols, enacted by sending structured sequences of queries and retrieving collections of results. The actor entity includes three attributes: 1. Role identifies which role is played by the actor within the DL society. Examples of specific human roles include authors, publishers, editors, maintain- ers, developers, and the library staff. Examples of nonhuman actors include computers, printers, tele- communication devices, software agents, and digital resources in general. 2. Status is an enumeration of possible statuses for the actor: I. None (default value) II. Active (present in the model and actively generat- ing events) III. Inactive (present in the model but not generating events) IV. Sleeping (present in the model and awaiting for a response to a raised event) 3. Events describes a list of events that can be raised by the actor or received as a response message from a service. Examples of events are borrow, reserve, return, etc. Events triggered from digital resources include store, trash, and transfer. Examples of response events are found, not found, updated, etc. have been built for typical services of a DL environment. To improve acceptability and interoperability, CRADLE adopts standard specification sublanguages for representing DL concepts. Most of the CRADLE model primitives are defined as XML elements, possibly enclos- ing other sublanguages to help define DL concepts. In more detail, MIME types constitute the basis for encod- ing elements of a collection. The XML User Interface Language (XUL)42 is used to represent appearance and visual interfaces, and XDoclet is used in the LibGen code generation module, as shown in figure 1.43 ■■ The Cradle Metamodel In the CRADLE formalism, the specification of a DL includes a Collection Model describing the maintained multimedia documents, a Structural Model of informa- tion organization, a Service Model for the DL behavior, and a Societal Model describing the societies of actors and groups of services acting together to carry out the DL behavior. A society is an instance of the CRADLE model defined according to a specific collaboration framework in the DL domain. A society is the highest-level component of a DL and exists to serve the information needs of its actors and to describe its context of usage. Hence a DL collects, preserves, and shares information artefacts for society members. The basic entities in CRADLE are derived from the categorization along the actors, activities, components, Figure 2. The CRADLE metamodel with the E/R formalism 176 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 a text document, including scientific articles and books, becomes a sequence of strings. the struct entity A Struct is a structural element specifying a part of a whole. In DLs, structures represent hypertexts, taxono- mies, relationships between elements, or containment. For example, books can be structured logically into chap- ters, sections, subsections, and paragraphs, or physically into cover, pages, line groups (paragraphs), and lines. Structures are represented as graphs, and the struct entity (a vertex) contains four attributes: 1. Document is a pointer to the document entity the structure refers to. 2. Id is a unique identifier for a structure element. 3. Type takes three possible values: I. Metadata denotes a content descriptor, for instance title, author, etc. II. Layout denotes the associated layout, e.g., left frame, columns, etc. III. 
Item indicates a generic structure element used for extending the model. 4. Values is a list of values describing the element con- tent, e.g., title, author, etc. Actors interact with services in an event-driven way. Services are connected via messages (send and reply) and can be sequential, concurrent, or task-related (when a ser- vice acts as a subtask of a macroservice). Services perform operations (e.g., get, add, and del) on collections, producing collections of documents as results. Struct elements are connected to each other as nodes of a graph representing metadata structures associated with documents. The metamodel has been translated to a DSVL, asso- ciating symbols and icons with entities and relations (see “CRADLE Language and Tools” below). With respect to the six core concepts of the DELOS Manifesto (content, user, functionality, quality, policy, and architecture), con- tent can be modeled in CRADLE as collections and structs, user as actor, and functionality as service. The quality con- cept is not directly modeled in CRADLE, but for quality of service we support standard service architecture. Policies can be partially modeled by services managing interaction between actors and collections, making it possible to apply standard access policies. From the architectural point of view, we follow the reference architecture of figure 1. ■■ CRADLE Language and Tools In this section we describe the selection of languages and tools of the CRADLE platform. To improve interoperability service entities Services describe scenarios, activities, operations, and tasks that ultimately specify the functionalities of a DL, such as collecting, creating, disseminating, evaluating, organizing, personalizing, preserving, requesting, and selecting documents and providing services to humans concerned with fact-finding, learning, gathering, and exploring the content of a DL. All these activities can be described and implemented using scenarios and appear in the DL setting as a result of actors using services (thus societies). Furthermore, these activities realize and shape relationships within and between societies, services, and structures. In the CRADLE metamodel, the service entity models what the system is required to do, in terms of actions and processes, to achieve a task. A detailed task analysis helps understand the current system and the information flow within it in order to design and allocate tasks appropriately. The service entity has four attributes: 1. Name is a string representing a textual description of the service. 2. Sync states whether communication is synchronous or asynchronous, modeled by values wait and nowait, respectively. 3. Events is a list of messages that can trigger actions among services (tasks); for example, valid or notValid in case of a parsing service. 4. Responses contain a list of response messages that can reply to raised events; they are used as a communica- tion mechanism by actors and services. the collection entity Collections are sets of documents of arbitrary type (e.g., bits, characters, images, etc.) used to model static or dynamic content. In the static interpretation, a collection defines information content interpreted as a set of basic elements, often of the same type, such as plain text. Examples of dynamic content include video delivered to a viewer, ani- mated presentations, and so on. The attributes of collection are name and documents. 
Name is a string, while documents is a list of pairs (DocumentName, DocumentLabel), the latter being a pointer to the document entity. the Document entity Documents are the basic elements in a DL and are modeled with attributes label and structure. Label defines a textual string used by a collection entity to refer to the document. We can consider it as a document identifier, specifying a class or a type of document. Structure defines the semantics and area of appli- cation of the document. For example, any textual representation can be seen as a string of characters, so that GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | MAliziA, BOttONi, AND leviAlDi 177 graphs. Model manipulation can then be expressed via graph grammars also specified in AToM3. The general process of automatic creation of coop- erative DL environments for an application is shown in figure 3. Initially, a designer formalizes a conceptual description of the DL using the CRADLE metamodel concepts. This phase is usually preceded by an analysis of requirements and interaction scenarios, as seen previ- ously. Model specifications are then provided to a DL code generator (written in Python within AToM3) to pro- duce DLs tailored to specific platforms and requirements. These are built on a collection of templates of services and configurable components providing infrastructure for the new DL. The sketched infrastructure includes classes for objects (tasks), relationships making up the DL, and processing tools to upload the actual library collection from raw documents, as well as services for searching and browsing and for document collections maintenance. The CRADLE generator automatically generates different kinds of output for the CRADLE model of the cooperative DL environment, such as service and collection managers. Collection managers define the logical schemata of the DL, which in CRADLE correspond to a set of MIME types, XUL and XDoclet specifications, representing digital objects, their component parts, and linking infor- mation. Collection managers also store instances of their and collaboration, CRADLE makes extensive use of existing standard spec- ification languages. Most CRADLE outputs are defined with XML-based formats, able to enclose other specific languages. The basic languages and corresponding tools used in CRADLE are the following: ■■ MIME type. Multipurpose Internet Mail Extensions (MIME) constitute the basis for encoding documents in CRADLE, supporting several file formats and types of charac- ter encoding. MIME was chosen because of wide availability of MIME types, and standardisation of the approach. This makes it a natural choice for DLs where dif- ferent types of documents need to be managed (PDF, HTML, Doc, etc.). Moreover, MIME standards for character encoding descrip- tions help keeping the CRADLE framework open and compliant with standards. ■■ XUL. The XML User Interface Language (XUL) is an XML-based markup language used to represent appearance and visual interfaces. XUL is not a public standard yet, but it uses many existing standards and technologies, including DTD and RDF,44 which makes it easily readable for peo- ple with a background in Web programming and design. The main benefit of XUL is that it provides a simple definition of common user interface elements (widgets). This drastically reduces the software devel- opment effort required for visual interfaces. ■■ XDoclet. XDoclet is used for generating services from tagged-code fragments. 
It is an open-source code generation library which enables attribute-ori- ented programming for Java via insertion of special tags.45 It includes a library of predefined tags, which simplify coding for various technologies, e.g., Web services. The motivation for using XDoclet in the CRADLE framework is related to its approach for template code generation. Designers can describe templates for each service (browse, query, and index) and the XDoclet generated code can be automatically transformed into the Java code for managing the specified service. ■■ AToM3. AToM3 is a metamodeling system to model graphical formalisms. Starting from a metaspecifi- cation (in E/R), AToM3 generates a tool to process models described in the chosen formalism. Models are internally represented using abstract syntax Figure 3. Cooperative DL generation process with CRADLE framework 178 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 and (3) the metadata operations box. The right column manages visualization and mul- timedia information obtained from documents. The basic features provided with the UI templates are docu- ment loading, visualization, metadata organization, and management. The layout template, in the collection box, manages the visualization of the documents contained in a collection, while the visualization template works according to the data (MIME) type specified by the document. Actually, by selecting a document included in the collection, the corresponding data file is automatically uploaded and visualized in the UI. The metadata visualization in the code template reflects the metadata structure (a tree) represented by a struct, specifying the relationship between parent and child nodes. Thus the XUL template includes an area (the meta- data box) for managing tree structures as described in the visual model of the DL. Although the tree-like visualiza- tion has potential drawbacks if there are many metadata items, there should be no real concern with medium loads. The UI template also includes a box to perform opera- tions on metadata, such as insert, delete, and edit. Users can select a value in the metadata box and manipulate the presented values. Figure 4 shows an example of a UI generated from a basic template. service templates To achieve automated code generation, we use XDoclet to specify parameters and service code generation according to such parameters. CRADLE can automatically annotate Java files with name–value pairs, and XDoclet provides a syntax for parameter specification. Code generation is classes and function as search engines for the system. Services classes also are generated and are represented as attribute-oriented classes involving parts and features of entities. ■■ CRADLE platform The CRADLE platform is based on a model-driven approach for the design and automatic generation of code for DLs. In particular, the DSVL for CRADLE has four diagram types (collection, structure, service, and actor) to describe the different aspects of a DL. In this section we describe the user interface (UI) and service templates used for generating the DL tools. In particular, the UI layout is mainly generated from the structured information provided by the document, struct, and collection entities. The UI events are managed by invoking the appropriate services according to the imported XUL templates. At the service and communica- tion levels, the XDoclet code is generated by the service and actor entities, exploiting their relationships. 
We also show how code generation works and the advanced platform features, such as automatic service discovery. At the end of the section a running example is shown, rep- resenting all the phases involved in using the CRADLE framework for generating the DL tools for a typical library scenario. user interface templates The generation of the UI is driven by the visual model designed by the CRADLE user. Specifically, the model entities involved in this process are document, struct and collection (see figure 2) for the basic components and lay- out of the interfaces, while linked services are described in the appropriate templates. The code generation process takes place through transformations implemented as actions in the AToM3 metamodel specification, where graph-grammar rules may have a condition that must be satisfied for the rule to be applied (preconditions), as well as actions to be performed when the rule is executed (postconditions). A transformation is described during the visual modeling phase in terms of conditions and corresponding actions (inserting XUL language statements for the interface in the appropriate code template placeholders). The gener- ated user interface is built on a set of XUL template files that are automatically specialized on the basis of the attributes and relationships designed in the visual mod- eling phase. The layout template for the user interface is divided into two columns (see figure 4). The left column is made of three boxes: (1) the collection box (2) the metadata box, Figure 4. An example of an automatically generated user inter- face. (A) document area; (B) collection box; (C) metadata box; (D) metadata operations box. GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | MAliziA, BOttONi, AND leviAlDi 179 "msg arguments.argname"> { "" , "" "" } , }; The first two lines declare a class with a name class nameImpl that extends the class name. The XDoclet template tag XDtClass:className denotes the name of the class in the annotated Java file. All standard XDoclet template tags have a namespace starting with “XDt.” The rest of the template uses XDtField : forAllField to iterate through the fields. For each field with a tag named msg arguments.argname (checked using XDtField : ifHasFieldTag), it creates a subarray of strings using the values obtained from the field tag parameters. XDtField : fieldName gives the name of the field, while XDtField : fieldTagValue retrieves the value of a given field tag parameter. Characters that are not part of some XDoclet template tags are directly copied into the generated code. The following code segment was generated by XDoclet using the annotated fields and the above tem- plate segment: public class MSGArgumentsImpl extends MSGArguments { public static String[ ][ ] argumentNames = new String[ ][ ]{ { "eventMsg" , " event " , " eventstring " } , { " responseMsg " , " response " , " responsestring " } , }; } Similarly, we generate the getter and setter methods for each field: public get () { return ; } public void set ( String value ) { based on code templates. Hence service templates are XDoclet templates for transforming XDoclet code frag- ments obtained from the modeled service entities. The basic XDoclet template manages messages between services, according to the event and response attributes described in “CRADLE Language and Tools” above. In fact, CRADLE generates a Java application (a service) that needs to receive messages (event) and reply to them (response) as parameters for the service application. 
In XDoclet, these can be attached to the cor- responding field by means of annotation tags, as in the following code segments: public class MSGArguments { . . . . . . /* * @msg arguments.argname name="event " desc="event_string " */ protected String eventMsg = null; /* * @msg arguments.argname name="response" * desc="response_string " */ protected String responseMsg = null; } Each msg arguments.argname related to a field is called a field tag. Each field tag can have multiple parameters, listed after the field tag. In the tag name msg arguments .argname, the prefix serves as the namespace of all tags for this particular XDoclet application, thus avoiding naming conflicts with other standard or customized XDoclet tags. Not only fields can be annotated, but also other entities such as class and functions can have tags too. XDoclet enables powerful code generation requir- ing little or no customization (depending on how much is provided by the template). The type of code to be generated using the parameters is defined by the corre- sponding XDoclet template. We have created template files composed of Java codes and special XDoclet instructions in the form of XML tags. These XDoclet instructions allow conditionals (if) and loops (for), thus providing us with expressive power close to a programming language. In the following example, we first create an array containing labels and other information for each argument: public class Impl extends { public static String[ ][ ] argumentNames = new String[ ][ ] { " , value ) ; }< /XDtField : ifHasFieldTag> This translates into the following generated code: public java.lang.String get eventMsg ( ) { return eventMsg ; } public void set eventMsg ( String value ) { setValue ( "eventMsg" , value ) ; } public java.lang.String getresponseMsg ( ) { return getresponseMsg ; } public void setresponseMsg ( String value ) { setValue ( " responseMsg " , value ) ; } The same template is used for managing the name and sync attributes of service entities. code Generation, service Discovery, and Advanced Features A service or interface template only describes the solu- tion to a particular design problem—it is not code. Consequently, users will find it difficult to make the leap from the template description to a particular implemen- tation even though the template might include sample code. Others, like software engineers, might have no trouble translating the template into code, but they still may find it a chore, especially when they have to do it repeatedly. The CRADLE visual design environment (based on AToM3) helps alleviate these problems. From just a few pieces of information (the visual model), typi- cally application-specific names for actors and services in a DL society along with choices for the design trade- offs, the tool can create class declarations and definitions implementing the template. The ultimate goal of the modeling effort remains, however, the production of reliable and efficiently executable code. Hence a code generation transformation produces interface (XUL) and service (Java code from XDoclet templates) code from the DL model. We have manually coded XUL templates specifying the static setup of the GUI, the various widgets and their layout. This must be complemented with code gener- ated from a DL model of the systems dynamics coded into services. While other approaches are possible,46 we employed the solution implemented within the AToM3 environment according to its graph grammar modeling approach to code generation. 
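Because the template fragments above lost their markup in reproduction, it may help to see the pieces assembled in one place before moving on. The sketch below restates the annotated base class and the kind of implementation class the text describes XDoclet emitting; the setValue helper and the exact generated method names are assumptions reconstructed from the fragments, not code copied from the authors' system.

// Base class reconstructed from the annotated fragment shown earlier; the
// setValue(...) helper is assumed (the generated code calls it but it is not shown).
class MSGArguments {
    protected String eventMsg = null;
    protected String responseMsg = null;

    protected void setValue(String field, String value) {
        if ("eventMsg".equals(field)) eventMsg = value;
        else if ("responseMsg".equals(field)) responseMsg = value;
    }
}

// Plausible shape of the class the XDoclet template emits: one row of argumentNames
// per tagged field, plus a getter/setter pair for each.
public class MSGArgumentsImpl extends MSGArguments {

    public static String[][] argumentNames = {
        { "eventMsg",    "event",    "event_string"    },
        { "responseMsg", "response", "response_string" },
    };

    public String getEventMsg()              { return eventMsg; }
    public void setEventMsg(String value)    { setValue("eventMsg", value); }

    public String getResponseMsg()           { return responseMsg; }
    public void setResponseMsg(String value) { setValue("responseMsg", value); }
}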
CRADLE supports a flexible iterative process for visual design and code generation. In fact, a design change might require substantial reimplementation GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | MAliziA, BOttONi, AND leviAlDi 181 selecting one, the UI activates the metadata operations box—figure 6(D). The selected metadata node will then be presented in the lower (metadata operations) box, labeled “set MetaData Values,” replacing the default “None” value as shown in figure 6. After the metadata item is presented, the user can edit its value and save it by clicking on the “set value” button. The associated action saves the metadata information and causes its display in the intermediate box (tree-like structure), changing the visualization according to the new values. The code generation process for the Do_Search and Front Desk services is based on XDoclet templates. In particular, a message listener template is used to generate the Java code for the Front Desk service. In fact, the Front Desk service is asynchronous and manages communica- tions between actors. The actors classes are generated also by using the services templates since they have attributes, events, and messages, just like the services. The Do_Search service code is based on the producer and consumer templates, since it is synchronous by defini- tion in the modeled scenario. A get method retrieving a collection of documents is implemented from the getter template. The routine invoked by the transformation action for struct entities performs a breadth-first exploration of the metadata tree in the visual model and attaches the cor- responding XUL code for displaying the struct node in the correct position within the graph structure of the UI. collections, while a single rectangle connected to a collection represents a document entity; the circles linked to the document entity are the struct (metadata) entities. Metadata entities are linked to the node rela- tionships (organized as a tree) and linked to the document entity by a metadata LinkType relationship. The search service is synchro- nous (sync attribute set to “wait”). It queries the document collec- tion (get operation) looking for the requested document (using meta- data information provided by the borrow request), and waits for the result of get (a collection of docu- ments). Based on this result, the service returns a Boolean message “Is_Available,” which is then propa- gated as a response to the librarian and eventually to the student, as shown in figure 5. When the library designer has built the model, the transformation process can be run, executing the code generation actions associated with the entities and services represented in the model. The code generation process is based on template code snippets generated from the AToM3 environment graph transformation engine, following the generative rules of the metamodel. We also use pre– and postconditions on application of transformation rules to have code genera- tion depend on verification of some property. The generated UI is presented in figure 6. On the right side, the document area is presented according to the XUL template. Documents are managed according to their MIME type: the PDF file of the example is loaded with the appropriate Adobe Acrobat Reader plug-in. On the left column of the UI are three boxes, according to the XUL template. 
The collection box—figure 6(B)— presents the list of documents contained in the collection specified by the documents attribute of the library collec- tion entity, and allows users to interact with documents. After selecting a document by clicking on the list, it is presented in the document area—figure 6(A)—where it can be managed (edit, print, save, etc.). In the metadata box—figure 6(C)—the tree structure of the metadata is depicted according to the categoriza- tion modeled by the designer. The XUL template contains all the basic layout and action features for managing a tree structure. The generated box contains the parent and child nodes according to the attributes specified in the corresponding struct elements. The user can click on the root for compacting or exploding the tree nodes; by Figure 5. The Library model, alias the model of the Library society 182 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 workflow system. The Release collection maintains the image files in a permanent storage, while data is written to the target database or content management software, together with XML metadata snippets (e.g., to be stored in XML native DBMS). A typical configuration would have the Recognition service running on a server cluster, with many Data- Entry services running on different clients (Web browsers directly support XUL interfaces). Whereas current docu- ment capture environments are proprietary and closed, the definition of an XML-based interchange format allows the suitable assembly of different component-based tech- nologies in order to define a complex framework. The realization of the JDAN DL system within the CRADLE framework can be considered as a preliminary step in the direction of a standard multimedia document managing platform with region segmentation and clas- sification, thus aiming at automatic recognition of image database and batch acquisition of multiple multimedia documents types and formats. Personal and collaborative spaces A personal space is a virtual area (within the DL society) that is modeled as being owned and maintained by a user including resources (document collections, services, etc.), or references to resources, which are relevant to a task, or set of tasks, the user needs to carry out in the DL. Personal spaces may thus contain digital documents in multiple media, personal schedules, visualization tools, and user agents (shaped as services) entitled with various tasks. Resources within personal spaces can be allocated ■■ Designing and Generating Advanced Collaborative DL Systems In this section we show the use of CRADLE as an analyti- cal tool helpful in comprehending specific DL phenomena, to present the complex interplays that occur between CRADLE components and DL concepts in a real DL appli- cation, and to illustrate the possibility of using CRADLE as a tool to design and generate advanced tools for DL development. Modeling Document images collections With CRADLE, the designer can provide the visual model of the DL Society involved in document management and the remaining phases are automatically carried out by CRADLE modules and templates. We have provided the user with basic code templates for the recognition and indexing services, the data-entry plug-in, and archive release. The designer can thus simply translate the par- ticular DL society into the corresponding visual model within the CRADLE visual modeling editor. 
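One way to read the division of labor just listed (recognition and indexing services, a data-entry plug-in, and archive release) is as a set of service contracts that the generated code has to satisfy. The sketch below is purely illustrative: none of these Java types exist in CRADLE or JDAN, and each interface simply names the single responsibility attributed to that service in the surrounding description.

import java.util.List;

public interface DocumentImageServicesSketch {

    // Hypothetical data carriers for a scanned page and its classified regions.
    record Region(String id, String type, String value, boolean interpreted) {}
    record DocumentImage(String label, byte[] scan, List<Region> regions) {}

    // Segments a scanned page, classifies its regions, and stores the result in the archive.
    interface RecognitionService {
        DocumentImage recognize(byte[] scannedPage);
    }

    // Lets an operator review and correct regions the recognizer could not interpret.
    interface DataEntryService {
        DocumentImage review(DocumentImage withUninterpretedRegions);
    }

    // Turns archived fragments into an indexed, queryable form.
    interface IndexingService {
        void index(DocumentImage document);
    }

    // Releases final images and XML metadata snippets to permanent storage.
    interface ReleaseService {
        void release(DocumentImage document);
    }
}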
As a proof of concept, figure 7 models the JDAN architecture, introduced in "Requirements for Modeling Digital Libraries," exploiting the CRADLE visual language. The Recognition Service performs the automatic document recognition and stores the corresponding document images, together with the extracted metadata, in the Archive collection. It interacts with the Scanner actor, representing a machine or a human operator that scans paper documents. Designers can choose their own segmentation method or algorithm; what is required to be compliant with the framework is to produce an XDoclet template. The Recognition service stores the document images in the Archive collection, together with the layout information of their different regions, according to the XML metadata schema provided by the designer. If there is at least one region marked as "not interpreted," the Data-Entry service is invoked on the "not interpreted" regions. The Data-Entry service allows Operators to evaluate the automatic classification performed by the system and edit the segmentation for indexing. Operators can also edit the recognized regions with the classification engine (included in the Recognition service) and adjust their values and sizes. The output of this phase is an XML description that will be imported into the Indexing service for indexing (and eventually querying).

The Archive collection stores all of the basic information kept in JDAN, such as text labels, while the Indexing service, based on a multitier architecture exploiting JBoss 3.0, has access to them. This service is responsible for turning the data fragments in the Archive collection into useful forms to be presented to the final users, e.g., a report or a query result. The final stage in the recognition process could be to release each document to a content management or workflow system. The Release collection maintains the image files in permanent storage, while data is written to the target database or content management software, together with XML metadata snippets (e.g., to be stored in a native XML DBMS). A typical configuration would have the Recognition service running on a server cluster, with many Data-Entry services running on different clients (Web browsers directly support XUL interfaces). Whereas current document capture environments are proprietary and closed, the definition of an XML-based interchange format allows the suitable assembly of different component-based technologies in order to define a complex framework. The realization of the JDAN DL system within the CRADLE framework can be considered a preliminary step toward a standard multimedia document management platform with region segmentation and classification, thus aiming at automatic recognition of image databases and batch acquisition of multiple multimedia document types and formats.

Figure 7. The CRADLE model for the JDAN framework
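The routing rule for partially recognized documents is stated only in prose above. The following is a small illustrative sketch of that check; the Region, RecognizedDocument, and service interfaces are hypothetical placeholders, not part of JDAN's published API.

```java
import java.util.List;

/** Hypothetical, simplified view of one recognized page region. */
record Region(String id, String label, boolean interpreted) { }

record RecognizedDocument(String archiveId, List<Region> regions) { }

interface DataEntryService {
    /** Lets an Operator review and correct the given regions. */
    void review(String archiveId, List<Region> uninterpretedRegions);
}

interface IndexingService {
    /** Imports the XML description produced after recognition or review. */
    void importXml(String archiveId, String xmlDescription);
}

class RecognitionDispatcher {
    private final DataEntryService dataEntry;
    private final IndexingService indexing;

    RecognitionDispatcher(DataEntryService dataEntry, IndexingService indexing) {
        this.dataEntry = dataEntry;
        this.indexing = indexing;
    }

    /** If at least one region is marked "not interpreted", invoke the
     *  Data-Entry service on those regions; otherwise hand the XML
     *  description straight to the Indexing service. */
    void dispatch(RecognizedDocument doc, String xmlDescription) {
        List<Region> pending = doc.regions().stream()
                .filter(r -> !r.interpreted())
                .toList();
        if (!pending.isEmpty()) {
            dataEntry.review(doc.archiveId(), pending);
        } else {
            indexing.importXml(doc.archiveId(), xmlDescription);
        }
    }
}
```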
Personal and Collaborative Spaces

A personal space is a virtual area (within the DL society) that is modeled as being owned and maintained by a user, including resources (document collections, services, etc.), or references to resources, which are relevant to a task, or set of tasks, the user needs to carry out in the DL. Personal spaces may thus contain digital documents in multiple media, personal schedules, visualization tools, and user agents (shaped as services) entrusted with various tasks. Resources within personal spaces can be allocated according to the user's role. For example, a conference chair would have access to conference-specific materials, visualization tools, and interfaces to upload papers for review by a committee. Similarly, we denote a group space as a virtual area in which library users (the entire DL society) can meet to conduct collaborative activities synchronously or asynchronously. Explicit group spaces are created dynamically by a designer or facilitator who becomes (or appoints) the owner of the space and defines who the participants will be. In addition to direct user-to-user communication, users should be able to access library materials and make annotations on them for every other group to see. Ideally, users should be able to act (and carry DL materials with them) between personal and group spaces or among group spaces to which they belong. It may also be the case, however, that a given resource is referenced in several personal or group spaces. Basic functionality required for personal spaces includes capabilities for viewing, launching, and monitoring library services, agents, and applications. Like group spaces, personal spaces should provide users with the means to easily become aware of other users and resources that are present in a given group space at any time, as well as mechanisms to communicate with other users and make annotations on library resources. We employed this personal and group space paradigm in modeling a collaborative environment in the Academic Conferences domain, where a Conference Chair can have a personal view of the document collections (resources) and metadata, but also can share information with the various committees collaborating for certain tasks.

■■ Evaluation

In this section we evaluate the presented approach from three different perspectives: usability of the CRADLE notation, its expressiveness, and usability of the generated DLs.

Usability of the CRADLE Notation

We tested the notation using the well-known Cognitive Dimensions framework for notations and visual language design.48 The dimensions are usually employed to evaluate the usability of a visual language or notation, or as heuristics to drive the design of innovative visual languages. The significant results are as follows.

Abstraction Gradient

An abstraction is a grouping of elements to be treated as one entity. In this sense, CRADLE is abstraction-tolerant. It provides entities for high-level abstractions of communication processes and services. These abstractions are intuitive, as they are visualized as the process they represent (services with events and responses), and easy to learn, as their configuration implies few simple attributes. Although CRADLE does not allow users to build new abstractions, the E/R formalism is powerful enough to provide basic abstraction levels.

Closeness of Mapping

CRADLE elements have been assigned icons to resemble their real-world counterparts (e.g., a collection is represented as a set of paper sheets). The elements that do not have a correspondence with a physical object in the real world have icons borrowed from well-known notations (e.g., structs represented as graph nodes).

Consistency

A notation is consistent if a user knowing some of its structure can infer most of the rest. In CRADLE, when two elements represent the same entity but can be used either as input or as output, their shapes are equal but incorporate an incoming or an outgoing message in order to differentiate them. See, for example, the icons for services or those for graph nodes representing either a struct or an actor, with different colors.
Diffuseness/Terseness

A notation is diffuse when many elements are needed to express one concept. CRADLE is terse rather than diffuse because each entity expresses a meaning on its own.

Error-Proneness

Data flow visualization reduces the chance of errors at a first level of the specification. On the other hand, some mistakes can be introduced when specifying visual entities, since it is possible to express relations between source and target models that cannot generate semantically correct code. However, these mistakes should be considered "programming errors more than slips," and may be detected through progressive evaluation.

Hidden Dependencies

A hidden dependency is a relation between two elements that is not visible. In CRADLE, relevant dependencies are represented as data flows via directed links.

Progressive Evaluation

Each DL model can be tested as soon as it is defined, without having to wait until the whole model is finished. The visual interface for the DL can be generated with just one click, and services can be subsequently added to test their functionalities.

Viscosity

CRADLE has a low viscosity because making small changes in a part of a specification does not imply lots of readjustments in the rest of it. One can change properties, events, or responses, and these changes will have only local effect. The only local changes that could imply performing further changes by hand are deleting entities or changing names; however, this would imply minimal changes (just removing or updating references to them) and would only affect a small set of subsequent elements in the same data flow.

Visibility

A DL specification consists of a single set of diagrams fitting in one window. Empirically, we have observed that this model usually involves no more than fifteen entities. Different, independent CRADLE models can be simultaneously shown in different windows.

of "Sapienza" University of Rome (undergraduate students), shown in figure 5, and (2) an application employed with a project of Records Management in a collaboration between the Computer Science and Computer Engineering Departments of "Sapienza" University, as shown in figure 7.

Usability of the Generated Tools

Environments for single-view languages generated with AToM3 have been extensively used, mostly in an academic setting, in areas such as software and Web engineering, modeling and simulation, and urban planning. However, depending on the kind of domain, generating the results may take some time. For instance, the state reachability analysis in the DL example takes a few minutes; we are currently employing a version of AToM3 that includes the Petri net formalism, with which we can test the reachability of service states.49 In general, from application experience, we note the general agreement that automated syntactical consistency support greatly simplifies the design of complex systems. Finally, some users pointed out technical limitations of the current implementation, such as the fact that it is not possible to open several views at a time.

Altogether, we believe this work contributes to making the definition and maintenance of environments for DL systems more efficient and less tedious. Our model-based approach must be contrasted with the programming-centric approach of most CASE tools, where the language and the code generation tools are hard-coded, so that whenever a modification has to be made (whether on the language or on the semantic domain), developers have to dive into the code.

■■ Conclusions and Future Work

DLs are complex information systems that integrate findings from disciplines such as hypertext, information retrieval, multimedia, databases, and HCI. DL design is often a multidisciplinary effort, including library staff and computer scientists. Wasted effort and poor interoperability can therefore ensue. Examining the related bibliography, we noted that there is a lack of tools or automatic systems for designing and developing cooperative DL systems. Moreover, there is a need for modeling interactions between DLs and users, such as scenario or activity-based approaches. The CRADLE framework fills this gap by providing a model-driven approach for generating visual interaction tools for DLs, supporting design and automatic generation of code for DLs. In particular, we use a metamodel made of different diagram types (collection, structures, service, and

Expressiveness of CRADLE

The paper has illustrated the expressiveness of CRADLE by defining different entities and relationships for different DL requisites.
To this end, two different applications have been considered: (1) a basic example elaborated with the collaboration of the Information Science School GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | MAliziA, BOttONi, AND leviAlDi 185 Retrieval (Reading, Mass.: Addison-Wesley, 1999). 17. D. Lucarella and A. Zanzi, “A Visual Retrieval Environ- ment for Hypermedia Information Systems,” ACM Transactions on Information Systems 14 (1996): 3–29. 18. B. Wang, “A Hybrid System Approach for Supporting Digital Libraries,” International Journal on Digital Libraries 2 (1999): 91–110,. 19. D. Castelli, C. Meghini, and P. Pagano, “Foundations of a Multidimensional Query Language for Digital Libraries,” in Proc. ECDL ’02, LNCS 2458 (Berlin: Springer, 2002): 251–65. 20. R. N. Oddy et al., eds., Proc. Joint ACM/BCS Symposium in Information Storage & Retrieval (Oxford: Butterworths, 1981). 21. K. Maly, M. Zubair et al., “Scalable Digital Libraries Based on NCSTRL/DIENST,” in Proc. ECDL ’00 (London: Springer, 2000): 168–79. 22. R. Tansley, M. Bass and M. Smith, “DSpace as an Open Archival Information System: Current Status and Future Direc- tions,” Proc. ECDL ’03, LNCS 2769 (Berlin: Springer, 2003): 446–60. 23. K. M. Anderson et al., “Metis: Lightweight, Flexible, and Web-Based Workflow Services for Digital Libraries,” Proc. 3rd ACM/IEEE-CS JCDL ’03 (Los Alamitos, Calif.: IEEE Computer Society, 2003): 98–109. 24. N. Dushay, “Localizing Experience of Digital Content via Structural Metadata,” In Proc. 2nd ACM/IEEE-CS JCDL ’02 (New York: ACM, 2002): 244–52. 25. M. Gogolla et al., “Integrating the ER Approach in an OO Environment,” Proc. ER, ’93 (Berlin: Springer, 1993): 376–89. 26. Heidi Gregersen and Christian S. Jensen, “Temporal Entity-Relationship Models—A Survey,” IEEE Transactions on Knowledge & Data Engineering 11 (1999): 464–97. 27. B. Berkem, “Aligning IT with the Changes using the Goal-Driven Development for UML and MDA,” Journal of Object Technology 4 (2005): 49–65. 28. A. Malizia, E. Guerra, and J. de Lara, “Model-Driven Development of Digital Libraries: Generating the User Interface,” Proc. MDDAUI ’06, http://sunsite.informatik.rwth-aachen.de/ Publications/CEUR-WS/Vol-214/ (accessed Oct 18, 2010). 29. D. L. Atkins et al., “MAWL: A Domain-Specific Language for Form-Based Services,” IEEE Transactions on Software Engineer- ing 25 (1999): 334–46. 30. J. de Lara and H. Vangheluwe, “AToM3: A Tool for Multi-Formalism and Meta-Modelling,” Proc. FASE ’02 (Berlin: Springer, 2002): 174–88. 31. J. M. Morales-Del-Castillo et al., “A Semantic Model of Selective Dissemination of Information for Digital Libraries,” Journal of Information Technology & Libraries 28 (2009): 21–30. 32. N. Santos, F. C. A. Campos, and R. M. M. Braga, “Dig- ital Libraries and Ontology,” in Handbook of Research on Digital Libraries: Design, Development, and Impact, ed. Y.-L. Theng et al. (Hershey, Pa.: Idea Group, 2008): 1:19. 33. F. Wattenberg, “A National Digital Library for Science, Mathematics, Engineering, and Technology Education,” D-Lib Magazine 3 no. 10 (1998), http://www.dlib.org/dlib/october98/ wattenberg/10wattenberg.html (accessed Oct 18, 2010); L. L. Zia, “The NSF National Science, Technology, Engineering, and Mathematics Education Digital Library (NSDL) Program: New Projects and a Progress Report,” D-lib Magazine, 7, no. 11 (2002), http://www.dlib.org/dlib/november01/zia/11zia.html (accessed Oct 18, 2010). 34. U.S. 
Library of Congress, Ask a Librarian, http://www.loc society), which describe the different aspects of a DL. We have built a code generator able to produce XUL code from the design models for the DL user interface. Moreover, we use template code generation integrating predefined components for the different services (XDoclet language) according to the model specification. Extensions of CRADLE with behavioral diagrams and the addition of analysis and simulation capabilities are under study. These will exploit the new AToM3 capabili- ties for describing multiview DSVLs, to which this work directly contributed. References 1. A. M. Gonçalves, E. A Fox, “5SL: a language for declara- tive specification and generation of digital libraries,” Proc. JCDL ’02 (New York: ACM, 2002): 263–72. 2. L. Candela et al., “Setting the Foundations of Digital Libraries: The DELOS Manifesto,” D-Lib Magazine 13 (2007), http://www.dlib.org/dlib/march07/castelli/03castelli.html (accessed Oct 18, 2010). 3. A. Malizia et al., “A Cooperative-Relational Approach to Digital Libraries,” Proc. ECDL 2007, LNCS 4675 (Berlin: Springer, 2007): 75–86. 4. E. A. Fox and G. Marchionini, “Toward a Worldwide Dig- ital Library,” Communications of the ACM 41 (1998): 29–32. 5. M. A. Gonçalves et al., “Streams, Structures, Spaces, Scenarios, Societies (5s): A Formal Model for Digital Libraries,” ACM Transactions on Information Systems 22 (2004): 270–312. 6. J. C. R. Licklider, Libraries of the Future (Cambridge, Mass.: MIT Pr., 1965). 7. D. M. Levy and C. C. Marshall, “Going Digital: A Look at Assumptions Underlying Digital Libraries,” Communications of the ACM 38 (1995): 77–84. 8. R. Reddy and I. Wladawsky-Berger, “Digital Librar- ies: Universal Access to Human Knowledge—A Report to the President,” 2001, www.itrd.gov/pubs/pitac/pitac-dl-9feb01.pdf (accessed Mar. 16, 2010). 9. E. L. Morgan, “MyLibrary: A Digital Library Framework and Toolkit,” Journal of Information Technology & Libraries 27 (2008): 12–24. 10. T. R. Kochtanek and K. K. Hein, “Delphi Study of Digital Libraries,” Information Processing Management 35 (1999): 245–54. 11. S. E. Howe et al., “The President’s Information Technology Advisory Committee’s February 2001 Digital Library Report and Its Impact,” In Proc. JCDL ’01 (New York: ACM, 2001): 223–25. 12. N. Reyes-Farfan and J. A. Sanchez, “Personal Spaces in the Context of OA,” Proc. JCDL ’03 (IEEE Computer Society, 2003): 182–83. 13. M. Wirsing, Report on the EU/NSF Strategic Workshop on Engineering Software-Intensive Systems, 2004, http://www.ercim. eu/EU-NSF/sis.pdf (accessed Oct 18, 2010) 14. S. Kelly and J.-P. Tolvanen, Domain-Specific Modeling: Enabling Full Code Generation (Hoboken, N.J.: Wiley, 2008). 15. H. R. Turtle and W. Bruce Croft, “Evaluation of an Infer- ence Network-Based Retrieval Model,” ACM Transactions on Information Systems 9 (1991): 187–222. 16. R. A. Baeza-Yates, B. A. Ribeiro-Neto, Modern Information 186 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 .mozilla.org/En/XUL (accessed Mar. 16, 2010). 43. XDoclet, Welcome! What is XDoclet? http://xdoclet .sourceforge.net/xdoclet/index.html (accessed Mar. 16, 2010). 44. W3C, Extensible Markup Language (XML) 1.0 (Fifth Edition), http://www.w3.org/TR/2008/REC-xml-20081126/ (accessed Mar. 16, 2010); W3C, Resource Description Framework (RDF), http://www.w3.org/RDF/ (accessed Mar. 16, 2010). 45. H. Wada and J. 
Suzuki, “Modeling Turnpike Frontend System: A Model-Driven Development Framework Leveraging UML Metamodeling and Attribute-Oriented Programming,” Proc. MoDELS ’05, LNCS 3713 (Berlin: Springer, 2005): 584–600. 46. I. Horrocks, Constructing the User Interface with Statecharts (Boston: Addison-Wesley, 1999). 47. Universal Discover, Description, and Integration OASIS Standard, Welcome to UDDI XML.org, http://uddi.xml.org/ (accessed Mar. 16, 2010). 48. T. R. G. Green and M. Petre, “Usability Analysis of Visual Programming Environments: A ‘Cognitive Dimensions Frame- work,’” Journal of Visual Languages & Computing 7 (1996): 131–74. 49. J. de Lara, E. Guerra, and A. Malizia, “Model Driven Development of Digital Libraries—Validation, Analysis and For- mal Code Generation,” Proc. 3rd WEBIST ’07 (Berlin: Springer, 2008). .gov/rr/askalib/ (accessed on Mar. 16, 2010). 35. C. L. Borgmann, “What are Digital Libraries? Competing Visions,” Information Processing & Management 25 (1999):227–43. 36. C. Lynch, “Coding with the Real World: Heresies and Unexplored Questions about Audience, Economics, and Con- trol of Digital Libraries,” In Digital Library Use: Social Practice in Design and Evaluation, ed. A. P. Bishop, N. A. Van House, and B. Buttenfield (Cambridge, Mass.: MIT Pr., 2003): 191–216. 37. Y. Ioannidis et al., “Digital Library Information-Technol- ogy Infrastructure,” International Journal of Digital Libraries 5 (2005): 266–74. 38. E. A. Fox et al., “The Networked Digital Library of Theses and Dissertations: Changes in the University Community,” Jour- nal of Computing Higher Education 13 (2002): 3–24. 39. H. Van de Sompel and C. Lagoze, “Notes from the Inter- operability Front: A Progress Report on the Open Archives Ini- tiative,” Proc. 6th ECDL, 2002, LNCS 2458 (Berlin: Springer 2002): 144–57. 40. F. De Rosa et al., “JDAN: A Component Architecture for Digital Libraries,” DELOS Workshop: Digital Library Architectures, (Padua, Italy: Edizioni Libreria Peogetto, 2004): 151–62. 41. Defined as a set of actors (users) playing roles and inter- acting with services. 42. Mozilla Developer Center, XUL, https://developer 3129 ---- GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | visser AND BAll 187 Marijke Visser and Mary Alice Ball The Middle Mile: The Role of the Public Library in Ensuring Access to Broadband of fundamentally altering culture and society. In some circles the changes happen in real time as new Web-based applications are developed, adopted, and integrated into the user’s daily life. These users are the early adopters; the Internet cognoscenti. Second tier users appreciate the availability of online resources and use a mix of devices to access Internet content but vary in the extent to which they try the latest application or device. The third tier users also vary in the amount they access the Internet but have generally not embraced its full potential, from not seeking out readily available resources to not connecting at all.1 Regardless of the degree to which they access the Internet, all of these users require basic technology skills and a robust underlying infrastructure. Since the introduction of Web 2.0, the number and type of participatory Web-based applications has continued to grow. Many people are eagerly taking part in creating an increasing variety of Web-based content because the basic tools to do so are widely available. The amateur, creating and sharing for primarily personal reasons, has the ability to reach an audience of unprecedented size. 
In turn, the Internet audience, or virtual audience, can select from a vast menu of formats, including multimedia and print. With print resources disappearing, it is increasingly likely for an individual to only be able to access necessary material online. Web-based resources are unique in that they enable an undetermined number of people, person- ally connected or complete strangers, to interact with and manipulate the content thereby creating something new with each interaction and subsequent iteration. Many of these new resources and applications require much more bandwidth than traditional print resources. With the necessary technology no longer out of reach, a cross- section of society is affecting the course the twenty-first century is taking vis à vis how information is created, who can create it, and how we share it.2 In turn, who can access Web-based content and who decides how it can be accessed become critical questions to answer. As people become more adept at using Web-based tools and eager to try new applications, the need for greater broadband will intensify. The economic downturn is having a marked effect on people’s Internet use. If there was a preexisting problem with inadequate access to broadband, current circumstances exacerbate it to where it needs immediate attention. Access to broadband Internet today increases This paper discusses the role of the public library in ensuring access to the broadband communication that is so critical in today’s knowledge-based society. It examines the culture of information in 2010, and then asks what it means if individuals are online or not. The paper also explores current issues surrounding telecommunications and policy, and finally seeks to understand the role of the library in this highly technological, perpetually connected world. I n the last twenty years library collections have evolved from being predominantly print-based to ones that have a significant digital component. This trend, which has a direct impact on library services, has only accelerated with the advent of Web 2.0 technologies and participa- tory content creation. Cutting-edge libraries with next generation catalogs encourage patrons to post reviews, contribute videos, and write on library blogs and wikis. Even less adventuresome institutions offer a variety of electronic databases licensed from multiple publishers and vendors. The piece of these library portfolios that is at best ignored and at worst vilified is the infrastructure that enables Internet connectivity. In 2010, broadband telecommunication is recognized as essential to access the full range of information resources. Telecommunications experts articulate their concerns about the digital divide by focusing on first- and last-mile issues of bringing fiber and cable to end users. The library, particularly the public library, represents the metaphorical middle mile provid- ing the public with access to rich information content. Equally important, it provides technical knowledge, sub- ject matter expertise, and general training and support to library users. This paper discusses the role of the public library in ensuring access to the broadband communication that is so critical in today’s knowledge-based society. It examines the culture of information in 2010, and then asks what it means if individuals are online or not. 
The paper also explores current issues surrounding telecommunications and policy, and finally seeks to understand the role of the library in this highly technological, perpetually connected world. ■■ The Culture of Information Information today is dynamic. As the Internet contin- ues on its fast paced, evolutionary track, what we call ‘information’ fluctuates with each emerging Web-based technology. Theoretically a democratic platform, the Internet and its user-generated content is in the process Marijke visser (mvisser@alawash.org) is information Technol- ogy Policy analyst and Mary Alice Ball (maryaliceball@yahoo .com) former chair, Telecommunications Subcommittee, office for information Technology Policy, american library association, washington, Dc. 188 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 The geographical location of a community will also influ- ence what kind of Internet service is available because of deployment costs. These costs are typically reflected in varying prices to consumers. In addition to the physical layout of an area, current federal telecommunications policies limit the degree to which incentives can be used on the local level.7 Encouraging competition between ISPs, including municipal electric utilities, incumbent local exchange carriers, and national cable companies, for example, requires coordination between local needs and state and federal policies. Such coordinated efforts are inherently difficult when taking into consideration the numerous differences between locales. Ultimately, though, all of these factors influence the price end users must pay for Internet access. With necessary infrastructure and telecommunica- tions policies in place, there are individual behaviors that also affect broadband adoption. According to the Pew study, “Home Broadband Adoption 2008,” 62 percent of dial-up users are not interested in switching to broad- band.8 Clearly there is a segment of the population that has not yet found personal relevance to high-speed access to online resources. In part this may be because they only have experience with dial-up connections. Depending on dial-up gives the user an inherently inferior experi- ence because bandwidth requirements to download a document or view a website with multimedia features automatically prevent these users from accessing the same resources as a user with a high-speed connection. A dial-up user would not necessarily be aware of this differ- ence. If this is the only experience a user has it might be enough to deter broadband adoption, especially if there are other contributing factors like lack of technical com- fort or availability of relevant content. Motivation to use the Internet is influenced by the extent to which individuals find content personally rel- evant. Whether it is searching for a job and filling out an application, looking at pictures of grandchildren, using Skype to talk to a family member deployed in Iraq, researching healthcare providers, updating a personal webpage, or streaming video, people who do these things have discovered personally relevant Internet content and applications. Understanding the potential relevance of going online makes it more likely that someone would experiment with other applications, thus increasing both the familiarity with what is available and the comfort level with accessing it. Without relevant content, there is little motivation for someone not inclined to experiment with Internet technology to cross what amounts to a sig- nificant hurdle to adoption. 
Anthony Wilhelm argues in a 2003 article discussing the growing digital divide that culturally relevant content is critical in increasing the likelihood that non-users will want to access Web-based resources.9 The scope of the issue of providing culturally relevant content is underscored in the 2008 Pew study, the amount of information and variety of formats avail- able to the user. In turn more content is being distributed as users create and share original content.3 Businesses, nonprofits, municipal agencies, and educational institu- tions appreciate that by putting their resources online they reach a broader segment of their constituency. This approach to reaching an audience works provided the constituents have their own access to the materials, both physically and intellectually. It is one thing to have an Internet connection and another to have the skill set nec- essary to make productive use of it. As reported in Job-Seeking in U.S. Public Libraries in 2009, “less than 44% of the top 100 U.S. retailers accept in- store paper applications.”4 Municipal, state, and federal agencies are increasingly putting their resources online, including unemployment benefit applications, tax forms, and court documents.5 In addition to online documents, the report finds social service agencies may encourage clients to make appointments and apply for state jobs online.6 Many of the processes that are now online require an ability to navigate the complexities of the Internet at the same time as navigating difficult forms and websites. The combination of the two can deter someone from retrieving necessary resources or successfully completing a critical procedure. While early adopters and policy-makers debate the issues surrounding Internet access, the other strata of society, knowingly or not and to varying degrees, are enmeshed in the outcomes of these ongoing discussions because their right to information is at stake. ■■ Barriers to Broadband Access By condensing Internet access issues to focus on the availability of adequate and sustainable broadband, it is possible to pinpoint four significant barriers to access: price, availability, perceived relevance, and technical skill level. The first two barriers are determined by existing telecommunications infrastructure as well as local, state, and federal telecommunications policies. The latter barri- ers are influenced by individual behaviors. Both divisions deserve attention. If local infrastructure and the Internet service provider (ISP) options do not support broadband access to all areas within its boundaries, the result will be that some commu- nity members can have broadband services at home while others must rely on work or public access computers. It is important to determine what kind of broadband services are available (e.g., cable, DSL, fiber, satellite) and if they are robust enough to support the activities of the commu- nity. Infrastructure must already be in place or there must be economic incentive for ISPs to invest in improving current infrastructure or in installing new infrastructure. GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | visser AND BAll 189 at all. Success hinges on understanding that each com- munity is unique, on leveraging its strengths, and on ameliorating its weaknesses. Local government can play a significant role in the availability of broadband access. 
From a municipal per- spective, emphasizing the role of broadband as a factor in economic development can help define how the munici- pality should most effectively advocate for broadband deployment and adoption. Gillett offers four initiatives appropriate for stimulating broadband from a local view- point. Municipal governments can ■■ become leaders in developing locally relevant Internet content and using broadband in their own services; ■■ adopt policies that make it easier for ISPs to offer broadband; ■■ subsidize broadband users and/or ISPs; or ■■ become involved in providing the infrastructure or services themselves.12 Individually or in combination these four initiatives underscore the fact that government awareness of the possibilities for community growth made possible by broadband access can lead to local government sup- port for the initiatives of other local agencies, including nonprofit, municipal, or small businesses. Agencies part- nering to support community needs can provide evidence to local policy makers that broadband is essential for com- munity success. Once the municipality sees the potential for social and economic development, it is more likely to support policies that stimulate broadband buildout. Building strong local partnerships will set the stage for the development of a sustainable broadband initiative as the different stakeholders share perspectives that take into account a variety of necessary components. When the time comes to implement a strategy, not only will different perspectives have been included, the plan will have champions to speak for it: the government, ISPs, public and private agencies, and community members. It is important to know which constituents are already engaged in supporting community broadband initiatives and which should be tapped. The ultimate purpose in establishing broadband Internet access in a community is to benefit the individual community members, thereby stimulating local economic development. Key players need to represent agencies that recognize the individual voice. A 2004 study led by Strover provides an example of the importance of engaging local community leaders and agencies in developing a successful broadband access project.13 The study looked at thirty-six communities that received state funding to establish community technology centers (CTC). It addressed the effective use and manage- ment of CTCs and called attention to the inadequacy of supplying the hardware without community support which found that of the 27 percent of adult Americans who are not Internet users, 33 percent report they are not interested in going online.10 That Pew can report similar information five years after the Wilhelm article identifies a barrier to equitable access that has not been adequately resolved. ■■ Models for Sustainable Broadband Availability In discussing broadband, the question of what constitutes broadband inevitably arises. Gillett, Lehr, and Osoria, in “Local Government Broadband Initiatives,” offers a functional definition: “access is ‘broadband’ if it repre- sents a noticeable improvement over standard dial-up and, once in place, is no longer perceived as the limit- ing constraint on what can be done over the Internet.”11 While this definition works in relationship to dial-up, it is flexible enough to apply to all situations by focusing on “a noticeable improvement” and “no longer perceived as the limiting constraint” (added emphasis). Ensuring sustainable broadband access necessitates anticipating future demand. 
Short sighted definitions, applicable at a set moment in time, limit long-term viability of alterna- tive solutions. Devising a sustainable solution calls for careful scru- tiny of alternative models, because the stakes are so high in the broadband debate. There are many different play- ers involved in constructing information policies. This does not mean, however, that their perspectives are mutu- ally exclusive. In debates with multiple perspectives, it is important to involve stakeholders who are aligned with the ultimate goal: assuring access to quality broadband to anyone going online. What is successful for one community may be entirely inappropriate in another; designing a successful system requires examining and comparing a range of scenarios. Existing circumstances may predetermine a particular starting point, but one first step is to evaluate best prac- tices currently in place in a variety of communities to come up with a plan that meets the unique criteria of the community in question. Sustainable broadband solutions need to be developed with local constituents in mind and successful solutions will incorporate the realities of cur- rent and future local technologies and infrastructure as well as local, state, and federal information policies. Presupposing that the goal is to provide the commu- nity with the best possible option(s) for quality broadband access, these are key considerations to take into account when devising the plan. In addition to the technologi- cal and infrastructure issues, within a community there will be a combination of ways people access the Internet. There will be those who have home access, those who need public access, and those who do not seek access 190 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 the current emphasis on universal broadband depends on selecting the best of the alternative plans according to carefully vetted criteria in order to develop a flexible and forward-thinking course of action. Can we let people remain without access to robust broadband and the necessary skill set to use it effectively? No. As more and more resources critical to basic life tasks are accessible only online, those individuals that face challenges to going online will likely be socially and economically disadvantaged when compared to their online counterparts. Recognition of this poten- tial for intensifying digital divide is recognized in the Federal Communication Commission’s (FCC) National Broadband Plan (NBP) released in March 2010.18 The NBP states six national broadband goals, the third of which is “Every American should have affordable access to robust broadband service, and the means and skills to subscribe if they so choose.”19 Research conducted for the recom- mendations in the NBP was comprehensive in scope including voices from industry, public interest, academia, and municipal and state government. Responses to more than thirty public notices issued by the FCC provide evidence of wide concern from a variety of perspectives that broadband access should become ubiquitous if the United States is to be a competitive force in the twenty- first century. Access to essential information such as govern- ment, public safety, educational, and economic resources requires a broadband connection to the Internet. It is incumbent on government officials, ISPs, and community organizations to share ideas and resources to achieve a solution for providing their communities with robust and sustainable broadband. 
It is not necessary to have all users up to par with the early adopters. There is not a one-size-fits-all approach to wanting to be connected, nor is there a one-size-fits-all solution to providing access. What is important is that an individual can go online via a robust, high-speed connection that meets that indi- vidual’s needs at that moment. What this means for finding solutions is ■■ there needs to be a range of solutions to meet the needs of individual communities; ■■ they need to be flexible enough to meet the evolv- ing needs of these communities as applications and online content continue to change; and ■■ they must be sustainable for the long term so that the community is prepared to meet future needs that are as yet unknown. Solutions to providing broadband Internet access will be most successful when they are designed starting at the local level. Community needs vary according to local demographics, geography, existing infrastructure, types of service providers, and how state and federal systems in place. Users need a support system that high- lights opportunities available via the Internet and that provides help when they run into problems. Access is more than providing the infrastructure and hardware. The potential users must also find content that is cultur- ally relevant in an environment that supports local needs and expectations. Strover found the most successful CTCs were located in places that “actively attracted people for other social and entertaining reasons.”14 In other words, the CTCs did not operate in a vacuum devoid of social context. Successful adoption of the CTCs as a resource for information was dependent on the targeted population finding culturally relevant content in a supportive envi- ronment. An additional point made in the study showed that without strong community leadership, there was not significant use of the CTC even when placed in an already established community center.15 This has signifi- cant implications for what constitutes access as libraries plan broadband initiatives. Investments in technology and a national commit- ment to ensure universal access to these new technologies in the 1990s provide the current policy framework. As suggested by Wilhelm in 2003, to continue to move for- ward the national agenda needs to focus on updating policies to fit new information circumstances as they arise. Today’s information policy debates should empha- size a similar focus. Beyond accelerating broadband deployment into underserved areas, Wilhelm suggests there needs to be support for training and content devel- opment that guarantees communities will actually use and benefit from having broadband deployed in their area.16 Technology training and support for local agencies that provide the public with Internet access, as well as opportunities for the individuals themselves, is essential if policies are going to actually lead to useful broadband adoption. Individual and agency Internet access and adoption require investment beyond infrastructure; they depend on having both culturally relevant content and the information literacy skills necessary to benefit from it. ■■ Finding the Right Solution Though it may have taken an economic crisis to bring broadband discussions into the living room, the result is causing renewed interest in a long-standing issue. 
Many states have formed broadband task forces or councils to address the lack of adequate broadband access at the state level and, on the national front, broadband was a key component of the American Recovery and Reinvestment Act of 2009.17 The issue changes as technologies evolve but the underlying tenet of providing people access to the information and resources they need to be produc- tive members of society is the same. What becomes of GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | visser AND BAll 191 difficult to measure, these kinds of social and cultural capital are important elements in ongoing debates about uses and consequences of broadband access. An ongoing challenge for those interested in the social, economic, and policy consequences of modern information networks will be to keep up with changing notions of what it means to be connected in cyberspace.”20 The social contexts in which a broadband plan will be enacted influence the appropriateness of different scenarios and should help guide which ones are imple- mented. Engaging a variety of stakeholders will increase the likelihood of positive outcomes as community mem- bers embrace the opportunities provided by broadband Internet access. It is difficult, however, to anticipate the outcomes that may occur as users become more familiar with the resources and achieve a higher level of comfort with technology. Ramirez states, The “unexpected outcomes” section of many evalua- tion reports tends to be rich with anecdotes . . . . The unexpected, the emergent, the socially constructed innovations seem to be, to a large extent, off the radar screen, and yet they often contain relevant evidence of how people embrace technology and how they inno- vate once they discover its potential.21 Community members have the most to gain from having broadband Internet access. Including them will increase the community’s return on its investment as they take advantage of the available resources. Ramirez sug- gests that “participatory, learning, and adaptive policy approaches” will guide the community toward develop- ing communication technology policies that lead to a vibrant future for individuals and community alike.22 As success stories increase, the aggregation of local commu- nities’ social and economic growth will lead to a net sum gain for the nation as a whole. ■■ The Role of the Library Public libraries play an important role in providing Internet access to their community members. According to a 2008 study, the public library is the only outlet for no-fee Internet access in 72.5 percent of communities nationwide; in rural communities the number goes up to 82.0 percent.23 Beyond having desktop or, in some cases, wireless access, public libraries offer invaluable user support in the form of technical training and locally relevant content. Libraries provide a secondary commu- nity resource for other local agencies who can point their clients to the library for no-fee Internet access. In today’s economy where anecdotal reports show an increase in library use, particularly Internet use, the role of the public policies mesh with local ordinances. Local stakeholders best understand the complex interworking of their com- munity and are aware of who should be included in the decision-making process. Including a local perspective will also increase the likelihood that as community needs change, new issues will be brought to the attention of policy makers and agencies who advocate for the indi- vidual community members. 
Community agencies that already are familiar with local needs, abilities, and expectations are logical groups to be part of developing a successful local broadband access strategy. The library exemplifies a community resource whose expertise in local issues can inform infor- mation policy discussions on local, state, and federal levels. As a natural extension of library service, libraries offer the added value support necessary for many users to successfully navigate the Internet. The library is an estab- lished community hub for informational resources and provides dedicated staff, technology training opportuni- ties, and no-fee public access computers with an Internet connection. Libraries in many communities are creating locally relevant Web-based content as well as linking to other community resources on their own websites. Seeking a partnership with the local library will augment a community broadband initiative. It is difficult to appreciate the impacts of current information technologies because they change so rap- idly there is not enough time to realistically measure the effects of one before it is mixed in with a new innovation. With Web-based technologies there is a lag time between what those in the front of the pack are doing online and what those in the rear are experiencing. While there is general consensus that broadband Internet access is critical in promoting social and economic development in the twenty-first century as is evidenced by the national purposes outlined in the NBP, there is not necessarily agreement on benchmarks for measuring the impacts. Three anticipated outcomes of providing community access to broadband are ■■ civic participation will increase; ■■ communities will realize economic growth; and ■■ individual quality of life will improve. When a strategy involves significant financial and energy investments there is a tendency to want palpable results. The success of providing broadband access in a community is challenging to capture. To achieve a level of acceptable success it is necessary to focus on local communities and aggregate anecdotal evidence of incre- mental changes in public welfare and economic gain. Acceptable success is subjective at best but can be usefully defined in context of local constituencies. Referring to participation in the development of a vibrant culture, Horrigan notes that “while inherently 192 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 isolation. An individual must possess skills to navigate the online resources. As users gain an understanding of the potential personal growth and opportunities broad- band yields, they will be more likely to seek additional online resources. By stimulating broadband use, the library will contribute to the social and economic health of the community. If the library is to extend its role as the information hub in the community by providing no-fee access to broadband to anyone who walks through the door, the local community must be prepared to support that role. It requires a commitment to encourage build out of appro- priate technology necessary for the library to maintain a sustainable Internet connection. It necessitates that local communities advocate for national information and com- munication policies that are pro-library. When public policy supports the library’s efforts, the local community benefits and society at large can progress. What if the library’s own technology needs are not met? 
The role of the library in its community is becoming increasingly important as more people turn to it for their Internet access. Without sufficient revenue, the library will have a difficult time meeting this additional demand for services. In turn, in many libraries increased demand for broadband access stretches the limit of IT support for both the library staff and the patrons needing help at the computers. What will be the fallout from the library not being able to provide Internet services the patrons desire and require? Will there be a growing skills difference between people who adopt emerging technologies and incorporate them into their daily lives and those who maintain the technological status quo? What will the social impact be of remaining off line either completely or only marginally? Can the library be the bridge between those on the edge, those in the middle, and those at the end? With a strong and well articulated vision for the future, the library can be the link that provides the com- munity with sustainable broadband. ■■ Conclusion The recent national focus on universal broadband access has provided an opportunity to rectify a lapse in effective information policy. Whether the goal includes facilitating meaningful access continues to be more elusive. As gov- ernment, organizations, businesses, and individuals rely more heavily on the Internet for sharing and receiving information, broadband Internet access will continue to increase in importance. Following the status quo will not necessarily lead to more people having broadband access in the long run. The early adopters will continue to stimu- late technological innovation which, in turn, will trickle down the ranks of the different user types. Currently, library as a stable Internet provider cannot be overesti- mated. To maintain its vital function, however, the library must also resolve infrastructure challenges of its own. Because of the increased demand for access to Internet resources, public libraries are finding their current broad- band services are not able to support the demand of their patrons. The issues are two-fold: increased patron use means there are often neither sufficient workstations nor broadband speeds to meet patron demand. In 2008, about 82.5 percent of libraries reported an insufficient number of public workstations, and about 57.5 percent reported insufficient broadband speeds.24 To add to these already significant issues, the report indicates libraries are having trouble supporting the necessary information technology (IT) because of either staff time constraints or the lack of a dedicated IT staff.25 Public libraries are facing consider- able infrastructure management issues at a time when library use is increasing. Overcoming the challenges successfully will require support on the local, state, and federal level. Here is where the librarian, as someone trained to become inherently familiar with the needs of her local constituency and ethically bound to provide access to a variety of information resources, needs to insert herself into the debate. Librarians need to be ahead of the crowd as the voice that assures content will be readily accessible to those who seek it. Today, the elemental policy issue regarding access to information via the Internet hinges on connectivity to a sustainable broadband network. To promote equitable broadband access, the librarian needs be aware of the pertinent information policies in place or under consideration, and be able to anticipate those in the future. 
Additionally, she will need to educate local policy makers about the need for broadband in their com- munity. In some circumstances, the librarian will need to move beyond her local community and raise awareness of community access issues on the state and federal level. The librarian is already able to articulate numerous issues to a variety of stakeholders and can transfer this skill to advocate for sustainable broadband strategies that will succeed in her local community. There are many strata of Internet users, from those in the forefront of early adoption to those not interested in being online at all. The early adopters drive the market which responds by making resources more and more likely to be primarily available only online. As we con- tinue this trend, the social repercussions increase from merely not being able to access entertainment and news to being unable to participate in the knowledge-based society of the twenty-first century. By folding in added value online access for the community, the library helps increase the likelihood that the community will benefit from broadband being available to the library patrons and by extension to the community as a whole. To realize the Internet’s full potential, access to it cannot be provided in GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | visser AND BAll 193 community, the entire community benefits regardless of where and how the individuals go online. The effects of the Internet are now becoming broadly social enough that there is a general awareness that the Internet is not decoration on contemporary society but a challenge to it.28 Being connected is no longer an optional luxury; to engage in the twenty-first century it is essential. Access to the Internet, however, is more than simple connectivity. Successful access requires: an understanding of the ben- efits to going on line, technological comfort, information literacy, ongoing support and training, and the availabil- ity of culturally relevant content. People are at various levels of Internet use, from those eagerly anticipating the next iteration of Web-based applications to those hesitant to open an e-mail account. This user spectrum is likely to continue. Though the starting point may vary depending on the applications that become important to the user in the middle of the spectrum, there will be those out in front and those barely keeping up. The implications of the pervasiveness of the Internet are only beginning to be appreciated and understood. Because of their involvement at the cutting edge of Internet evolution, librarians can help lead the conver- sations. Libraries have always been situated in neutral territory within their communities and closely aligned with the public good. Librarians understand the per- spective of their patrons and are grounded in their local communities. Librarians can therefore advocate effectively for their communities on issues that may not completely be understood or even recognized as matter- ing. Connectivity is an issue supremely important to the library as today access to the full range of information necessitates a broadband connection. Libraries have carved out a role for themselves as a premier Internet access provider in the continually evolving online culture. 
As noted by Bertot, McClure, and Jaeger, the “role of Internet access provider for the community is ingrained in the social perceptions of public libraries, and public Internet access has become a central part of community perceptions about libraries and the value of the library profession.”29 In times of both economic crisis and technological innovation, there are many unknowns. In part because of these two juxtaposed events, the role of the public library is in flux. Additionally, the network of community orga- nizations that libraries link to is becoming more and more complex. It is a time of great opportunity if the library can articulate its role and frame it in relationship to broader society. Evolving Internet applications require increasing amounts of bandwidth and the trend is to make these bandwidth-heavy applications more and more vital to daily life. One clear path the library community can take however, the supply of Internet resources is unevenly stimulating user demand and the unequal distribution of broadband access has greater potential for significant negative social consequences. Staying the course and fol- lowing a haphazard evolution of broadband adoption, may, in fact, renew valid concerns about a digital divide. Without an intentional and coordinated approach to developing a broadband strategy, its success is likely to fall short of expectations. The question of how to ensure that Internet content is meaningful requires instituting a plan on a very local level, including stakeholders who are familiar with the unique strengths and weaknesses of their community. Strover, in her 2000 article The First Mile, suggests connectivity issues should be viewed from a first mile perspective where the focus is on the person accessing the Internet and her qualitative experience rather than from a last mile perspective which emphasizes ISP, infra- structure, and market concerns.26 Both perspectives are talking about the same physical section of the connection network: the piece that connects the user to the network. According to Strover, distinguishing between the first mile and last mile perspectives is more than an arbitrary argument over semantics. Instead, a first mile perspective represents a shift “in the values and priorities that shape telecommunications policy.”27 By switching to a first mile perspective, connectivity issues immediately take into account the social aspects of what it means to be online. Who will bring this perspective to the table? And how will we ascertain what the best approach to supporting the individual voice should be? The first mile perspective is one the library is inti- mately familiar with as an organization that traditionally advocates for the first mile of all information policies. The library is in a key position in the connectivity debate because of its inclination to speak for the user and to be aware of the unique attributes and needs of its local community. As part of its mission, the library takes into account the distinctive needs of its user community when it designs and implements its services. A natural outgrowth of this practice is to be keenly aware of the demographics of the community at large. The library can leverage its knowledge and understanding to create an even greater positive impact on the social, educational, and economic community development made possible by broadband adoption. 
To extend the first mile perspective analogy, in the connectivity debate, the library will play the role of the middle mile: the support system that suc- cessfully connects the Internet to the consumer. While the target populations for stimulating demand for broadband are really those in the second tier of users, by advocating for the first mile perspective, the library will be advocating for equitable information policies whose implementation has bearing on the early adopters as well. By stimulating demand for broadband within a 194 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 Initiatives,” 538. 12. Ibid., 537–58. 13. Sharon Strover, Gary Chapman, and Jody Waters, “Beyond Community Networking and CTCs: Access, Development, and Public Policy,” Telecommunications Policy 28, no. 7/8 (2004): 465–85. 14. Ibid., 483. 15. Ibid. 16. Wilhelm, “Leveraging Sunken Investments in Communi- cations Infrastructure,” 282. 17. See, for example, the Virginia Broadband Round Table (http://www.otpba.vi.virginia.gov/broadband_roundtable .shtml), the Ohio Broadband Council (http://www.ohiobroad bandcouncil.org/), and the California Broadband Task Force (http://gov.ca.gov/speech/4596. See www.fcc.gov/recovery/ broadband/) for information on broadband initiatives in the American Recovery and Reinvestment Act. 18. Federal Communication Commission, National Broad- band Plan: Connecting America, http://www.broadband.gov/ (accessed Apr. 11, 2010). 19. Ibid. 20. Horrigan, “Broadband: What’s All the Fuss About?” 2. 21. Ricardo Ramirez, “Appreciating the Contribution of Broadband ICT with Rural and Remote Communities: Stepping Stones toward and Alternative Paradigm,” The Information Soci- ety 23 (2007): 86. 22. Ibid., 92. 23. Denise M. Davis, John Carlo Bertot, and Charles, R. McClure, “Libraries Connect Communities: Public Library Funding & Technology Access Study 2007–2008,” 35, http:// www.ala.org/ala/aboutala/offices/ors/plftas/0708/Libraries ConnectCommunities.pdf (accessed Jan. 24, 2009). 24. John Carlo Bertot et al., “Public Libraries and the Internet 2008: Study Results and Findings,” 11, http://www.ii.fsu.edu/ projectFiles/plinternet/2008/Everything.pdf (accessed Jan. 24, 2009). These numbers represent an increase from the previous year’s study which suggests that libraries while trying to meet demand are not able to keep up. 25. Ibid. 26. Sharon Strover, “The First Mile,” The Information Society 16, no. 2 (2000): 151–54. 27. Ibid., 151. 28. Clay Shirky, “Here Comes Everybody: The Power of Organizing without Organizations.” Berkman Center for Inter- net & Society (2008). Video presentation. Available at http:// cyber.law.harvard.edu/interactive/events/2008/02/shirky (Retrieved March 1, 2009). 29. John Carlo Bertot, Charles R. McClure, and Paul T. Jaeger, “The Impacts of Free Public Internet Access on Public Library Patrons and Communities,” Library Quarterly 78, no. 3 (2008): 286, http://www.journals.uchicago.edu.proxy.ulib.iupui.edu/ doi/pdf/10.1086/588445 (accessed Jan. 30, 2009). is to develop its role as the middle mile connecting the increasing breadth of Internet resources to the general public. The broadband debate has moved out of the background of telecommunication policy and into the center of public attention. Now is the moment that calls for an information policy advocate who can represent the end user while understanding the complexity of the other stakeholder perspectives. 
The library undoubtedly has its own share of stakeholders, but over time it is an institution that has maintained a neutral stance within its community, thereby achieving a unique ability to speak for all parties. Those who speak for the library are able to represent the needs of the public, work with a diverse group of stakeholders, and help negotiate a sustainable strategy for providing broadband Internet access. References and notes 1. Lee Rainie, “2.0 and the Internet World,” Internet Librar- ian 2007, http://www.pewinternet.org/Presentations/2007/20 -and-the-Internet-World.aspx (accessed Mar. 4, 2009). See also John Horrigan, “A Typology of Information and Communication Technology Users,” 2007, www.pewinternet.org/~/media// Files/Reports/2007/PIP_ICT_Typology.pdf.pdf (accessed Feb. 12, 2009). 2. Lawrence Lessig, “Early Creative Commons His- tory, My Version,” video blog post, 2008, http://lessig.org/ blog/2008/08/early_creative_commons_history.html (accessed Jan. 20, 2009). See the relevant passage from 20:53 through 21:50. 3. John Horrigan, “Broadband: What’s All the Fuss About?” 2007, p. 1, http://www.pewinternet.org/~/media/ Files/Reports/2007/BroadBand%20Fuss.pdf.pdf (accessed Feb. 12, 2009). 4. “Job-Seeking in US Public Libraries,” Public Library Fund- ing & Technology Access Study, 2009, http://www.ala.org/ ala/research/initiatives/plftas/issuesbriefs/brief_jobs_july.pdf (accessed Mar. 27, 2009). 5. Ibid. 6. Ibid. 7. Sharon E. Gillett, William H. Lehr, and Carlos Osorio, “Local Government Broadband Initiatives,” Telecommunications Policy 28 (2004): 539. 8. John Horrigan, “Home Broadband Adoption 2008,” 10, http://www.pewinternet.org/~/media//Files/Reports/2008/ PIP_Broadband_2008.pdf (accessed Feb. 12, 2009). 9. Anthony G. Wilhelm, “Leveraging Sunken Investments in Communications Infrastructure: A Policy Perspective from the United States,” The Information Society 19 (2003): 279–86. 10. Horrigan, “Home Broadband Adoption,” 12. 11. Gillett, Lehr, and Osorio, “Local Government Broadband 3130 ---- GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | HilerA et Al. 195 José R. Hilera, Carmen Pagés, J. Javier Martínez, J. Antonio Gutiérrez, and Luis de-Marcos An Evolutive Process to Convert Glossaries into Ontologies dictionary, the outcome will be limited by the richness of the definition of terms included in that dictionary. It would be what is normally called a “lightweight” ontol- ogy,6 which could later be converted into a “heavyweight” ontology by implementing, in the form of axioms, know- ledge not contained in the dictionary. This paper describes the process of creating a lightweight ontology of the domain of software engineering, starting from the IEEE Standard Glossary of Software Engineering Terminology.7 ■■ Ontologies, the Semantic Web, and Libraries Within the field of librarianship, ontologies are already being used as alternative tools to traditional controlled vocabularies. This may be observed particularly within the realm of digital libraries, although, as Krause asserts, objections to their use have often been raised by the digital library community.8 One of the core objections is the difficulty of creating ontologies as compared to other vocabularies such as taxonomies or thesauri. Nonetheless, the semantic richness of an ontology offers a wide range of possibilities concerning indexing and searching of library documents. 
The term ontology (used in philosophy to refer to the “theory about existence”) has been adopted by the artificial intelligence research community to define a cate- gorization of a knowledge domain in a shared and agreed form, based on concepts and relationships, which may be formally represented in a computer readable and usable format. The term has been widely employed since 2001, when Berners-Lee et al. envisaged the Semantic Web, which aims to turn the information stored on the Web into knowledge by transforming data stored in every webpage into a common scheme accepted in a specific domain.9 To accomplish that task, knowledge must be represented in an agreed-upon and reusable computer-readable format. To do this, machines will require access to structured collections of information and to formalisms which are based on mathematical logic that permits higher levels of automatic processing. Technologies for the Semantic Web have been devel- oped by the World Wide Web Consortium (W3C). The most relevant technologies are RDF (Resource Description This paper describes a method to generate ontologies from glossaries of terms. The proposed method presupposes an evolutionary life cycle based on successive transforma- tions of the original glossary that lead to products of intermediate knowledge representation (dictionary, tax- onomy, and thesaurus). These products are characterized by an increase in semantic expressiveness in comparison to the product obtained in the previous transformation, with the ontology as the end product. Although this method has been applied to produce an ontology from the “IEEE Standard Glossary of Software Engineering Terminology,” it could be applied to any glossary of any knowledge domain to generate an ontology that may be used to index or search for information resources and documents stored in libraries or on the Semantic Web. F rom the point of view of their expressiveness or semantic richness, knowledge representation tools can be classified at four levels: at the basic level (level 0), to which dictionaries belong, tools include defini- tions of concepts without formal semantic primitives; at the taxonomies level (level 1), tools include a vocabulary, implicit or explicit, as well as descriptions of specialized relationships between concepts; at the thesauri level (level 2), tools further include lexical (synonymy, hyperonymy, etc.) and equivalence relationships; and at the reference models level (level 3), tools combine the previous relation- ships with other more complex relationships between concepts to completely represent a certain knowledge domain.1 Ontologies belong at this last level. According to the hierarchic classification above, knowledge representation tools of a particular level add semantic expressiveness to those in the lowest levels in such a way that a dictionary or glossary of terms might develop into a taxonomy or a thesaurus, and later into an ontology. There are a variety of comparative studies of these tools,2 as well as varying proposals for systematically generating ontologies from lower-level knowledge repre- sentation systems, especially from descriptor thesauri.3 This paper proposes a process for generating a termino- logical ontology from a dictionary of a specific knowledge domain.4 Given the definition offered by Neches et al. 
(“an ontology is an instrument that defines the basic terms and relations comprising the vocabulary of a topic area as well as the rules for combining terms and relations to define extensions to the vocabulary”)5 it is evident that the ontology creation process will be easier if there is a vocabulary to be extended than if it is developed from scratch. If the developed ontology is based exclusively on the José r. Hilera (jose.hilera@uah.es) is Professor, carmen Pagés (carmina.pages@uah.es) is assistant Professor, J. Javier Mar- tínez (josej.martinez@uah.es) is Professor, J. Antonio Gutiér- rez (jantonio.gutierrez@uah.es) is assistant Professor, and luis de-Marcos (luis.demarcos@uah.es) is Professor, Department of computer Science, Faculty of librarianship and Documentation, university of alcalá, Madrid, Spain. 196 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 configuration management; data types; errors, faults, and failures; evaluation techniques; instruction types; language types; libraries; microprogramming; operating systems; quality attributes; software documentation; soft- ware and system testing; software architecture; software development process; software development techniques; and software tools.15 In the glossary, entries are arranged alphabetically. An entry may consist of a single word, such as “software,” a phrase, such as “test case,” or an acronym, such as “CM.” If a term has more than one definition, the definitions are numbered. In most cases, noun definitions are given first, followed by verb and adjective definitions as applicable. Examples, notes, and illustrations have been added to clarify selected definitions. Cross-references are used to show a term’s relations with other terms in the dictionary: “contrast with” refers to a term with an opposite or substantially different mean- ing; “syn” refers to a synonymous term; “see also” refers to a related term; and “see” refers to a preferred term or to a term where the desired definition can be found. Figure 2 shows an example of one of the definitions of the glossary terms. Note that definitions can also include Framework),10 which defines a common data model to specify metadata, and OWL (Ontology Web Language),11 which is a new markup language for publishing and sharing data using Web ontologies. More recently, the W3C has presented a proposal for a new RDF-based markup system that will be especially useful in the con- text of libraries. It is called SKOS (Simple Knowledge Organization System), and it provides a model for expressing the basic structure and content of concept schemes, such as thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, and other simi- lar types of controlled vocabularies.12 The emergence of the Semantic Web has created great interest within librarianship because of the new possibili- ties it offers in the areas of publication of bibliographical data and development of better indexes and better displays than those that we have now in ILS OPACs.13 For that rea- son, it is important to strive for semantic interoperability between the different vocabularies that may be used in libraries’ indexing and search systems, and to have com- patible vocabularies (dictionaries, taxonomies, thesauri, ontologies, etc.) based on a shared standard like RDF. There are, at the present time, several proposals for using knowledge organization systems as alternatives to controlled vocabularies. 
For example, folksonomies, though originating within the Web context, have been proposed by different authors for use within libraries “as a powerful, flexible tool for increasing the user-friendliness and inter- activity of public library catalogs.”14 Authors argue that the best approach would be to create interoperable controlled vocabularies using shared and agreed-upon glossaries and dictionaries from different domains as a departure point, and then to complete evolutive processes aimed at semantic extension to create ontologies, which could then be com- bined with other ontologies used in information systems running in both conventional and digital libraries for index- ing as well as for supporting document searches. There are examples of glossaries that have been transformed into ontologies, such as the Cambridge Healthtech Institute’s “Pharmaceutical Ontologies Glossary and Taxonomy” (http://www.genomicglossaries.com/content/ontolo gies.asp), which is an “evolving terminology for emerging technologies.” ■■ IEEE Standard Glossary of Software Engineering Terminology To demonstrate our proposed method, we will use a real glossary belonging to the computer science field, although it is possible to use any other. The glossary, available in electronic format (PDF), defines approxi- mately 1,300 terms in the domain of software engineering (figure 1). Topics include addressing assembling, compil- ing, linking, loading; computer performance evaluation; Figure 1. Cover of the Glossary document GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | HilerA et Al. 197 4. Define the classes and the class hierarchy 5. Define the properties of classes (slots) 6. Define the facets of the slots 7. Create instances As outlined in the Introduction, the ontology devel- oped using our method is a terminological one. Therefore we can ignore the first two steps in Noy’s and McGuinness’ process as the concepts of the ontology coincide with the terms of the glossary used. Any ontology development process must take into account the basic stages of the life cycle, but the way of organizing the stages can be different in different meth- ods. In our case, since the ontology has a terminological character, we have established an incremental develop- ment process that supposes the natural evolution of the glossary from its original format (dictionary or vocabu- lary format) into an ontology. The proposed life cycle establishes a series of steps or phases that will result in intermediate knowledge representation tools, with the final product, the ontology, being the most semantically rich (figure 4). Therefore this is a product-driven process, in which the aim of every step is to obtain an intermediate product useful on its own. The intermediate products and the final examples associated with the described concept. In the resulting ontology, the examples were included as instances of the corresponding class. In figure 2, it can be seen that the definition refers to another glossary on programming languages (Std 610.13), which is a part of the series of dic- tionaries related to computer science (“IEEE Std 610,” figure 3). Other glossaries which are men- tioned in relation to some references about term definitions are 610.1, 610.5, 610.7, 610.8, and 610.9. To avoid redundant definitions and pos- sible inconsistencies, links must be implemented between ontologies developed from those glossa- ries that include common concepts. 
The ontology generation process presented in this paper is meant to allow for integration with other ontolo- gies that will be developed in the future from the other glossaries. In addition to the explicit references to other terms within the glossary and to terms from other glos- saries, the textual definition of a concept also has implicit references to other terms. For example, from the phrase “provides features designed to facilitate expression of data structures” included in the definition of the term high order language (figure 2), it is possible to determine that there is an implicit relationship between this term and the term data structure, also included in the glossary. These relationships have been considered in establishing the properties of the concepts in the developed ontology. ■■ Ontology Development Process Many ontology development methods presuppose a life cycle and suggest technologies to apply during the pro- cess of developing an ontology.16 The method described by Noy and McGuinness is helpful when beginning this process for the first time.17 They establish a seven-step process: 1. Determine the domain and scope of the ontology 2. Consider reusing existing ontologies 3. Enumerate important terms in the ontology Figure 2. Example of term definition in the IEEE Glossary Figure 3. IEEE Computer Science Glossaries 610—Standard Dictionary of Computer Terminology 610.1—Standard Glossary of Mathematics of Computing Terminology 610.2—Standard Glossary of Computer Applications Terminology 610.3—Standard Glossary of Modeling and Simulation Terminology 610.4—Standard Glossary of Image Processing Terminology 610.5—Standard Glossary of Data Management Terminology 610.6—Standard Glossary of Computer Graphics Terminology 610.7—Standard Glossary of Computer Networking Terminology 610.8—Standard Glossary of Artificial Intelligence Terminology 610.9—Standard Glossary of Computer Security and Privacy Terminology 610.10—Standard Glossary of Computer Hardware Terminology 610.11—Standard Glossary of Theory of Computation Terminology 610.12—Standard Glossary of Software Engineering Terminology 610.13—Standard Glossary of Computer Languages Terminology high order language (HOL). A programming language that requires little knowledge of the computer on which a program will run, can be translated into several difference machine languages, allows symbolic naming of operations and addresses, provides features designed to facilitate expression of data structures and program logic, and usually results in several machine instructions for each program state- ment. Examples include Ada, COBOL, FORTRAN, ALGOL, PASCAL. Syn: high level language; higher order language; third gen- eration language. Contrast with: assembly language; fifth generation language; fourth generation language; machine language. Note: Specific languages are defined in P610.13 198 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 Since there are terms with different meanings (up to five in some cases) in the IEEE Glossary of Software Engineering Terminology, during dictionary development we decided to create different concepts (classes) for the same term, associating a number to these concepts to differentiate them. 
For example, there are five different definitions for the term test, which is why there are five concepts (Test1–Test5), corresponding to the five meanings of the term: (1) An activity in which a system or compo- nent is executed under specified conditions, the results are observed or recorded, and an evaluation is made of some aspect of the system or component; (2) To conduct an activity as in (1); (3) A set of one or more test cases; (4) A set of one or more test procedures; (5) A set of one or more test cases and procedures. taxonomy The proposed lifecycle establishes a stage for the con- version of a dictionary into a taxonomy, understanding taxonomy as an instrument of concepts categorization, product are a dictionary, which has a formal and computer processed structure, with the terms and their definitions in XML format; a taxonomy, which reflects the hierarchic rela- tionships between the terms; a thesaurus, which includes other relationships between the terms (for example, the synonymy relationship); and, finally, the ontology, which will include the hierarchy, the basic relationships of the the- saurus, new and more complex semantic relationships, and restrictions in form of axioms expressed using description logics.18 The following paragraphs describe the way each of these products is obtained. Dictionary The first step of the proposed development process con- sists of the creation of a dictionary in XML format with all the terms included in the IEEE Standard Glossary of Software Engineering Terminology and their related defini- tions. This activity is particularly mechanical and does not need human intervention as it is basically a transfor- mation of the glossary from its original format (PDF) into a format better suited to the development process. All formats considered for the dictionary are based on XML, and specifically on RDF and RDF schema. In the end, we decided to work with the standards DAML+OIL and OWL,19 though we are not opposed to working with other languages, such as SKOS or XMI,20 in the future. (In the latter case, it would be possible to model the intermediate products and the ontology in UML graphic models stored in xml files.)21 In our project, the design and implementation of all products has been made using an ontology editor. We have used OilEd (with OilViz Plugin) as editor, both because of its simplicity and because it allows the exportation to OWL and DAML formats. However, with future maintenance and testing in mind, we decided to use Protégé (with OWL plugin) in the last step of the process, because this is a more flexible environment with extensible mod- ules that integrate more functionality such as ontology annotation, evaluation, middleware service, query and inference, etc. Figure 5 shows the dictionary entry for “high order language,” which appears in figure 2. Note that the dic- tionary includes only owl:class (or daml:class) to mark the term; rdf:label to indicate the term name; and rdf:comment to provide the definition included in the original glossary. Figure 4. Ontology development process HighOrderLanguage Figure 5. Example of dictionary entry GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | HilerA et Al. 199 example, when analyzing the definition of the term com- piler: “(Is) A computer program that translates programs expressed in a high order language into their machine language equivalent,” it is possible to deduce that com- piler is a subconcept of computer program, which is also included in the glossary.) 
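To make the dictionary format concrete, a minimal entry of the kind figure 5 illustrates for the term high order language might look roughly as follows. This is a sketch based only on the description above (one OWL class per glossary term, a label carrying the term name, and a comment carrying the unmodified IEEE definition); the rdf:RDF wrapper, the placeholder xml:base URI, and the rdfs: prefixes on the label and comment properties are assumptions added here to make the fragment self-contained, not the published figure.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xml:base="http://example.org/ontoglose">
  <!-- one glossary term becomes one OWL class; the label holds the term name
       and the comment holds the definition text copied from the glossary -->
  <owl:Class rdf:ID="HighOrderLanguage">
    <rdfs:label>high order language</rdfs:label>
    <rdfs:comment>A programming language that requires little knowledge of the
      computer on which a program will run, can be translated into several
      different machine languages, allows symbolic naming of operations and
      addresses, provides features designed to facilitate expression of data
      structures and program logic, and usually results in several machine
      instructions for each program statement.</rdfs:comment>
  </owl:Class>
</rdf:RDF>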
In addition to the lexical or syn- tactic analysis, it is necessary for an expert in the domain to perform a semantic analysis to complete the develop- ment of the taxonomy. The implementation of the hierarchical relation- ships among the concepts is made using rdfs:subClassOf, regardless of whether the taxonomy is implemented in OWL or DAML format, since both languages specify this type of relationship in the same way. Figure 6 shows an example of a hierarchical relationship included in the definition of the concept pictured in figure 5. thesaurus According to the International Organization for Standardization (ISO), a thesaurus is “the vocabulary of a controlled indexing language, formally organized in order to make explicit the a priori relations between concepts (for example ‘broader’ and ‘narrower’).”25 This definition establishes the lexical units and the semantic relationships between these units as the elements that constitute a the- saurus. The following is a sample of the lexical units: ■■ Descriptors (also called “preferred terms”): the terms used consistently when indexing to represent a con- cept that can be in documents or in queries to these documents. The ISO standard introduces the option of adding a definition or an application note to every term to establish explicitly the chosen meaning. This note is identified by the abbreviation SN (Scope Note), as shown in figure 7. ■■ Non-descriptors (“non-preferred terms”): the syn- onyms or quasi-synonyms of a preferred term. A nonpreferred term is not assigned to documents submitted to an indexing process, but is provided as an entry point in a thesaurus to point to the appropri- ate descriptor. Usually the descriptors are written in capital letters and the nondescriptors in small letters. ■■ Compound descriptors: the terms used to represent complex concepts and groups of descriptors, which allow for the structuring of large numbers of thesau- rus descriptors into subsets called micro-thesauri. In addition to lexical units, other fundamental elements of a thesaurus are semantic relationships between these units. The more common relationships between lexical units are the following: ■■ Equivalence: the relationship between the descrip- tors and the nondescriptors (synonymous and that is, as a systematical classification in a traditional way. As Gilchrist states, there is no consensus on the meaning of terms like taxonomy, thesaurus, or ontology.22 In addi- tion, much work in the field of ontologies has been done without taking advantage of similar work performed in the fields of linguistics and library science.23 This situa- tion is changing because of the increasing publication of works that relate the development of ontologies to the development of “classic” terminological tools (vocabular- ies, taxonomies, and thesauri). This paper emphasizes the importance and useful- ness of the intermediate products created at each stage of the evolutive process from glossary to ontology. The end product of the initial stage is a dictionary expressed as XML. The next stage in the evolutive process (figure 4) is the transformation of that dictionary into a tax- onomy through the addition of hierarchical relationships between concepts. To do this, it is necessary to undertake a lexical- semantic analysis of the original glossary. This can be done in a semiautomatic way by applying natural language processing (NLP) techniques, such as those recommended by Morales-del-Castillo et al.,24 for creat- ing thesauri. 
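As a sketch of the taxonomy step just described (figure 6 in the original), the is-a relationship deduced for high order language could be recorded by adding an rdfs:subClassOf statement to the class created in the dictionary phase. The class name ProgrammingLanguage and the placeholder base URI are illustrative assumptions; the mechanism (rdfs:subClassOf, identical in OWL and DAML) is the one named in the text.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xml:base="http://example.org/ontoglose">
  <owl:Class rdf:ID="ProgrammingLanguage"/>
  <!-- the taxonomy phase adds the hierarchical (is-a) link deduced from the
       lexical and semantic analysis of the term names and definitions -->
  <owl:Class rdf:ID="HighOrderLanguage">
    <rdfs:subClassOf rdf:resource="#ProgrammingLanguage"/>
  </owl:Class>
</rdf:RDF>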
The basic processing sequence in linguistic engineering comprises the following steps: (1) incorpo- rate the original documents (in our case the dictionary obtained in the previous stage) into the information sys- tem; (2) identify the language in which they are written, distinguishing independent words; (3) “understand” the processed material at the appropriate level; (4) use this understanding to transform, search, or traduce data; (5) produce the new media required to present the produced outcomes; and finally, (6) present the final outcome to human users by means of the most appropriate periph- eral device—screen, speakers, printer, etc. An important aspect of this process is natural lan- guage comprehension. For that reason, several different kinds of programs are employed, including lemmatizers (which implement stemming algorithms to extract the lexeme or root of a word), morphologic analyzers (which glean sentence information from their constituent ele- ments: morphemes, words, and parts of speech), syntactic analyzers (which group sentence constituents to extract elements larger than words), and semantic models (which represent language semantics in terms of concepts and their relations, using abstraction, logical reasoning, orga- nization and data structuring capabilities). From the information in the software engineering dictionary and from a lexical analysis of it, it is possible to determine a hierarchical relationship when the name of a term contains the name of another one (for example, the term language and the terms programming language and hardware design language), or when expressions such as “is a” linked to the name of another term included in the glossary appear in the text of the term definition. (For 200 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 indicating that high order language relates to both assembly and machine languages. The life cycle proposed in this paper (figure 4) includes a third step or phase that transforms the taxonomy obtained in the previous phase into a thesaurus through the incorporation of relationships between the concepts that complement the hierarchical relations included in the taxonomy. Basically, we have to add two types of relation- ships—equivalence and associative, represented in the standard thesauri with UF (and USE) and RT respectively. We will continue using XML to implement this new product. There are different ways of implementing a thesaurus using a language based on XML. For example, Matthews et al. proposed a standard RDF format,26 where as Hall created an ontology in DAML.27 In both cases, the authors modeled the general structure of quasi-synonymous). ISO establishes that the abbrevia- tion UF (Used For) precedes the nondescriptors linked to a descriptor; and the abbreviation USE is used in the opposite case. For example, a thesaurus developed from the IEEE glossary might include a descriptor “high order language” and an equivalence relationship with a nondescriptor “high level language” (figure 7). ■■ Hierarchical: a relationship between two descrip- tors. In the thesaurus one of these descriptors has been defined as superior to the other one. There are no hierarchical relationships between nondescrip- tors, nor between nondescriptors and descriptors. A descriptor can have no lower descriptors or several of them, and no higher descriptors or several of them. 
According to the ISO standard, hierarchy is expressed by means of the abbreviations BT (Broader Term), to indicate the generic or higher descriptors, and NT (Narrower Term), to indicate the specific or lower descriptors. The term at the head of the hierarchy to which a term belongs can be included, using the abbreviation TT (Top Term). Figure 7 presents these hierarchical relationships. ■■ Associative: a reciprocal relationship that is estab- lished between terms that are neither equivalent nor hierarchical, but are semantically or conceptually associated to such an extent that the link between them should be made explicit in the controlled vocabulary on the grounds that it may suggest additional terms for use in indexing or retrieval. It is generally indicated by the abbreviation RT (Related Term). There are no associative relationships between nondescriptors and descriptors, or between descriptors already linked by a hierarchical relation. It is possible to establish associative relationships between descriptors belonging to the same or differ- ent category. The associative relationships can be of very different types. For example, they can represent causality, instrumentation, location, similarity, origin, action, etc. Figure 7 shows two associative relations, .. HIGH ORDER LANGUAGE (descriptor) SN A programming language that... UF High level language (no-descriptor) UF Third generation language (no-descriptor) TT LANGUAGE BT PROGRAMMING LANGUAGE NT OBJECT ORIENTED LANGUAGE NT DECLARATIVE LANGUAGE RT ASSEMBLY LANGUAGE (contrast with) RT MACHINE LANGUAGE (contrast with) .. High level language USE HIGH ORDER LANGUAGE .. Third generation language USE HIGH ORDER LANGUAGE .. Figure 7. Fragment of a thesaurus entry Figure 6. Example of taxonomy entry ... GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | HilerA et Al. 201 terms. For example: . Or using the glossary notation: . ■■ The rest of the associative relationships (RT) that were included in the thesaurus correspond to the cross-references of the type “Contrast with” and “See also” that appear explicitly in the IEEE glossary. ■■ Neither compound descriptors nor groups of descrip- tors have been implemented because there is no such structure in the glossary. Ontology Ding and Foo state that “ontology promotes standard- ization and reusability of information representation through identifying common and shared knowledge. Ontology adds values to traditional thesauri through deeper semantics in digital objects, both conceptually, relationally and machine understandably.”29 This seman- tic richness may imply deeper hierarchical levels, richer relationships between concepts, the definition of axioms or inference rules, etc. The final stage of the evolutive process is the transfor- mation of the thesaurus created in the previous stage into an ontology. This is achieved through the addition of one or more of the basic elements of semantic complexity that differentiates ontologies from other knowledge represen- tation standards (such as dictionaries, taxonomies, and thesauri). For example: ■■ Semantic relationships between the concepts (classes) of the thesaurus have been added as properties or ontology slots. ■■ Axioms of classes and axioms of properties. These are restriction rules that are declared to be sat- isfied by elements of ontology. 
For example, to establish disjunctive classes ( ), have been defined, and quantification restrictions (existential or universal) and cardinality restrictions in the relation- ships have been implemented as properties. Software based on techniques of linguistic analysis has been developed to facilitate the establishment of the properties and restrictions. This software analyzes the definition text for each of the more than 1,500 glossary terms (in thesaurus format), isolating those words that a thesaurus from classes (rdf:Class or daml:class) and properties (rdf:Property or daml:ObjectProperty). In the first case they proposed five classes: ThesaurusObject, Concept, TopConcept, Term, ScopeNote; and several properties to implement the relations, like hasScope- Note (SN), IsIndicatedBy, PreferredTerm, UsedFor (UF), ConceptRelation, BroaderConcept (BT), NarrowerConcept (NT), TopOfHierarchy (TT) and isRelatedTo (RT). Recently the W3C has developed the SKOS specifica- tion, created to define knowledge organization schemes. In the case of thesauri, SKOS includes specific tags, such as skos:Concept, skos:scopeNote (SN), skos:broader (BT), skos:narrower (NT), skos:related (RT), etc., that are equivalent to those listed in the previous paragraph. Our specification does not make any statement about the formal relationship between the class of SKOS concept schemes and the class of OWL ontologies, which will allow different design patterns to be explored for using SKOS in combination with OWL. Although any of the above-mentioned formats could be used to implement the thesaurus, given that the end- product of our process is to be an ontology, our proposal is that the product to be generated during this phase should have a format compatible with the final ontology and with the previous taxonomy. Therefore a minimal number of changes will be carried out on the product created in the previous step, resulting in a knowledge representation tool similar to a thesaurus. That tool does not need to be modified during the following (final) phase of transformation into an ontology. Nevertheless, if for some reason it is necessary to have the thesaurus in one of the other formats (such as SKOS), it is possible to apply a simple XSLT transformation to the product. Another option would be to integrate a thesaurus ontology, such as the one proposed by Hall,28 with the ontology represent- ing the IEEE glossary. In the thesaurus implementation carried out in our project, the following limitations have been considered: ■■ Only the hierarchical relationships implemented in the taxonomy have been considered. These include relationsips of type “is-a,” that is, generalization rela- tionships or type–subset relationships. Relationships that can be included in the thesaurus marked with TT, BT, and NT, like relations of type “part of” (that is, partative relationships) have not been considered. Instead of considering them as hierarchical relation- ships, the final ontology includes the possibility of describing classes as a union of classes. ■■ The relationships of synonymy (UF and USE) used to model the cross-references in the IEEE glossary (“Syn” and “See,” respectively) were implemented as equiv- alent terms, that is, as equivalent axioms between classes (owl:equivalentClass or daml:sameClassAs), with inverse properties to reflect the preference of the 202 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 match the name of other glossary terms (or a word in the definition text of other glossary terms). 
The isolated words will then be candidates for a relationship between both of them. (Figure 8 shows the candidate properties obtained from the Software Engineering glossary.) The user then has the option of creating relationships with the identified candidate words. The user must indicate, for every relationship to be created, the restriction type that it represents as well as existential or universal quan- tification or cardinality (minimum or maximum). After confirming this information, the program updates the file containing the ontology (OWL or DAML), adding the property to the class that represents the processed term. Figure 9 shows an example of the definition of two prop- erties and its application to the class HighOrderLanguage: a property Express with existential quantification over the class DataStructure to indicate that a language must repre- sent at least one data structure; and a property TranslateTo of universal type to indicate that any high-level language is translated into machine language (MachineLanguage). ■■ Results, Conclusions, and Future Work The existence of ontologies of specific knowledge domains (software engineering in this case) facilitates the process of finding resources about this discipline on the Semantic Web and in digital libraries, as well as the reuse of learn- ing objects of the same domain stored in repositories available on the Web.30 When a new resource is indexed in a library catalog, a new record that conforms to the ontology conceptual data model may be included. It will be necessary to assign its properties according to the concept definition included in the ontology. The user may later execute semantic queries that will be run by the search system that will traverse the ontology to identify the concept in which the user was interested to launch a wider query including the resources indexed under the concept. Ontologies, like the one that has been “evolved,” may also be used in an open way to index and search for resources on the Web. In that case, however, semantic search engines such as Swoogle (http://swoogle.umbc .edu/), are required in place of traditional syntactic search engines, such as Google. The creation of a complete ontology of a knowledge domain is a complex task. In the case of the domain presented in this paper, that of software engineering, although there have been initiatives toward ontology cre- ation that have yielded publications by renowned authors in the field,31 a complete ontology has yet to be created and published. 
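To make this concrete, an ontology entry of the kind figure 9 illustrates might look roughly like the following, reconstructed only from the prose description above. The property and class names Express, TranslateTo, DataStructure, MachineLanguage, and HighOrderLanguage come from the text; the exact markup, the placeholder base URI, and the class name HighLevelLanguage (standing in for the synonym axiom discussed earlier) are assumptions for illustration.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xml:base="http://example.org/ontoglose">
  <!-- classes referenced by the restrictions and axioms below -->
  <owl:Class rdf:ID="DataStructure"/>
  <owl:Class rdf:ID="MachineLanguage"/>
  <owl:Class rdf:ID="HighLevelLanguage"/>
  <!-- properties confirmed by the user from the candidate list -->
  <owl:ObjectProperty rdf:ID="Express"/>
  <owl:ObjectProperty rdf:ID="TranslateTo"/>

  <owl:Class rdf:ID="HighOrderLanguage">
    <!-- existential restriction: a high order language expresses at least one data structure -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#Express"/>
        <owl:someValuesFrom rdf:resource="#DataStructure"/>
      </owl:Restriction>
    </rdfs:subClassOf>
    <!-- universal restriction: whatever a high order language translates to is a machine language -->
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="#TranslateTo"/>
        <owl:allValuesFrom rdf:resource="#MachineLanguage"/>
      </owl:Restriction>
    </rdfs:subClassOf>
    <!-- synonym carried over from the thesaurus phase as a class equivalence axiom -->
    <owl:equivalentClass rdf:resource="#HighLevelLanguage"/>
  </owl:Class>
</rdf:RDF>

Loaded into an editor such as Protégé, a fragment of this shape would behave as the text describes: the existential restriction requires at least one expressed data structure, while the universal restriction constrains every TranslateTo target to be a machine language.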
This paper has described a process for developing a modest but complete ontology from a glossary of terminology, both in OWL format and DAML+OIL format,
Figure 8. Candidate properties obtained from the linguistic analysis of the Software Engineering glossary (a list of several hundred candidate verbs and verb phrases, such as accept, allocate, convert, execute, generate, translate, and verify).
to each term.) We defined 324 properties or relationships between these classes.
These are based on a semiauto- mated linguistic analysis of the glossary content (for example, Allow, Convert, Execute, OperateWith, Produces, Translate, Transform, Utilize, WorkIn, etc.), which will be refined in future versions. The authors’ aim is to use this ontology, which we have called OntoGLOSE (Ontology GLossary Software Engineering), to unify the vocabulary. OntoGLOSE will be used in a more ambitious project, whose purpose is the development of a complete ontology in software engi- neering from the SWEBOK Guide.32 Although this paper has focused on this ontology, the method that has been described may be used to generate an ontology from any dictionary. The flexibility that OWL permits for ontology description, along with its compat- ibility with other RDF-based metadata languages, makes possible interoperability between ontologies and between ontologies and other controlled vocabularies and allows for the building of merged representations of multiple knowledge domains. These representations may eventu- ally be used in libraries and repositories to index and search for any kind of resource, not only those related to the original field. ■■ Acknowledgments This research is co-funded by the Spanish Ministry of Industry, Tourism and Commerce PROFIT program (grant TSI-020100-2008-23). The authors also want to acknowledge support from the TIFyC research group at the University of Alcala. References and Notes 1. M. Dörr et al., State of the Art in Content Standards (Amster- dam: OntoWeb Consortium, 2001). 2. D. Soergel, “The Rise of Ontologies or the Reinvention of Classification,” Journal of the American Society for Information Science 50, no. 12 (1999): 1119–20; A. Gilchrist, “Thesauri, Tax- onomies and Ontologies—An Etymological Note,” Journal of Documentation 59, no. 1 (2003): 7–18. 3. B. J. Wielinga et al., “From Thesaurus to Ontology,” Pro- ceedings of the 1st International Conference on Knowledge Capture (New York: ACM, 2001): 194–201: J. Qin and S. Paling, “Con- verting a Controlled Vocabulary into an Ontology: The Case of GEM,” Information Research 6 (2001): 2. 4. According to Van Heijst, Schereiber, and Wielinga, ontolo- gies can be classified as terminological ontologies, information ontologies, and knowledge modeling ontologies; terminological ontologies specify the terms that are used to represent knowl- edge in the domain of discourse, and they are in use principally to unify vocabulary in a certain domain. G. Van Heijst, A. T. which is ready to use in the Semantic Web. As described at the opening of this article, our aim has been to create a lightweight ontology as a first version, which will later be improved by including more axioms and relationships that increase its semantic expressiveness. We have tried to make this first version as tailored as possible to the initial glossary, knowing that later versions will be improved by others who might take on the work. Such improvements will increase the ontology’s utility, but will make it a less- faithful representation of the IEEE glossary from which it was derived. The ontology we have developed includes 1,521 classes that correspond to the same number of concepts represented in the IEEE glossary. (Included in this num- ber are the different meanings that the glossary assigns ... Figure 9. Example of ontology entry 204 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 20. W3C, SKOS; Object Management Group, XML Metadata Interchange (XMI), 2003, http://www.omg.org/technology/doc- uments/formal/xmi.htm (accessed Oct. 
5, 2009). 21. UML (Unified Modeling Language) is a standardized general-purpose modeling language (http://www.uml.org). Nowadays, different UML plugins for ontologies’ editors exist. These plugins allow working with UML graphic models. Also, it is possible to realize the UML models with a CASE tool, to export them to XML format, and to transform them to the ontol- ogy format (for example, OWL) using a XSLT sheet, as the one published in D. Gasevic, “UMLtoOWL: Converter from UML to OWL,” http://www.sfu.ca/~dgasevic/projects/UMLtoOWL/ (accessed Oct. 5, 2009). 22. Gilchrist, “Thesauri, Taxonomies and Ontologies.” 23. Soergel, “The Rise of Ontologies or the Reinvention of Classification.” 24. J. M. Morales-del-Castillo et al., “A Semantic Model of Selective Dissemination of Information for Digital Libraries,” Information Technology & Libraries 28, no. 1 (2009): 22–31. 25. International Standards Organization, ISO 2788:1986 Doc- umentation—Guidelines for the Establishment and Develop- ment of Monolingual Thesauri (Geneve: International Standards Organization, 1986). 26. B. M. Matthews, K. Miller, and M. D. Wilson, “A Thesau- rus Interchange Format in RDF,” 2002, http://www.w3c.rl.ac .uk/SWAD/thes_links.htm (accessed Feb. 10, 2009). 27. M. Hall, “CALL Thesaurus Ontology in DAML,” Dynam- ics Research Corporation, 2001, http://orlando.drc.com/daml/ ontology/CALL-Thesaurus (accessed Oct. 5, 2009). 28. Ibid. 29. Y. Ding and S. Foo, “Ontology Research and Develop- ment. Part 1—A Review of Ontology Generation,” Journal of Information Science 28, no. 2 (2002): 123–36. See also B. H. Kwas- nik, “The Role of Classification in Knowledge Representation and Discover,” Library Trends 48 (1999): 22–47. 30. S. Otón et al., “Service Oriented Architecture for the Imple- mentation of Distributed Repositories of Learning Objects,” International Journal of Innovative Computing, Information & Con- trol (2010), forthcoming. 31. O. Mendes and A. Abran, “Software Engineering Ontol- ogy: A Development Methodology,” Metrics News 9 (2004): 68–76; C. Calero, F. Ruiz, and M. Piattini, Ontologies for Software Engineering and Software Technology (Berlin: Springer, 2006). 32. IEEE, Guide to the Software Engineering Body of Knowledge (SWEBOK) (Los Alamitos, Calif.: IEEE Computer Society, 2004), http:// www.swebok.org (accessed Oct. 5, 2009). Schereiber, and B. J. Wielinga, “Using Explicit Ontologies in KBS Development,” International Journal of Human & Computer Studies 46, no. 2/3 (1996): 183–292. 5. R. Neches et al., “Enabling Technology for Knowledge Sharing,” AI Magazine 12, no. 3 (1991): 36–56. 6. O. Corcho, F. Fernández-López, and A. Gómez-Pérez, “Methodologies, Tools and Languages for Buildings Ontologies. Where Is Their Meeting Point?” Data & Knowledge Engineering 46, no. 1 (2003): 41–64. 7. Intitute of Electrical and Electronics Engineers (IEEE), IEEE Std 610.12-1990(R2002): IEEE Standard Glossary of Software Engineering Terminology (Reaffirmed 2002) (New York: IEEE, 2002). 8. J. Krause, “Semantic Heterogeneity: Comparing New Semantic Web Approaches with those of Digital Libraries,” Library Review 57, no. 3 (2008): 235–48. 9. T. Berners-Lee, J. Hendler, and O. Lassila, “The Semantic Web,” Scientific American 284, no. 5 (2001): 34–43. 10. World Wide Web Consortium (W3C), Resource Description Framework (RDF): Concepts and Abstract Syntax, W3C Recommen- dation 10 February 2004, http://www.w3.org/TR/rdf-concepts/ (accessed Oct. 5, 2009). 11. 
World Wide Web Consortium (W3C), Web Ontology Lan- guage (OWL), 2004, http://www.w3.org/2004/OWL (accessed Oct. 5, 2009). 12. World Wide Web Consortium (W3C), SKOS Simple Knowledge Organization System, 2009, http://www.w3.org/ TR/2009/REC-skos-reference-20090818/ (accessed Oct. 5, 2009). 13. M. M. Yee, “Can Bibliographic Data be Put Directly onto the Semantic Web?” Information Technology & Libraries 28, no. 2 (2009): 55-80. 14. L. F. Spiteri, “The Structure and Form of Folksonomy Tags: The Road to the Public Library Catalog,” Information Technology & Libraries 26, no. 3 (2007): 13–25. 15. Corcho, Fernández-López, and Gómez-Pérez, “Method- ologies, Tools and Languages for Buildings Ontologies.” 16. IEEE, IEEE Std 610.12-1990(R2002). 17. N. F. Noy and D. L. McGuinness, “Ontology Develop- ment 101: A Guide to Creating Your First Ontology,” 2001, Stan- ford University, http://www-ksl.stanford.edu/people/dlm/ papers/ontology-tutorial-noy-mcguinness.pdf (accessed Sept 10, 2010). 18. D. Baader et al., The Description Logic Handbook (Cam- bridge: Cambridge Univ. Pr., 2003). 19. World Wide Web Consortium, DAML+OIL Reference Description, 2001, http://www.w3.org/TR/daml+oil-reference (accessed Oct. 5, 2009); W3C, OWL. 3131 ---- BriDGiNG tHe GAP: selF-DirecteD stAFF tecHNOlOGY trAiNiNG | QuiNNeY, sMitH, AND GAlBrAitH 205 Kayla L. Quinney, Sara D. Smith, and Quinn Galbraith Bridging the Gap: Self-Directed Staff Technology Training of HBLL patrons. As anticipated, results indicated that students frequently use text messages, social networks, blogs, etc., while fewer staff members use these technolo- gies. For example, 42 percent of the students reported that they write a blog, while only 26 percent of staff and fac- ulty do so. Also, 74 percent of the students and only 30 percent of staff and faculty indicated that they belonged to a social network. After concluding that staff and faculty were not as connected as their student patrons are to tech- nology, library administration developed the Technology Challenge to help close this gap. The Technology Challenge was a self-directed training program requiring participants to explore new technol- ogy on their own by spending at least fifteen minutes each day learning new technology skills. This program was successful in promoting lifelong learning by teach- ing technology applicable to the work and home lives of HBLL employees. We will first discuss literature that shows how technology training can help academic librar- ians connect with student patrons, and then we will describe the Technology Challenge and demonstrate how it aligns with the principles of self-directed learning. The training will be evaluated by an analysis of the results of two surveys given to participants before and after the Technology Challenge was implemented. 
■■ Library 2.0 and “Librarian 2.0” HBLL wasn’t the first to notice the gap between librar- ians and students, McDonald and Thomas noted that “Gaps have materialized,” and library technology does not always “provide certain services, resources, or possibilities expected by emerging user populations like the millennial generation.”1 College students, who grew up with technol- ogy, are “digital natives,” while librarians, many having learned technology later in life, are “digital immigrants.”2 The “digital natives” belong to the Millennial Generation, described by Shish and Allen as a generation of “learners raised on and confirmed experts in the latest, fastest, cool- est, greatest, newest electronic technologies.”3 According to Sweeny, when students use libraries, they expect the same “flexibility, geographic independence, speed of response, time shifting, interactivity, multitasking, and time savings” provided by the technology they use daily.4 Students are Undergraduates, as members of the Millennial Generation, are proficient in Web 2.0 technology and expect to apply these technologies to their coursework—including schol- arly research. To remain relevant, academic libraries need to provide the technology that student patrons expect, and academic librarians need to learn and use these tech- nologies themselves. Because leaders at the Harold B. Lee Library of Brigham Young University (HBLL) perceived a gap in technology use between students and their staff and faculty, they developed and implemented the Technology Challenge, a self-directed technology training program that rewarded employees for exploring technology daily. The purpose of this paper is to examine the Technology Challenge through an analysis of results of surveys given to participants before and after the Technology Challenge was implemented. The program will also be evaluated in terms of the adult learning theories of andragogy and self- directed learning. HBLL found that a self-directed approach fosters technology skills that librarians need to best serve students. In addition, it promotes lifelong learning hab- its to keep abreast of emerging technologies. This paper offers some insights and methods that could be applied in other libraries, the most valuable of which is the use of self-directed and andragogical training methods to help academic libraries better integrate modern technologies. L eaders at the Harold B. Lee Library of Brigham Young University (HBLL) began to suspect a need for technology training when employees were asked during a meeting if they owned an iPod or MP3 player. Out of the twenty attendees, only two raised their hands—one of whom worked for IT. Perceiving a technol- ogy gap between HBLL employees and student patrons, library leaders began investigating how they could help faculty and staff become more proficient with the tech- nologies that student patrons use daily. To best serve student patrons, academic librarians need to be proficient with the technologies that student patrons expect. HBLL found that a self-directed learning approach to staff tech- nology training not only fosters technology skills, but also promotes lifelong learning habits. To further examine the technology gap between librar- ians and students, the HBLL staff, faculty, and student employees were given a survey designed to explore generational differences in media and technology use. Student employees were surveyed as representatives of the larger student body, which composes the majority Kayla l. 
Quinney (quinster27@gmail.com) is research Spe- cialist, sara D. smith (saradsmith@gmail.com) is research Specialist, and Quinn Galbraith (quinn_galbraith@byu.edu) is library human resource Training and Development Manager, Brigham young university library, Provo, utah. 206 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 2.0,” a program that “focuses on self-exploration and encourages staff to learn about new technologies on their own.”24 Learning 2.0 encouraged library staff to explore Web 2.0 tools by completing twenty-three exercises involving new technologies. PLCMC’s program has been replicated by more than 250 libraries and organizations worldwide,25 and several libraries have written about their experiences, including academic26 and public libraries.27 These programs—and the Technology Challenge implemented by HBLL—integrate the theories of adult learning. In the 1960s and 1970s, Malcolm Knowles intro- duced the theory of andragogy to describe the way adults learn.28 Knowles described adults as learners who (1) are self-directed, (2) use their experiences as a resource for learning, (3) learn more readily when they experience a need to know, (4) seek immediate application of knowl- edge, and (5) are best motivated by internal rather than external factors.29 The theory and practice of self-directed learning grew out of the first learning characteristic and assumes that adults prefer self-direction in determining and achieving learning goals, and therefore learners exer- cise independence in determining how and what they learn.30 These theories have had a considerable effect on adult education practice31 and employee development programs.32 When adults participate in trainings that align with the assumptions of andragogy, they are more likely to retain and apply what they have learned.33 ■■ The Technology Challenge HBLL’s Technology Challenge is similar to Learning 2.0 in that it encourages self-directed exploration of Web 2.0 technologies, but it differs in that participants were even more self-directed in exploration and that they were asked to participate daily. These features encouraged more self-directed learning in areas of participant interest as well as habit formation. It is not our purpose to critique Learning 2.0, but to provide some evidence and analysis to demonstrate the success of hands-on, self-directed training approaches and to suggest other ways for librar- ies to apply self-directed learning to technology training. The Technology Challenge was implemented from June 2007 to January 2008. HBLL staff included 175 full-time employees, 96 of whom participated in the challenge. (The student employees were not involved.) Participants were asked to spend fifteen minutes each day learning a new technology skill. HBLL leaders used rewards to make the program enjoyable and to motivate participation: For each minute spent learning technology, participants earned one point, and when one thousand points were earned, the participant would receive a gift certificate to the campus bookstore. Staff and faculty participated and tracked their progress through an online masters of “informal learning”; that is, they are accus- tomed to easily and quickly gathering information relevant to their lives from the internet and from friends. Shish and Allen claimed that Millennials prefer “interactive, hyper-linked multimedia over the traditional static, text- oriented printed items. 
They want a sense of control; they need experiential and collaborative approaches rather than formal, librarian-guided, library-centric services."5 These students arrive on campus expecting "to handle the challenges of scholarly research" using similar methods and technologies.6 Interactive technologies such as blogs, wikis, streaming media applications, and social networks are referred to as "Web 2.0." Abram argued that Web 2.0 technology "could be useful in an enterprise, institutional research, or community environment, and could be driven or introduced by the library."7 "Library 2.0" is a concept referring to a library's integration of these technologies; it is essentially the use of "Web 2.0 opportunities in a library environment."8 Maness described Library 2.0 as user-centered, social, innovative, and a provider of multimedia experiences.9 It is a community that "blurs the line between librarian and patron, creator and consumer, authority and novice."10 Libraries have been using Web 2.0 technology such as blogs,11 wikis,12 and social networks13 to better serve and connect with patrons. Blogs allow libraries to "provide news, information and links to internet resources,"14 and wikis create online study groups15 and "build a shared knowledge repository."16 Social networks can be particularly useful in connecting with undergraduate students: Millennials use technology to collaborate and make collective decisions,17 and libraries can capitalize on this tendency by using social networks, which for students would mean, as Bates argues, "an informational equivalent of the reliance on one's Facebook friends."18 Students expect Library 2.0—and as libraries integrate new technologies, the staff and faculty of academic libraries need to become "Librarian 2.0." According to Abram, Librarian 2.0 understands users and their needs "in terms of their goals and aspirations, workflows, social and content needs, and more. Librarian 2.0 is where the user is, when the user is there."19 The modern library user "needs the experience of the Web . . . to learn and succeed,"20 and the modern librarian can help patrons transfer technology skills to information seeking. Librarian 2.0 is prepared to help patrons familiar with Web 2.0 to "leverage these [technologies] to make a difference in reaching their goals."21 Therefore staff and faculty "must become adept at key learning technologies themselves."22 Stephen Abram asked, "Are the expectations of our users increasing faster than our ability to adapt?"23 and this same concern motivated HBLL and other institutions to initiate staff technology training programs. The Public Library of Charlotte and Mecklenburg County of North Carolina (PLCMC) developed "Learning their ability to learn and use technology. To be eligible to receive the gift card, participants were required to take this exit survey. Sixty-four participants, all of whom had met or exceeded the thousand-point goal, chose to complete this survey, so the results of this survey represent the experiences of 66 percent of the participants. Of course, if those who had not completed the Technology Challenge had taken the survey, the results may have been different, but the results do show how those who chose to actively participate reacted to this training program.
The survey included both quantifiable and open-ended questions (see appendix B for survey results and a list of the open-ended questions). The survey results, along with an analysis of the structure of the Challenge itself, demonstrates that the program aligns with Knowles’s five principles of andragogy to successfully help employees develop both technology skills and learning habits. self-direction The Technology Challenge was self-directed because it gave participants the flexibility to select which tasks and challenges they would complete. Garrison wrote that in a self-directed program, “learners should be provided with choices of how they wish to proactively carry out the learning process. Material resources should be available, approaches suggested, flexible pacing accommodated, and questioning and feedback provided when needed.”34 HBLL provided a variety of challenges and training sessions related to various technologies. Technology Challenge participants were given the independence to choose which learning methods to use, including which training sessions to attend and which challenges to complete. According to the exit survey, the most popular training methods were small, instructor-led groups, followed by self-learning through reading books and articles. Group training sessions were organized by HBLL leadership and addressed topics such as Microsoft Office, RSS feeds, computer organization skills, and multimedia software. Other learning methods included web tutorials, DVDs, large group discussions, and one-on-one tutoring. The group training classes preferred by HBLL employees may be considered more teacher-directed than self-directed, but the Technology Challenge was self-directed as a whole in that learners were given the opportunity to choose what they learned and how they learned it. The structure of the Technology Challenge allowed participants to set their own pace. Staff and faculty were given several months to complete the challenge and were responsible to pace themselves. On the exit survey, one participant commented: “If I didn’t get anything done one week, there wasn’t any pressure.” Another enjoyed flexibility in deciding when and where to complete the tasks: “I liked being able to do the challenge anywhere. When I had a few minutes between appointments, classes, board game called “Techopoly.” Participation was voluntary, and staff and faculty were free to choose which tasks and challenges they would complete. Tasks fell into one of four categories: software, hardware, library technology, and the internet. Participants were required to complete one hundred points in each category, but beyond that, were able to decide how to spend their time. Examples of tasks included attending workshops, exploring online tutori- als, and reading books or articles about a relevant topic. For each hundred points earned, participants could com- plete a mini-challenge, which included reading blogs or e-books, listening to podcasts, or creating a photo CD (see appendix A for a more complete list). Participants who completed fifteen out of twenty possible challenges were entered into a drawing for another gift certificate. Before beginning the Challenge, all participants were surveyed about their current use of technology. On this survey, they indicated that they were most uncomfortable with blogs, wikis, image editors, and music players. These results provided a focus for Technology Challenge trainings and mini-challenges. 
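The mechanics described above reduce to simple bookkeeping: one point per minute of learning, a thousand-point goal, a hundred-point minimum in each of the four categories, and a mini-challenge available for every hundred points earned. The following minimal sketch models that scoring logic for illustration only; it is a hypothetical reconstruction, not HBLL's actual Techopoly application, and all class and function names are ours.

```python
# Minimal sketch of the Techopoly-style scoring rules described above.
# Hypothetical reconstruction for illustration, not HBLL's actual system.
from collections import defaultdict

CATEGORIES = {"software", "hardware", "library technology", "internet"}
GOAL_POINTS = 1000          # gift-certificate threshold
CATEGORY_MINIMUM = 100      # points required in each category
POINTS_PER_MINUTE = 1       # one point per minute of learning

class ChallengeTracker:
    def __init__(self):
        self.points = defaultdict(int)

    def log_activity(self, category: str, minutes: int, multiplier: float = 1.0) -> None:
        """Record a learning activity; `multiplier` models the doubled
        points later awarded for sponsored training sessions."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.points[category] += int(minutes * POINTS_PER_MINUTE * multiplier)

    @property
    def total(self) -> int:
        return sum(self.points.values())

    def mini_challenges_unlocked(self) -> int:
        # One mini-challenge could be completed for each hundred points earned.
        return self.total // 100

    def goal_met(self) -> bool:
        return (self.total >= GOAL_POINTS and
                all(self.points[c] >= CATEGORY_MINIMUM for c in CATEGORIES))

tracker = ChallengeTracker()
tracker.log_activity("internet", minutes=15)                 # daily 15-minute exploration
tracker.log_activity("software", minutes=60, multiplier=2.0) # sponsored class, double points
print(tracker.total, tracker.mini_challenges_unlocked(), tracker.goal_met())
```

The optional multiplier argument anticipates the doubled points later awarded for attending sponsored classes, which the authors note contributed to "point inflation."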
While not all of these technologies may apply directly to their jobs, 60 percent indicated that they were interested in learning them. Forty-four percent reported that time was the greatest impediment to learning new technology; therefore the daily fifteen-minute requirement was introduced with the hope that it was small enough to be a good incentive to participate but substantial enough to promote habit formation and allow employees enough time to familiarize themselves with the technology. Although some productivity may have been lost due to the time requirement (especially in cases where participants may have spent more than the required time), library leaders felt that technology training was an investment in HBLL employees and that, at least for a few months, it was worth any potential loss in productivity. Because participants could choose how and when they learned technology, they could incorporate the Challenge into their work schedules according to their own needs, interests, and time constraints. Of ninety-six participants, sixty-six reached or exceeded the thousand-point goal, and eight participants earned more than two thousand points. Ten participants earned between five hundred and one thousand points, and another six earned between one hundred and five hundred. Although not all participants completed the Challenge, most were involved to some extent in learning technology during this time. ■■ The Technology Challenge and Adult Learning After finishing the Challenge, participants took an exit survey to evaluate the experience and report changes in were willing, even excited, to learn technology skills: 37 percent "agreed" and 60 percent "strongly agreed" that they were interested in learning new technology. Their desire to learn was cultivated by the survey itself, which helped them recognize and focus on this interest, and the Challenge provided a way for employees to channel their desire to learn technology. Immediate Application Learners need to see an opportunity for immediate application of their knowledge: Ota et al. explained that "they want to learn what will help them perform tasks or deal with problems they confront in everyday situations and those presented in the context of application to real life."39 Because of the need for immediate application, the Technology Challenge encouraged staff and faculty to learn technology skills directly related to their jobs—as well as technology that is applicable to their personal or home lives. HBLL leaders hoped that as staff became more comfortable with technology in general, they would be motivated to incorporate more complex technologies into their work. Here is one example of how the Technology Challenge catered to adult learners' need to apply what they learn: Before designing the Challenge, HBLL held a training session to teach employees the basics of Photoshop. Even though attendees were on the clock, the turnout was discouraging. Library leaders knew they needed to try something new. In the revamped Photoshop workshop that was offered as part of the Technology Challenge, attendees brought family photos or film and learned how to edit and experiment with their photos and burn DVD copies. This time, the class was full: the same computer program that before drew only a few people was now exciting and useful.
Focusing on employees’ personal interests in learning new software, instead of just on teaching the software, better motivated staff and faculty to attend the training. Motivation As stated by Ota et al., adults are motivated by external factors but are usually more motivated by internal fac- tors: “Adults are responsive to some external motivators (e.g., better job, higher salaries), but the most potent motivators are internal (e.g., desire for increased job satisfaction, self-esteem).”40 On the entrance survey, par- ticipants were given the opportunity to comment on their reasons for participating in the Challenge. The gift card, an example of an external motivation, was frequently cited as an important motivation. But many also com- mented on more internal motivations: “It’s important to my job to stay proficient in new technologies and I’d like to stay current”; “I feel that I need to be up-to-date or meetings I could complete some of the challenges.” Employees could also determine how much or how little of the Challenge they wanted to complete: many reached well over the thousand-point goal, while others fell a little short. Participants began at different skill levels, and thus could use the time and resources allotted to explore basic or more advanced topics according to their needs and interests. Garrison had noted the importance of providing resources and feedback in self-directed learning.35 The Techopoly website provided resources (such as specific blogs or websites to visit) and instructions on how to use and access technology within the library. HBLL also hired a student to assist staff and faculty one-on-one by explain- ing answers to their questions about technology and teaching other skills he thought may be relevant to their initial problem. The entrance and exit surveys provided opportunities for self-reflection and self-evaluation by questioning the participants’ use of technology before the Challenge and asking them to evaluate their proficiency in technology after the Challenge. use of experience The use of experience as a source of learning is impor- tant to adult learners: “The richest resource for learning resides in adults themselves; therefore, tapping into their experiences through experiential techniques (discussions, simulations, problem-solving activities, or case methods) is beneficial.”36 The small-group discussions and one-on- one problem solving made available to HBLL employees certainly fall into these categories. Small-group classes are one of the best ways to encourage adults to share and validate their experiences, and doing so increases retention and application of new information.37 The trainings and challenges encouraged participants to make use of their work and personal experiences by connecting the topic to work or home application. For example, one session discussed how blogs relate to libraries, and another helped participants learn Adobe Photoshop skills by editing per- sonal photographs. Need to Know Adult learners are more successful when they desire and recognize a need for new knowledge or skills. The role of a trainer is to help learners recognize this “need to know” by “mak[ing] a case for the value of learning.”38 HBLL used the generational survey and presurvey to develop a need and desire to learn. 
The results of the generational survey, which demonstrated a gap in technology use between librarians and students, were presented and discussed at a meeting held before the initiation of the Technology Challenge to help staff and faculty under- stand why it was important to learn 2.0 technology. Results of the presurvey showed that staff and faculty BriDGiNG tHe GAP: selF-DirecteD stAFF tecHNOlOGY trAiNiNG | QuiNNeY, sMitH, AND GAlBrAitH 209 statistical reports or working with colleagues from other libraries.” ■■ “I learned how to set up a server that I now maintain on a semi-regular basis. I learned a lot about SFX and have learned some Perl programming language as well that I use in my job daily as I maintain SFX.” ■■ “The new OCLC client was probably the most sig- nificant. I spent a couple of days in an online class learning to customize the client, and I use what I learned there every single day.” ■■ “I use Google docs frequently for one of the projects I am now working on.” Participants also indicated weaknesses in the Technology Challenge. Almost 20 percent of those who completed the Challenge reported that it was too easy. This is a valid point—the Challenge was designed to be easy so as not to intimidate staff or faculty who are less familiar with technology. It is important to note that these comments came from those who completed the Challenge—other participants may have found the tasks and mini-challenges more difficult. The goal was to provide an introduction to Web 2.0, not to train experts. However, a greater range of tasks and challenges could be provided in the future to allow staff and faculty more self- direction in selecting goals relevant to their experience. To encourage staff and faculty to attend sponsored training sessions as part of the Challenge, HBLL leaders decided to double points for time spent at these classes. This certainly encouraged participation, but it lead to “point inflation”—perhaps being one reason why so many reported that the Challenge was too easy to com- plete. The doubling of points may also have encouraged staff to spend more time in workshops and less time practicing or applying the skills learned. A possible solu- tion would be offering 1.5 points, or offering a set number of points for attendance instead of counting per minute. It also may have been informative for purpose of analy- sis to have surveyed both those who did not complete the Challenge as well as those who chose not to participate. Because the presurvey indicated that time was the biggest deterrent to learning and incorporating new technology, we assume that many of those who did not participate or who did not complete the challenge felt that they did not have enough time to do so. There is definitely potential for further investigation into why library staff would not want to participate in a technology training program, what would motivate them to participate, and how we could redesign the Technology Challenge to make it more appeal- ing to all of our staff and faculty. Several library employees have requested that HBLL sponsor another Technology Challenge program. Because of the success of the first and because of continuing inter- est in technology training, we plan to do so in the future. 
We will make changes and adjustments according to the on technology in order to effectively help patrons”; “to identify and become comfortable with new technologies that will make my work more efficient, more presentable, and more accurate.” ■■ Lifelong Learning Staff and faculty responded favorably to the training. None of the participants who took the exit survey disliked the challenge; 34 percent even reported that they strongly liked it. Ninety-five percent reported that they enjoyed the pro- cess of learning new technology, and 100 percent reported that they were willing to participate in another technology challenge—thus suggesting success in the goal of encour- aging lifelong technology learning. The exit survey results indicate that after completing the challenge, staff and faculty are more motivated to continue learning—which is exactly what HBLL leaders hoped to accomplish. Eighty-nine percent of the partici- pants reported that their desire to learn new technology had increased, and 69 percent reported that they are now able to learn new technology faster after completing the Technology Challenge. Ninety-seven percent claimed that they were more likely to incorporate new technology into home or work use, and 98 percent said they recognized the importance of staying on top of emerging technolo- gies. Participants commented that the training increased their desire to learn. One observed, “I often need a chal- lenge to get motivated to do something new,” and another participant reported feeling “a little more comfortable trying new things out.” The exit survey asked participants to indicate how they now use technology. One employee keeps a blog for her daughter’s dance company, and another said, “I’m on my way to a full-blown GoogleReader addiction.” Another participant applied these new skills at home: “I’m not so afraid of exploring the computer and other software programs. I even recently bought a computer for my own personal use at home.” The Technology Challenge was also successful in helping employees better serve patrons: “I can now better direct patrons to services that I would otherwise not have known about, such as streaming audio and video and e-book read- ers.” Another participant felt better connected to student patrons: “I understand the students better and the things they use on a daily basis.” Staff and faculty also found their new skills applicable to work beyond patron interaction, and many listed spe- cific examples of how they now use technology at work: ■■ “I have attended a few Microsoft Office classes that have helped me tremendously in doing my work more efficiently, whether it is for preparing monthly 210 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 2. Richard T. Sweeny, “Reinventing Library Buildings and Services for the Millennial Generation,” Library Administration & Management 19, no. 4 (2005): 170. 3. Win Shish and Martha Allen, “Working with Generation- D: Adopting and Adapting to Cultural Learning and Change,” Library Management 28, no. 1/2 (2006): 89. 4. Sweeney, “Reinventing Library Buildings,” 170. 5. Shish and Allen, “Working with Generation-D,” 96. 6. Ibid., 98. 7. Stephen Abram, “Social Libraries: The Librarian 2.0 Pheonomenon,” Library Resources & Technical Services 52, no. 2 (2008): 21. 8. Ibid. 9. Jack M. Maness “Library 2.0 Theory: Web 2.0 and Its Implications for Libraries,” Webology 3, no. 2 (2006), http:// www.webology.ir/2006/v3n2/a25.html?q=link:webology.ir/ (accessed Jan. 8, 2010). 10. Ibid., under “Blogs and Wikis,” para. 4. 
11. Laurel Ann Clyde, “Library Weblogs,” Library Manage- ment 22, no. 4/5 (2004): 183–89; Maness, “Library 2.0. Theory.” 12. See Matthew M. Bejune, “Wikis in Libraries,” Information Technology & Libraries 26, no. 3 (2007): 26–38 ; Darlene Fichter, “The Many Forms of E-Collaboration: Blogs, Wikis, Portals, Groupware, Discussion Boards, and Instant Messaging,” Online: Exploring Technology & Resources for Information Professionals 29, no. 4 (2005): 48–50; Maness, “Library 2.0 Theory.” 13. Mary Ellen Bates, “Can I Facebook That?” Online: Explor- ing Technology and Resources for Information Professionals 31, no. 5 (2007): 64; Sarah Elizabeth Miller and Lauren A. Jensen, “Con- necting and Communicating with Students on Facebook,” Com- puters in Libraries 27, no. 8 (2007): 18–22. 14. Clyde, “Library Weblogs,” 183. 15. Maness, “Library 2.0 Theory.” 16. Fichter, “Many Forms of E-Collaboration,” 50. 17. Sweeney, “Reinventing Library Buildings”; Bates, “Can I Facebook That?” 18. Bates, “Can I Facebook That?” 64. 19. Abram, “Social Libraries,” 21. 20. Ibid., 20. 21. Ibid., 21. 22. Shish and Allen, “Working with Generation-D,” 90. 23. Abram, “Social Libraries,” 20. 24. Helene Blowers and Lori Reed, “The C’s of Our Sea Change: Plans for Training Staff, from Core Competencies to Learning 2.0,” Computers in Libraries 27, no. 2 (2007): 11. 25. Helene Blowers, Learning 2.0, 2007, http://plcmclearning .blogspot.com (accessed Jan. 8, 2010). 26. For examples, see Ilana Kingsley and Karen Jensen, “Learning 2.0: A Tool for Staff training at the University of Alaska Fairbanks Rasmuson,” The Electronic Journal of Academic & Special Librarianship 12, no. 1 (2009), http://southernlibrari- anship.icaap.org/content/v10n01/kingsley_i01.html (accessed Jan. 8, 2010); Beverly Simmons, “Learning (2.0) to be a Social Library,” Tennessee Libraries 58, no. 2 (2008): 1–8. 27. For examples, see Christine Mackenzie, “Creating our Future: Workforce Planning for Library 2.0 and Beyond,” Aus- tralasian Public Libraries & Information Services 20, no. 3 (2007): 118–24; Liisa Sjoblom, “Embracing Technology: The Deschutes Public Library’s Learning 2.0 Program,” OLA Quarterly 14, no. 2 (2007): 2–6; Hui-Lan Titango and Gail L. Mason, “Learning Library 2.0: 23 Things @ SCPL,” Library Management 30, no. 1/2 feedback we have received, and continue to evaluate it and improve it based on survey results. The purpose of a second Technology Challenge would be to reinforce what staff and faculty have already learned, to teach new skills, and to help participants remember the importance of life- long learning when it comes to technology. ■■ Conclusion HBLL’s self-directed Technology Challenge was success- ful in teaching technology skills and in promoting lifelong learning—as well as in fostering the development of Librarian 2.0. Abram listed key characteristics and duties of Librarian 2.0, including learning the tools of Web 2.0; connecting people, technology, and information; embrac- ing “nontextual information and the power of pictures, moving images, sight, and sound”; using the latest tools of communication; and understanding the “emerging roles and impacts of the blogosphere, Web syndicasphere, and wikisphere.”41 Survey results indicated that HBLL employees are on their way to developing these attri- butes, and that they are better equipped with the skills and tools to keep learning. Like PLCMC’s Learning 2.0, the Technology Challenge could be replicated in libraries of various sizes. 
Obviously an exact replication would not be feasible or appropriate for every library—but the basic ideas, such as the prin- ciples of andragogy and self-directed learning could be incorporated, as well as the daily time requirement or the use of surveys to determine weaknesses or interests in technology skills. Whatever the case, there is a great need for library staff and faculty to learn emerging technolo- gies and to keep learning them as technology continues to change and advance. But the most important benefit of a self-directed train- ing program focusing on lifelong learning is effective employee development. The goal of any training pro- gram is to increase work productivity—and as employees become more productive and efficient, they are happier and more excited about their jobs. On the exit survey, one participant expressed initially feeling hesitant about the Technology Challenge and feared that it would increase an already hefty workload. However, once the Challenge began, the participant enjoyed “taking the time to learn about new things. I feel I am a better person/librarian because of it.” And that, ultimately, is the goal—not only to create better librarians, but also to create better people. Notes 1. Robert H. McDonald and Chuck Thomas, “Disconnects between Library Culture and Millennial Generation Values,” Educause Quarterly 29, no. 4 (2006): 4. BriDGiNG tHe GAP: selF-DirecteD stAFF tecHNOlOGY trAiNiNG | QuiNNeY, sMitH, AND GAlBrAitH 211 ers,” Journal of Extension 33 (2005), http://www.joe.org/ joe/2006december/tt5.php (accessed Jan. 8, 2010); Wayne G. West, “Group Learning in the Workplace,” New Directions for Adult and Continuing Education 71 (1996): 51–60. 33. Ota et al., “Needs of Learners.” 34. D. R. Garrison, “Self-directed Learning: Toward a Com- prehensive Model,” Adult Education Quarterly 48 (1997): 22. 35. Ibid. 36. Ota et al., “Needs of Learners,” under “Needs of the Adult Learner,” para. 4. 37. Ota et al., “Needs of Learners”; West, “Group Learning.” 38. Ota et al., “Needs of Learners,” under “Needs of the Adult Learner,” para. 2. 39. Ibid., para. 6. 40. Ibid., para 7. 41. Abram, “Social Library,” 21–22. (2009): 44–56; Illinois Library Association, “Continuous Improve- ment: The Transformation of Staff Development,” The Illinois Library Association Reporter 26, no. 2 (2008): 4–7; and Thomas Simpson, “Keeping up with Technology: Orange County Library Embraces 2.0,” Florida Libraries 20, no. 2 (2007): 8–10. 28. Sharan B. Merriam, “Andragogy and Self-Directed Learn- ing: Pillars of Adult Learning Theory,” New Directions for Adult & Continuing Education 89 (2001): 3–13. 29. Malcolm Shepherd Knowles, The Modern Practice of Adult Education: From Pedagogy to Andragogy (New York: Cambridge Books, 1980). 30. Jovita Ross-Gordon, “Adult Learners in the Classroom,” New Directions for Student Services 102 (2003): 43–52. 31. Merriam, “Pillars of Adult Learning”; Ross-Gordon, “Adult Learners.” 32. Carrie Ota et al., “Training and the Needs of Learn- Appendix A. Technology Challenge “Mini Challenges” Technology Challenge participants had the opportunity to complete fifteen of twenty mini-challenges to become eligible to win a second gift certificate to the campus bookstore. Below are some examples of technology mini-challenges: 1. Read a library or a technology blog 2. Listen to a library podcast 3. Check out a book from Circulation’s new self-checkout machine 4. Complete an online copyright tutorial 5. Catalog some books on LibraryThing 6. 
Read an e-book with Sony eBook Reader or Amazon Kindle 7. Scan photos or copy them from a digital camera and then burn them onto a CD 8. Backup data 9. Change computer settings 10. Schedule meetings with Microsoft Outlook 11. Create a page or comment on a page on the library’s intranet wiki 12. Use one of the library’s music databases to listen to music 13. Use WordPress or Blogger to create a blog 14. Post a photo on a blog 15. Use Google Reader or Bloglines to subscribe to a blog or news page using RSS 16. Reserve and check out a digital camera, camcorder, DVR, or slide scanner from the multimedia lab and create some- thing with it 17. Convert media on the analog media racks 18. Edit a family photograph using photo-editing software 19. Attend a class in the multimedia lab 20. Make a phone call using Skype 212 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 How did you like the Technology Challenge overall? Answer Response Percent Strongly disliked 0 0 Disliked 0 0 Liked 42 66 Strongly liked 22 34 How did you like the reporting system used for the Technology Challenge (the Techopoly Game)? Answer Response Percent Strongly disliked 0 0 Disliked 4 6 Liked 41 64 Strongly liked 19 30 Would you participate in another Technology Challenge? Answer Response Percent Yes 64 100 No 0 0 What percentage of time did you spend using the following methods of learning? (participants were asked to allocate 100 points among the categories) Category Average Response Instructor-led large group 15.3 Instructor-led small group 27 One-on-one instruction 3.5 Web tutorial 12.8 Self-learning (books, articles) 27.4 DVDs .5 Small group discussion 2.7 Large group discussion 2.6 Other 6.7 I am more likely to incorporate new technology into my home or work life. Answer Response Percent Strongly disagree 0 0 Disagree 2 3 Agree 49 77 Strongly agree 13 20 I enjoy the process of making new technology a part of my work or home life. Answer Response Percent Strongly disagree 0 0 Disagree 2 3 Agree 37 58 Strongly agree 24 38 After completing the Technology Challenge, my desire to learn new technologies has increased. Answer Response Percent Strongly disagree 0 0 Disagree 7 11 Agree 44 69 Strongly agree 13 20 I feel I now learn new technologies more quickly. Answer Response Percent Strongly disagree 0 0 Disagree 20 31 Agree 39 61 Strongly agree 5 8 Appendix B. Exit Survey Results BriDGiNG tHe GAP: selF-DirecteD stAFF tecHNOlOGY trAiNiNG | QuiNNeY, sMitH, AND GAlBrAitH 213 Open-Ended Questions ■■ What would you change about the technology chal- lenge? ■■ What did you like about the Technology Challenge? ■■ What technologies were you introduced to during the Technology Challenge that you now use on a regular basis? ■■ In what was do you feel the Technology Challenge has benefited you the most? How much more proficient do you feel in . . . Category Not any Somewhat A lot Hardware 31% 64% 5% Software 8% 72% 20% Internet resources 17% 68% 15% Library technology 23% 64% 13% In order for you to succeed in your job, how important is keeping abreast of new technologies to you? 
Answer Response Percent Not important 1 2 Important 22 34 Very important 41 64 3132 ---- Margaret Brown-Sica, Jeffrey Beall, and Nina McHale Next-Generation Library Catalogs and the Problem of Slow Response Time and librarians will benefit from knowing what typical and acceptable response times are in online catalogs, and this information will assist in the design and evaluation of library discovery systems. This study also looks at benchmarks in response time and defines what is unacceptable and why. When advanced features and content in library catalogs increase response time to the extent that users become disaffected and use the catalog less, NextGen catalogs represent a step backward, not forward. In August 2009, the Auraria Library launched an instance of the WorldCat Local product from OCLC, dubbed WorldCat@Auraria. The Library's traditional catalog—named Skyline and running on the Innovative Interfaces platform—still runs concurrently with WorldCat@Auraria. Because WorldCat Local currently lacks a library circulation module that the Library was able to use, the legacy catalog is still required for its circulation functionality. In addition, Skyline contains MARC records from the Serials Solutions 360 MARC product. Since many of these records are not yet available in the OCLC WorldCat database, they are being maintained in the legacy catalog to enable access to the Library's extensive collection of online journals. Almost immediately upon implementation of WorldCat Local, many Library staff began to express concern about the product's slow response time. They bemoaned its slowness both at the reference desk and during library instruction sessions. Few of the discussions of the product's slow response time evaluated this weakness in the context of its advanced features. Several of the reference and instruction librarians even stated that they refused to use it any longer and that they were not recommending it to students and faculty. Indeed, many stated that they would only use the legacy Skyline catalog from then on. Therefore we decided to analyze the product's response time in relation to the legacy catalog. We also decided to further our study by examining response time in library catalogs in general, including several different online catalog products from different vendors. ■■ Response Time The term response time can mean different things in different contexts. Here we use it to mean the time it takes for all files that constitute a single webpage (in the case of the testing performed, a permalink to a bibliographic record) to travel across the Internet from a Web server to the computer on which the page is to be displayed. We do not include the time it takes for the browser to render the page, only the time it takes for the files to arrive at the requesting computer. Typically, a single webpage is made of multiple files; these are sent via the Internet from a Web Response time as defined for this study is the time that it takes for all files that constitute a single webpage to travel across the Internet from a Web server to the end user's browser. In this study, the authors tested response times on queries for identical items in five different library catalogs, one of them a next-generation (NextGen) catalog. The authors also discuss acceptable response time and how it may affect the discovery process.
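For illustration, the definition above (transfer time for the HTML of a permalinked record plus the files it references, excluding browser rendering) can be approximated with a short script. The sketch below is a rough approximation under those assumptions, not the WebSitePulse service used in the study; because it executes no JavaScript and fetches assets one at a time, its numbers will only loosely track what a browser or a commercial testing tool reports.

```python
# Rough approximation of the study's "response time": time to transfer the
# HTML of a catalog permalink plus the files it references (no rendering).
# Illustrative only; the authors used WebSitePulse, not this script.
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def timed_get(session: requests.Session, url: str) -> float:
    start = time.perf_counter()
    session.get(url, timeout=30)
    return time.perf_counter() - start

def response_time(permalink: str) -> float:
    session = requests.Session()
    start = time.perf_counter()
    page = session.get(permalink, timeout=30)
    total = time.perf_counter() - start

    # Collect the component files the page references (images, scripts, stylesheets).
    soup = BeautifulSoup(page.text, "html.parser")
    assets = [tag.get("src") or tag.get("href")
              for tag in soup.find_all(["img", "script", "link"])]
    for asset in filter(None, assets):
        total += timed_get(session, urljoin(permalink, asset))
    return total

# Example: a permalinked bibliographic record; any stable catalog URL would do.
print(round(response_time("http://lccn.loc.gov/2009366172"), 4), "seconds")
```

The Library of Congress permalink in the example is one of the permalinks used later in the study; any stable permalinked bibliographic record could be substituted.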
They suggest that librarians and vendors should develop standards for acceptable response time and use it in the product selec- tion and development processes. N ext-generation, or NextGen, library catalogs offer advanced features and functionality that facilitate library research and enable Web 2.0 features such as tagging and the ability for end users to create lists and add book reviews. In addition, individual catalog records now typically contain much more data than they did in earlier generations of online catalogs. This additional data can include the previously mentioned tags, lists, and reviews, but a bibliographic record may also con- tain cover images, multiple icons and graphics, tables of contents, holdings data, links to similar items, and much more. This additional data is designed to assist catalog users in the selection, evaluation, and access of library materials. However, all of the additional data and features have the disadvantage of increasing the time it takes for the information to flow across the Internet and reach the end user. Moreover, the code that handles all this data is much more complex than the coding used in earlier, traditional library catalogs. Slow response time has the potential to discourage both library patrons from using the catalog and library staff from using or recommending it. During a reference interview or library instruction ses- sion, a slow response time creates an awkward lull in the process, a delay that decreases confidence in the mind of library users, especially novices who are accustomed to the speed of an open Internet search. The two-fold purpose of this study is to define the concept of response time as it relates to both traditional and NextGen library catalogs and to measure some typical response times in a selection of library catalogs. Libraries Margaret Brown-sica (margaret.brown-sica@ucdenver.edu) is assistant Professor, associate Director of Technology Strat- egy and learning Spaces, Jeffrey Beall (jeffrey.beall@ucden- ver.edu) is assistant Professor, Metadata librarian, and Nina McHale (nina.mchale@ucdenver.edu) is assistant Professor, web librarian, university of colorado Denver. Next-GeNerAtiON liBrArY cAtAlOGs | BrOWN-sicA, BeAll, AND McHAle 215 Mathews posted an article called “5 Next Gen Library Catalogs and 5 Students: Their Initial Impressions.”7 Here he shares student impressions of several NextGen catalogs. Regarding slow response time Mathews notes, “Lots of comments on slowness. One student said it took more than ten seconds to provide results. Some other comments were: ‘that’s unacceptable’ and ‘slow-motion search, typical library.’” Nagy and Garrison, on Lauren’s Library Blog, emphasized that any “cross-silo federated search” is “as slow as the slower silos.”8 Any search inter- face is as slow as the slowest database from which it pulls information; however, that does not make users more likely to wait for search results. In fact, many users will not even know—or care—what is happening behind the scenes in a NextGen catalog. The assertion that slow response time makes well- intentioned improvements to an interface irrelevant is supported by an article that analyzes the development of Semantic Web browsers. Frachtenberg notes that users, however, have grown to expect Web search engines to provide near-instantaneous results, and a slow search engine could be deemed unusable even if it provides highly relevant results. 
It is therefore imperative for any search engine to meet its users’ interactivity expectations, or risk losing them.9 This is not just a library issue. Users expect a fast response to all Web queries, and we can learn from studies on general Web response time and how it affects the user experience. Huang and Fong-Ling help explain different user standards when using websites. Their research suggests that “hygiene factors” such as “navigation, information display, ease of learning and response time” are more important to people using “utilitarian” sites to accomplish tasks rather than “hedo- nistic” sites.10 In other words, response time importance increases when the user is trying to perform a task— such as research—and possibly even more for a task that may be time sensitive—such as trying to complete an assignment for class. ■■ Method For testing response time in an assortment of library cat- alogs, we used the WebSitePulse service (http://www .websitepulse.com). WebSitePulse provides in-depth website and server diagnostic services that are intended to save e-business customers time and money by reporting errors and Web server and website performance issues to clients. A thirty-day free trial is available for potential cus- tomers to review the full array of their services; however, the free Web Page Test, available at http://www.website server and arrive sequentially at the computer where the request was initiated. While the World Wide Web Consortium (W3C) does not set forth any particular guidelines regarding response time, go-to usability expert Jakob Nielsen states that “0.1 second is about the limit for having the user feel that the system is reacting instantaneously.”1 He further posits that 1.0 second is “about the limit for the user’s flow of thought to stay uninterrupted, even though the user will notice the delay.”2 Finally, he asserts that: 10 seconds is about the limit for keeping the user’s attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially impor- tant if the response time is likely to be highly variable, since users will then not know what to expect.3 Even though this advice dates to 1994, Nielsen noted even then that it had “been about the same for many years.”4 ■■ Previous Studies The chief benefit of studying response time is to estab- lish it as a criterion for evaluating online products that libraries license and purchase, including NextGen online catalogs. Establishing response-time benchmarks will aid in the evaluation of these products and will help libraries convey the message to product vendors that fast response time is a valuable product feature. Long response times will indicate that a product is deficient and suffers from poor usability. It is important to note, however, that sometimes library technology environments can be at fault in lengthening response time as well; in “Playing Tag In the Dark: Diagnosing Slowness In Library Response Time,” Brown-Sica diag- nosed delays in response time by testing such variables as vendor and proxy issues, hardware, bandwidth, and network traffic.5 In that case, inadequate server specifi- cations and settings were at fault. While there are many articles on NextGen catalogs, few of them discuss the issue of response time in rela- tion to their success. 
Search slowness has been reported in library literature about NextGen catalogs’ metasearch cousins, federated search products. In a 2006 review of federated search tools MetaLib and WebFeat, Chen noted that “a federated search could be dozens of times slower than Google.”6 More comments about the negative effects of slow response time in NextGen catalogs can be found in popular library technology blogs. On his blog, 216 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 ■■ Findings: Skyline Versus WorldCat@Auraria In figure 2, the bar graph shows a sample load time for the permalink to the bibliographic record for the title Hard Lessons: The Iraq Reconstruction Experience in Skyline, Auraria’s traditional catalog load time for the page is pulse.com/corporate/alltools.php, met our needs. To use the webpage test, simply select “Web Page Test” from the dropdown menu, input a URL—in the case of the testing done for this study, the perma- link for one of three books (see, for example, figure 1)—enter the validation code, and click “Test It.” WebSitePulse returns a bar graph (figure 2) and a table (figure 3) of the file activity from the server sending the composite files to the end user ’s Web browser. Each line represents one of the files that make up the rendered webpage. They load sequentially, and the bar graph shows both the time it took for each file to load and the order in which the files were received. Longer seg- ments of the bar graph provide visual indication of where a slow-loading webpage might encounter sticking points—for example, wait- ing for a large image file or third-party content to load. Accompanying the bar graph is a table describing the file transmissions in more detail, including DNS, connection, file redirects (if applicable), first and last bytes, file trans- mission times, and file sizes. Figure 1. Permalink screen shot for the record for the title Hard Lessons in Auraria Library’s Skyline catalog Figure 2. WebSitePulse webpage test bar graph results for Skyline (traditional) catalog record Figure 3. WebSitePulse webpage test table results for Skyline (traditional) catalog record Next-GeNerAtiON liBrArY cAtAlOGs | BrOWN-sicA, BeAll, AND McHAle 217 requested at items 8, 14, 15, 17, 26, and 27. The third parties include Yahoo! API services, the Google API ser- vice, ReCAPTCHA, and AddThis. ReCAPTCHA is used to provide security within WorldCat Local with opti- cal character recognition images (“captchas”), and the AddThis API is used to provide bookmarking function- ality. At number 22, a connection is made to the Auraria Library Web server to retrieve a logo image hosted on the Web server. At number 28, the cover photo for Hard Lessons is retrieved from an OCLC server. The files listed in figure 6 details the complex process of Web brows- ers’ assembly of them. Each connection to third-party content, while all relatively short, allows for addi- tional features and functionality, but lengthens overall response. As figure 6 shows, the response time is slightly more than 10 seconds, which, according to Nielsen, “is about the limit for keeping the user ’s attention focused on the dialogue.”12 While widgets, third-party content, and other Web 2.0 tools add desirable content and functionality to the Library’s catalog, they also do slow response time considerably. The total file size for the bibliographic record in WorldCat@Auraria—compared to Skyline’s 84.64 KB—is 633.09 KB. 
As will be shown in the test results below for the catalog and NextGen catalog products, bells and whistles added to traditional 1.1429 seconds total. The record is composed of a total of fourteen items, including image files (GIFs), cascad- ing style sheet (CSS) files, and JavaScript (JS) files. As the graph is read downward, the longer segments of the bars reveal the sticking points. In the case of Skyline, the nine image files, two CSS files, and one JS file loaded quickly; the only cause for concern is the red line at item four. This revealed that we were not taking advantage of the option to add a favicon to our III catalog. The Web librarian provided the ILS server technician with the same favi- con image used for the Library’s website, correcting this issue. The Skyline catalog, judging by this data, falls into Nielsen’s second range of user expectations regarding response time, which is more than one second, or “about the limit for the user’s flow of thought to stay uninter- rupted, even though the user will notice the delay.”11 Further detail is provided in figure 3; this table lists each of the webpage’s component files, and various times asso- ciated with the delivery of each file. The column on the right lists the size in kilobytes of each file. The total size of the combined files is 84.64 KB. In contrast to Skyline’s meager 14 files, WorldCat Local requires 31 items to assemble the webpage (figure 4) for the same bibliographic record. Figures 5 and 6 show that this includes 10 CSS files, 10 JavaScript files, and 8 images files (GIFs and PNGs). No item in particular slows down the overall process very much; the longest- loading item is number 13, which is a wait for third-party content, a connection to Yahoo!’s User Interface (YUI) API service. Additional third-party content is being Figure4. Permalink screen shot for the record for the title Hard Lessons in WorldCat@Auraria Figure 5. WebSitePulse webpage test bar graph results for WorldCat@Auraria record 218 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 total time for each permalinked bibliographic record to load as reported by the WebSitePulse tests; this number appears near the lower right-hand corner of the tables in figures 3, 6, 9, 12, and 15. We selected three books that were each held by all five of our test sites, verifying that we were search- ing the same three bibliographic records in each of the online catalogs by looking at the OCLC number in the records. Each of the catalogs we tested has a permalink feature; this is a stable URL that always points to the same record in each catalog. Using a permalink approximates conducting a known-item search for that item from a catalog search screen. We saved these links and used them in our searches. The bib- liographic records we tested were for these books; the permalinks used for testing follow the books: Book 1: Hard Lessons: The Iraq Reconstruction Experience. Washington, D.C.: Special Inspector General, Iraq Reconstruction, 2009 (OCLC number 302189848). Permalinks used: ■■ WorldCat@Auraria: http://aurarialibrary.worldcat .org/oclc/302189848 ■■ Skyline: http://skyline.cudenver.edu/record=b243 3301~S0 ■■ LCOC: http://lccn.loc.gov/2009366172 ■■ UT Austin: http://catalog.lib.utexas.edu/record= b7195737~S29 ■■ USC: http://library.usc.edu/uhtbin/cgisirsi/ x/0/0/5?searchdata1=2770895{CKEY} Book 2: Ehrenreich, Barbara. Nickel and Dimed: On (Not) Getting by in America. 1st ed. New York: Metropolitan, 2001 (OCLC number 256770509). 
Permalinks used: ■■ WorldCat@Auraria: http://aurarialibrary.worldcat .org/oclc/45243324 ■■ Skyline: http://skyline.cudenver.edu/record=b187 0305~S0 ■■ LCOC: http://lccn.loc.gov/00052514 ■■ UT Austin: http://catalog.lib.utexas.edu/record= b5133603~S29 ■■ USC: http://library.usc.edu/uhtbin/cgisirsi/ x/0/0/5?searchdata1=1856407{CKEY} Book 3: Langley, Lester D. Simón Bolívar: Venezuelan Rebel, American Revolutionary. Lanham: Rowman & Littlefield catalogs slowed response time considerably, even dou- bling it in one case. Are they worth it? The response of Auraria’s reference and instruction staff seems to indi- cate that they are not. ■■ Gathering More Data: Selecting the Books and Catalogs to Study To broaden our comparison and to increase our data collection, we also tested three other non-Auraria cata- logs. We designed our study to incorporate a number of variables. We decided to link to bibliographic records for three different books in the five different online catalogs tested. These included Skyline and WorldCat@Auraria as well three additional online public access catalog products, for a total of two instances of Innovative Interfaces products, one of a Voyager catalog, and one of a SirsiDynix catalog. We also selected online catalogs in different parts of the country: WorldCatLocal in Ohio; Skyline in Denver; the Library of Congress’ Online Catalog (LCOC) in Washington, D.C.; the University of Texas at Austin’s (UT Austin) online catalog; and the University of Southern California’s (USC) online catalog, named Homer, in Los Angeles. We also did our testing at different times of the day. One book was tested in the morning, one at midday, and one in the afternoon. WebSitePulse performs its webpage tests from three different locations in Seattle, Munich, and Brisbane; we selected Seattle for all of our tests. We recorded the Figure 6. WebSitePulse webpage test table results for WorldCat@Auraria record Next-GeNerAtiON liBrArY cAtAlOGs | BrOWN-sicA, BeAll, AND McHAle 219 .org/oclc/256770509 ■■ Skyline: http://skyline.cudenver.edu/record=b242 6349~S0 ■■ LCOC: http://lccn.loc.gov/2008041868 ■■ UT Austin: http://catalog.lib.utexas.edu/record= b7192968~S29 ■■ USC: http://library.usc.edu/uhtbin/cgisirsi/ x/0/0/5?searchdata1=2755357{CKEY} We gathered the data for thirteen days in early November 2009, an active period in the middle of the semester. For each test, we recorded the response time total in seconds. The data is displayed in tables 1–3. We searched bibliographic records for three books in five library catalogs over thirteen days (3 x 5 x 13) for a total of 195 response time measurements. The WebSitePulse data is calculated to the ten thousandth of a second, and we recorded the data exactly as it was presented. Publishers, c2009 (OCLC number 256770509). Permalinks used: ■■ WorldCat@Auraria: http://aurarialibrary.worldcat Table 1. Response Times for Book 1 Response time in seconds Day Wor ld- Cat Skyline LC UT Austin USC 1 10.5230 1.3191 2.6366 3.6643 3.1816 2 10.5329 1.2058 1.2588 3.5089 4.0855 3 10.4948 1.2796 2.5506 3.4462 2.8584 4 13.2433 1.4668 1.4071 3.6368 3.2750 5 10.5834 1.3763 3.6363 3.3143 4.6205 6 11.2617 1.2461 2.3836 3.4764 2.9421 7 20.5529 1.2791 3.3990 3.4349 3.2563 8 12.6071 1.3172 3.6494 3.5085 2.7958 9 10.4936 1.1767 2.6883 3.7392 4.0548 10 10.1173 1.5679 1.3661 3.7634 3.1165 11 9.4755 1.1872 1.3535 3.4504 3.3764 12 12.1935 1.3467 4.7499 3.2683 3.4529 13 11.7236 1.2754 1.5569 3.1250 3.1230 Average 11.8310 1.3111 2.5105 3.4874 3.3953 Table 2. 
Response Times for Book 2 Response time in seconds Day World- Cat Skyline LC UT Austin USC 1 10.9524 1.4504 2.5669 3.4649 3.2345 2 10.5885 1.2890 2.7130 3.8244 3.7859 3 10.9267 1.3051 0.2168 4.0154 3.6989 4 13.8776 1.3052 1.3149 4.0293 3.3358 5 10.6495 1.3250 4.5732 3.5775 3.2979 6 11.8369 1.3645 1.3605 3.3152 2.9023 7 11.3482 1.2348 2.3685 3.4073 3.5559 8 10.7717 1.2317 1.3196 3.5326 3.3657 9 11.1694 1.0997 1.0433 2.8096 2.6839 10 19.0694 1.6479 2.5779 4.3595 2.6945 11 12.0109 1.1945 2.5344 3.0848 18.5552 12 12.6881 0.7384 1.3863 3.7873 3.9975 13 11.6370 1.1668 1.2573 3.3211 3.6393 Average 12.1174 1.2579 1.9410 3.5791 4.5190 Table 3. Response Times for Book 3 Response time in seconds Day World- Cat Skyline LC UT Austin USC 1 10.8560 1.3345 1.9055 3.7001 2.6903 2 10.1936 1.2671 1.8801 3.5036 2.7641 3 11.0900 1.5326 1.3983 3.5983 3.0025 4 10.9030 1.4557 2.0432 3.6248 2.9285 5 12.3503 1.5972 3.5474 3.6428 4.5431 6 9.1008 1.1661 1.4440 3.4577 3.1080 7 9.6263 1.1240 2.3688 3.1041 3.3388 8 10.9539 1.1944 1.4941 2.8968 3.4224 9 11.0001 1.2805 1.3255 3.3644 2.7236 10 10.2231 1.3778 1.3131 3.3863 3.4885 11 10.1358 1.2476 2.3199 3.4552 2.9302 12 12.0109 1.1945 2.5344 3.0848 18.5552 13 11.5881 1.2596 2.5245 3.8040 3.8506 Average 10.7717 1.3101 2.0076 3.4325 4.4112 Table 4. Averages Response time in seconds Book World- Cat Skyline LC UT Austin USC Book 1 11.8310 1.3111 2.5105 3.4874 3.3953 Book 2 12.1174 1.2579 1.9410 3.5791 4.5190 Book 3 10.7717 1.3101 2.0076 3.4325 4.4112 Average 11.5734 1.2930 2.1530 3.4997 4.1085 220 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 university of colorado Denver: skyline (innovative interfaces) As previously mentioned, the traditional catalog at Auraria Library runs on an Innovative Interfaces integrated library system (ILS). Testing revealed a missing favicon image file that the Web server tries to send each time (item 4 in figure 3). However, this did not negatively affect the response time. The catalog’s response time was good, with an aver- age of 1.2930 seconds, giving it the fastest average time among all the test sites in the testing period. As figure 1 shows, however, Skyline is a typical legacy catalog that is designed for a traditional library environment. library of congress: Online catalog (voyager) The average response time for the LCOC was 2.0076 ■■ Results The data shows the response times for each of the three books in each of the five online catalogs over the thirteen- day testing period. The raw data was used to calculate averages for each book in each of the five online catalogs, and then we calculated averages for each of the five online catalogs (table 4). The averages show that during the testing period, the response time varied between 1.2930 seconds for the Skyline library catalog in Denver to 11.5734 seconds for WorldCat@Auraria, which has its servers in Ohio. university of colorado Denver: Worldcat@Auraria WorldCat@Auraria was routinely over Nielsen’s ten- second limit, sometimes taking as long as twenty sec- onds to load all the files to generate a single webpage. As previously discussed, this is due to the high number and variety of files that make up a single bibliographic record. The files sent also include cover images, but they are small and do not add much to the total time. After our tests on WorldCat@Auraria were conducted, the site removed one of the features on pages for individual resources, namely the “similar items” feature. 
University of Texas at Austin: Library Catalog (Innovative Interfaces)

UT Austin, like Auraria Library, runs an Innovative Interfaces ILS. The library catalog also includes book cover images, one of the most attractive NextGen features (figure 10), and as shown in figure 12, third-party content is used to add features and functionality (items 16 and 17). UT Austin's catalog uses a Google JavaScript API (item 16 in figure 12) and LibraryThing's Catalog Enhancement product, which can add book recommendations, tag browsing, and alternate editions and translations. Total content size for the bibliographic record is considerably larger than Skyline and the LCOC at 138.84 KB. It appears as though inclusion of cover art nearly doubles the response time; item 14 is a script that, while hosted on the ILS server, queries Amazon.com to return cover image art (figures 11–12). The average response time for UT Austin's catalog was 3.4997 seconds. This example demonstrates that response times for traditional (i.e., not NextGen) catalogs can be slowed down by additional content as well.

Figure 10. Permalink screen shot for the record for the title Hard Lessons in University of Texas at Austin's library catalog
Figure 11. WebSitePulse webpage test bar graph results for University of Texas at Austin's library catalog record
Figure 12. WebSitePulse webpage test table results for University of Texas at Austin's library catalog record

University of Southern California: Homer (SirsiDynix)

The average response time for USC's Homer catalog was 4.1085 seconds, making it the second slowest after WorldCat@Auraria, and the slowest among the traditional catalogs. This SirsiDynix catalog appears to take a longer time than the other brands of catalogs to make the initial connection to the ILS; this accounts for much of the slowness (see figures 14 and 15). Once the initial connection is made, however, the remaining content loads very quickly, with one exception: item 13 (see figure 15), which is a connection to the third-party provider Syndetic Solutions, which provides cover art, a summary, an author biography, and a table of contents. While the display of this content is attractive and well integrated into the catalog (figure 13), it adds 1.2 seconds to the total response time. Also, as shown in items 14 and 15, USC's Homer uses the AddThis service to add bookmarking enhancements to the catalog. Total combined file size is 148.47 KB, with the bulk of the file size (80 KB) coming from the initial connection (item 1 in figure 15).

Figure 13. Permalink screen shot for the record for the title Hard Lessons in Homer, the University of Southern California's catalog
Figure 14. WebSitePulse webpage test bar graph results for Homer, the University of Southern California's catalog
Figure 15. WebSitePulse webpage test table results for Homer, the University of Southern California's catalog
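WebSitePulse is a hosted service, and the results above come from its reports. As a rough, do-it-yourself illustration of what a single response-time reading involves (and not a substitute for the tool used in this study), the sketch below times only the initial HTML response for one of the permalinks listed earlier; because it ignores the images, CSS, JavaScript, and third-party content that make up the rest of a record page, it will understate the full-page times reported in tables 1–4.

```python
import time
import urllib.request

# Time the initial HTML response for a catalog permalink used in this
# study (the LCOC permalink for Book 3). One reading only; it does not
# fetch the embedded files that WebSitePulse also measures.
url = "http://lccn.loc.gov/2008041868"

start = time.perf_counter()
with urllib.request.urlopen(url, timeout=30) as response:
    body = response.read()
elapsed = time.perf_counter() - start

print(f"{len(body)} bytes in {elapsed:.4f} seconds")
```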
■■ Conclusion

An eye-catching interface and valuable content are lost on the end user if he or she moves on before a search is completed. Added functionality and features in library search tools are valuable, but there is a tipping point when these features slow down a product's response time to where users find the search tool too slow or unreliable. Based on the findings of this study, we recommend that libraries adopt Web response time standards, such as those set forth by Nielsen, for evaluating vendor search products and creating in-house search products. Commercial tools like WebSitePulse make this type of data collection simple and easy. Testing should be conducted for an extended period of time, preferably during a peak period—i.e., during a busy time of the semester for academic libraries. We further recommend that reviews of electronic resources add response time as an evaluation criterion. Additional research about response time as defined in this study might look at other search tools, to include article databases, and especially other metasearch products that collect and aggregate search results from several remote sources. Further studies with more of a technological focus could include discussions of optimizing data delivery methods—again, in the case of metasearch tools from multiple remote sources—to reduce response time.
Finally, product designers should pay close attention to response time when designing information retrieval products that libraries purchase.

■■ Acknowledgments

The authors wish to thank Shelley Wendt, library data analyst, for her assistance in preparing the test data.

References

1. Jakob Nielsen, Usability Engineering (San Francisco: Morgan Kaufmann, 1994): 135.
2. Ibid.
3. Ibid.
4. Ibid.
5. Margaret Brown-Sica, "Playing Tag in the Dark: Diagnosing Slowness in Library Response Time," Information Technology & Libraries 27, no. 4 (2008): 29–32.
6. Xiaotian Chen, "MetaLib, WebFeat, and Google: The Strengths and Weaknesses of Federated Search Engines Compared with Google," Online Information Review 30, no. 4 (2006): 422.
7. Brian Mathews, "5 Next Gen Library Catalogs and 5 Students: Their Initial Impressions," online posting, May 1, 2009, The Ubiquitous Librarian Blog, http://theubiquitouslibrarian.typepad.com/the_ubiquitous_librarian/2009/05/5-next-gen-library-catalogs-and-5-students-their-initial-impressions.html (accessed Feb. 5, 2010).
8. Andrew Nagy and Scott Garrison, "Next-Gen Catalogs Are Only Part of the Solution," online posting, Oct. 4, 2009, Lauren's Library Blog, http://laurenpressley.com/library/2009/10/next-gen-catalogs-are-only-part-of-the-solution/ (accessed Feb. 5, 2010).
9. Eitan Frachtenberg, "Reducing Query Latencies in Web Search Using Fine-Grained Parallelism," World Wide Web 12, no. 4 (2009): 441–60.
10. Travis K. Huang and Fu Fong-Ling, "Understanding User Interface Needs of E-commerce Web Sites," Behaviour & Information Technology 28, no. 5 (2009): 461–69, http://www.informaworld.com/10.1080/01449290903121378 (accessed Feb. 5, 2010).
11. Nielsen, Usability Engineering, 135.
12. Ibid.

3134 ----

President's Message: Moving Forward
Karen J. Starr
Information Technology and Libraries | September 2010

Cloud computing. Web 3.0 or the Semantic Web. Google Editions. Books in copyright and books out of copyright. Born digital. Digitized material. The reduction of Stanford University's Engineering Library book collection by 85 percent. The publishing paradigm most of us know, and have taken for granted, has shifted. Online databases came and we managed them.
Then CD-ROMs showed up and mostly went away. And along came the Internet, which we helped implement, use, and now depend on. How we deal with the current shifts happening in information and technology during the next five to ten years will say a great deal about how the library and information community reinvents itself for its role in the twenty-first century. This shift is different, and it will create both opportunities and challenges for everyone, including those who manage information and those who use it.

As a reflection of the shifts in the information arena, LITA is facing its own challenges as an association. It has had a long and productive role in the American Library Association (ALA) dating back to 1966. The talent among the association members is amazing, solid, and a tribute to the individuals who belong to and participate in LITA. LITA's members are leaders to the core and recognized as standouts within ALA as they push the edge of what information management means, and can mean.

For the past three years, LITA members, the board, and the executive committee have been working on a strategic plan for LITA. That process has been described in Michelle Frisque's "President's Message" (ITAL v. 29, no. 2) and elsewhere. The plan was approved at the 2010 ALA Annual Conference in Washington, D.C. A plan is not cast in concrete. It is a dynamic, living document that provides the fabric that drives the association.

Why is this process important now more than ever? We are all dealing with the current recession. Libraries are retrenching. People face challenges participating in the library field on various levels. The big information players on the national and international level are changing the playing field. As membership, each of us has an opportunity to affect the future of information and technology locally, nationally, and internationally. This plan is intended to ensure LITA's role as a "go to" place for people in the library, information, and technology fields well into the twenty-first century.

LITA committees and interest groups are being asked to step up to the table and develop action plans to implement the strategies the LITA membership have identified as crucial to the association's ongoing success. Members of the board are liaisons to each of the committees, and there is a board liaison to the interest groups. These individuals will work with committee chairs, interest group chairs, and the membership to implement LITA's plan for the future. The committee and interest group chairs are being asked to contribute those action plans by the 2011 ALA Midwinter Meeting. They will be compiled and made available to all LITA and ALA members for their use through the LITA website (http://lita.org) and ALA Connect (http://connect.ala.org).

What is in it for you? LITA is known for its leadership opportunities, continuing education, training, publications, expertise in standards and information policy, and knowledge and understanding of current and cutting-edge technologies. LITA provides you with opportunities to develop those leadership skills that you can use in your job and lifelong career. The skills gained working within a group of individuals to implement a program, influence standards and policy, collaborate with other ALA divisions, and publish can be taken home to your library. Your participation documents your value as an employee and your commitment to lifelong learning. In today's work environment, employers look for staff with proven skills who have contributed to the good of the organization and the profession.

LITA needs your participation in developing and implementing continuing education programs, publishing articles and books, and illustrating by your actions why others want to join the association. How can you do that? Volunteer for a committee, help develop a continuing education program, write an article, write a book, role model for others with your LITA participation, and recruit. What does your association gain? A solid structure to support its members in accomplishing the mission, vision, and strategic plan they identified as core for years to come.

Look for opportunities to participate and develop those skills. We will be working with committee and interest group chairs to develop meeting management tool kits over the next year, create opportunities to participate virtually, identify emerging leaders of all types, collaborate with other divisions, and provide input on national information policy and standards through ALA's Office for Information Technology Policy and other similar organizations. If you want to be involved, be sure to let LITA committee and interest group chairs, the board, and your elected officers know.

Karen J. Starr (kstarr@nevadaculture.org) is LITA President 2010–11 and Assistant Administrator for Library and Development Services, Nevada State Library and Archives, Carson City.

3135 ----

Editorial Board Thoughts: Adding Value in the Internet Age—Libraries, Openness, and Programmers
Mark Dehmlow
Information Technology and Libraries | September 2010

In the age of the Internet, Google, and the nearly crushing proliferation of metadata, libraries have been struggling with how to maintain their relevance and survive in the face of shrinking budgets and misinformed questions about whether libraries still provide value. In case there was ever any question, the answer is "of course we do." Still, an evolving environment and changing context has motivated us to rethink what we do and how we do it. Our response to the shifting environment has been to envision how libraries can provide the best value to our patrons despite an information ecosystem that duplicates (and to some extent replaces) services that have been a core part of our profession for ages. At the same time, we still have to deal with procedures for managing resources we acquire and license, and many of the systems and processes that have served us so well for so many years are not suitable for today's environment.

Many have talked about the need to invest in the distinctive services we provide and unique collections we have (e.g., preserving the world's knowledge and digitizing our unique holdings) as a means to add value to libraries.
There are many other ways libraries create value for our users, and one of the best is for us to respond to needs that are specific to our organizations and users—specialized services, focused collections, contextualized discovery, all integrated into environments in which our patrons work, such as course management systems, Google, etc. The library market has responded to many of our needs with ERMSs and next-generation resource management and discovery solutions. All of this is a good start, but like any solution that is designed to work for the greatest common denominator, they often leave a "desired functionality gap" because no one system can do everything for everyone, no development today can address all of the needs of tomorrow, and very rarely do all of the disparate systems integrate with each other.

So where does that leave libraries? Well, every problem is an opportunity, and there are two important areas that libraries can invest in to ensure that they progress at the same pace as technology, their users, and the market: open systems that have application programmer interfaces (APIs), and programmers. APIs are a means to access the data and functionality of our vended or open-source systems using a program as opposed to the default interface. APIs often take the shape of XML travelling in the same way that webpages do, accessed via a URL, but they can also be as complex as writing code in the same language as the base system, for example through software development kits (SDKs). The key here is that APIs provide a way to work with the data in our systems, be they back-end inventory or front-end discovery interfaces, in ways that weren't conceived by the software developers. This flexibility enables organizations to respond more rapidly to changing needs. No matter which side of the open-source/vended solution fence you sit on, openness needs to be a fundamental part of any decision process for any new system (or information service) to avoid being stifled by vendor or open-source developer priorities that don't necessarily reflect your own.

The second opportunity is perhaps the more difficult one given the state of library budgets and the fact that the resources needed to hire programmers are greater than for most other library staff. But having local programming skills easily accessible will be vital to our ability to address our users' specific needs and change our internal processes as we need to. I think it is good to have at least one technical person who comes from an industry outside of libraries. They bring knowledge that we don't necessarily have and fresh perspectives on how we do things. If it is not possible to hire a programmer, I would encourage technology managers to look closely at their existing staff, locate those in the organization who are able to think outside of the box, and provide some time and space for them to grow their skill set. I am not so obtuse as to suggest that anyone can be a programmer—like any skill, it requires a general aptitude and a fundamental interest—but I am a self-taught developer who had a technical aptitude and a strong desire to learn new things, and I suspect that there are many underutilized staff in libraries who, with a little encouragement, mentoring, and some new technical knowledge, could easily work with APIs and SDKs, thereby opening the door for organizations to be nimble and responsive to both internal and external needs.
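To make the "XML over a URL" idea above concrete, here is a small sketch that requests a record from a library-system endpoint and reads a few fields out of the response. The URL, element names, and attribute are invented for illustration; a real ILS, discovery layer, or repository exposes its own documented paths and schema, so treat this as a pattern rather than a recipe.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint and element names -- substitute the paths and
# fields documented by your own system's API.
url = "https://library.example.edu/api/records/12345?format=xml"

with urllib.request.urlopen(url, timeout=30) as response:
    root = ET.fromstring(response.read())

# Pull a couple of fields out of the XML payload.
title = root.findtext("title", default="(no title)")
for item in root.findall("holdings/item"):
    print(title, item.get("barcode"), item.findtext("status"))
```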
I recognize that with heavy demands it can be difficult to give up some of these highly valued people's time, but the payoff is overwhelmingly worth it.

These days I can only chuckle at the doomsday predictions about libraries and the death of our services—Google's dominance in the search arena has never really made me worried that libraries would become irrelevant. We have too much that Google does not, specifically licensed content that our users desire, and we have relationships with our users that Google will be incapable of having. I have confidence that what we have to offer will be valuable to our users for some time to come. However, it will take a willingness to evolve with our environment and to invest in skill sets that come at a premium even when it is difficult to do so.

Mark Dehmlow (mdehmlow@nd.edu) is Digital Initiatives Librarian, University of Notre Dame, Notre Dame, Indiana.

3136 ----

Metadata Creation Practices in Digital Repositories and Collections: Schemata, Selection Criteria, and Interoperability
Jung-ran Park and Yuji Tosaka
Information Technology and Libraries | September 2010

Jung-ran Park (jung-ran.park@ischool.drexel.edu) is Assistant Professor, College of Information Science and Technology, Drexel University, Philadelphia, and Yuji Tosaka (tosaka@tcnj.edu) is Cataloging/Metadata Librarian, TCNJ Library, The College of New Jersey, Ewing, New Jersey.

ABSTRACT

This study explores the current state of metadata-creation practices across digital repositories and collections by using data collected from a nationwide survey of mostly cataloging and metadata professionals. Results show that MARC, AACR2, and LCSH are the most widely used metadata schema, content standard, and subject-controlled vocabulary, respectively. Dublin Core (DC) is the second most widely used metadata schema, followed by EAD, MODS, VRA, and TEI. Qualified DC's wider use vis-à-vis Unqualified DC (40.6 percent versus 25.4 percent) is noteworthy. The leading criteria in selecting metadata and controlled-vocabulary schemata are collection-specific considerations, such as the types of resources, nature of the collection, and needs of primary users and communities. Existing technological infrastructure and staff expertise also are significant factors contributing to the current use of metadata schemata and controlled vocabularies for subject access across distributed digital repositories and collections. Metadata interoperability remains a major challenge. There is a lack of exposure of locally created metadata and metadata guidelines beyond the local environments. Homegrown locally added metadata elements may also hinder metadata interoperability across digital repositories and collections when there is a lack of sharable mechanisms for locally defined extensions and variants.

Metadata is an essential building block in facilitating effective resource discovery, access, and sharing across ever-growing distributed digital collections. Quality metadata is becoming critical in a networked world in which metadata interoperability is among the top challenges faced by digital libraries. However, there is no common data model that cataloging and metadata professionals can readily reference as a mediation mechanism during the processes of descriptive metadata creation and controlled vocabulary schemata application for subject description.1 The development of such a mediation mechanism calls for an empirical assessment of various issues surrounding metadata-creation practices. The critical issues concerning metadata practices across distributed digital collections have been relatively unexplored. While examining learning objects and e-prints communities of practice, Barton, Currier, and Hey point out the lack of formal investigation of the metadata-creation process.2 As will be discussed in the following section, some researchers have begun to assess the current state of descriptive practices, metadata schemata, and content standards. However, the literature has not yet developed to a point where it affords a comprehensive picture. Given the propagation of metadata projects, it is important to continue to track changes in metadata-creation practices while they are still in constant flux. Such efforts are essential for adding new perspectives to digital library research and practices in an environment where metadata best practices are being actively sought after to aid in the creation and management of high-quality digital collections.
This study examines the prevailing current state of metadata-creation practices in digital repositories, collections, and libraries, which may include both digitized and born-digital resources. Using nationwide survey data, mostly drawn from the community of cataloging and metadata professionals, we seek to investigate issues in creating descriptive metadata elements, using controlled vocabularies for subject access, and propagating metadata and metadata guidelines beyond local environments. We will address the following research questions:

1. Which metadata schema(ta) and content standard(s) are employed in individual digital repositories and collections?
2. Which controlled vocabulary schema(ta) are used to facilitate subject access?
3. What criteria are applied in selecting metadata and controlled-vocabulary schema(ta)?
4. To what extent are mechanisms for exposing and sharing metadata integrated into current metadata-creation practices?

In this article, we first review recent studies relating to current metadata-creation practices across digital collections. Then we present the survey method employed to conduct this study, the general characteristics of survey participants, and the validity of the collected data, followed by the study results. We report on how metadata and controlled vocabulary schema(ta) are being used across institutions, and we present a data analysis of current metadata-creation practices. The final section summarizes the study and presents some suggestions for future studies.

possible increase in the use of locally developed schemata as many projects added new types of nontextual digital objects that could not be adequately described by existing metadata schemata.6 There is a lack of research concerning the current use of content standards; however, it is reasonable to suspect that content-standards use exhibits patterns similar to that of metadata because of their often close association with particular metadata schemata. The OCLC RLG survey reveals that Anglo-American Cataloguing Rules, 2nd edition (AACR2)—the traditional cataloging rule that has most often been used in conjunction with MARC—is the most widely used content standard (81 percent).
AACR2 is followed by Describing Archives: A Content Standard (DACS) with 42 percent; Descriptive Cataloging of Rare Materials with 33 percent; Archives, Personal Papers, Manuscripts (APPM) with 25 percent; and Cataloging Cultural Objects (CCO) with 21 percent.7 In the same way as metadata schemata, there appears to be a concentration of a few controlled vocabulary schemata at research institutions. Ma’s ARL survey, for example, shows that the Library of Congress Subject Headings (LCSH) and Name Authority File (NAF) were used by most survey respondents (96 percent and 88 percent, respectively). These two predominantly adopted vocabularies are followed by several domain-specific vocabularies, such as Art and Architecture Thesaurus (AAT), Library of Congress Thesaurus for Graphical Materials (TGM) I and II, Getty Thesaurus of Geographic Names (TGN), and the Getty Union List of Artists Names (ULAN), which were used by between 30 percent to more than 60 percent of respondents.8 The OCLC RLG survey reports similar results; however, nearly half of the OCLC RLG survey respondents (N = 9) indicated that they had also built and maintained one or more locally developed thesauri.9 While creating and sharing information about local metadata implementations is an important step toward increased interoperability, recent studies tend to paint a grim picture of current local documentation practices and open accessibility. In a nationwide study of institutional repositories in U.S. academic libraries, Markey et al. found that only 61.3 percent of the 446 survey participants with operational institutional repositories had imple- mented policies for metadata schemata and authorized metadata creators.10 The OCLC RLG survey also high- lights limited collaboration and sharing of the metadata guidelines both within and across the institutions. It finds that even when there are multiple units creating metadata within the same institution, metadata-creation guidelines often are unlikely to be shared (28 percent do not share; 53 percent sometimes share).11 A mixed result is reported on the exposure of meta- data to outside service providers. In an ARL survey, the University of Houston Libraries Institutional Repository ■■ Literature Review As evinced by the principles and practices of bib- liographic control through shared cataloging, successful resource access and sharing in the networked envi- ronment demands semantic interoperability based on accurate, complete, and consistent resource description. The recent survey by Ma finds that the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and metadata crosswalks have been adopted by 83 percent and 73 percent of respondents, respectively. Even though the sample comes only from sixty-eight Association of Research Libraries (ARL) member librar- ies, and the figures thus may be skewed higher than those of the entire population of academic libraries, there is little doubt that interoperability is a critical issue given the rapid proliferation of metadata schemata throughout digital libraries.3 While there is a variety of metadata schemata cur- rently in use for organizing digital collections, only a few of them are widely used in digital repositories. 
In her ARL survey, Ma reports that the MARC format is the most widely used metadata schema (91 percent), followed by Encoded Archival Description (EAD) (84 percent), Unqualified Dublin Core (DC) (78 percent), and Qualified DC (67 percent).4 Similarly, a 2007 member survey by OCLC Research Libraries Group (RLG) programs gath- ered information from eighteen major research libraries and cultural heritage institutions and also found that MARC is the most widely used scheme (65 percent), fol- lowed by EAD (43 percent), Unqualified DC (30 percent), and Qualified DC (29 percent). The different levels of use reported by these studies are probably due to different sample sizes and compositions, but results nonetheless suggest that metadata use at research institutions tends to rely on a small number of major schemata.5 There may in fact be much greater diversity in meta- data use patterns when the scope is expanded to include both research and nonresearch institutions. Palmer, Zavalina, and Mustafoff, for example, tracked trends from 2003 through 2006 in metadata selection and application practices at more than 160 digital collections developed through Institute of Museum and Library Services grants. They found that despite perceived limitations, use of DC is the most widespread, with more than half of the digital collections using it alone or in combination with other schemata. MARC ranks second, with nearly 30 percent using it alone or in combination. The authors found that the choice of metadata schema is largely influenced by practices at peer institutions and compatibility with a content management system. What is most striking, how- ever, is the finding that locally developed schemata are used as often as MARC. There is a decline in the percent- age of digital projects using multiple metadata schemata (from 53 percent to 38 percent). Yet the authors also saw a 106 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 ■■ Method The objective of the research reported in this paper is to examine the current state of metadata-creation practices in terms of the creation of descriptive metadata elements, the use of controlled vocabularies for subject access, and the exposure of metadata and metadata guidelines beyond local environments. We conducted a Web survey using WebSurveyor (now Vovici: http://www.vovici .com). The survey included both structured and open- ended questions. It was extensively reviewed by members of an advisory board—a group of three experts in the field—and it was pilot-tested prior to being officially launched. The survey included many multiple-response questions that called for respondents to check all appli- cable answers. We recruited participants through survey invitation messages and subsequent reminders to the electronic mailing lists of communities of metadata and cataloging professionals. Table 1 shows the mailing lists employed for the study. We also sent out individual invitations and distributed flyers to selected metadata and cataloging ses- sions during the 2008 ALA Midwinter Meeting, held that year in Philadelphia. The survey attracted a large number of initial par- ticipants (N = 1,371), but during the sixty-two days from August 6 to October 6, 2008, we only received 303 com- pleted responses via the survey management system. We suspect that the high incompletion rate (77.9 percent) stems from the fact that the subject matter may have been outside the scope of many participants’ job responsibili- ties. 
The length of the survey may also have been a factor in the incompletion rate. The profiles of respondents’ job titles (see table 2) Task Force found that exposing metadata to OAI-PMH service providers is an established practice used by nearly 90 percent of the respondents.12 Ma’s ARL survey also reports the wide adoption of OAI-PMH (83 per- cent). These results underscore the virtual consensus on the critical importance of exposing metadata to achieve interoperability and make locally created metadata useful across distributed digital repositories and collections.13 By contrast, the OCLC RLG survey shows that only one-tenth of the respondents stated that all non-MARC metadata is exposed to OAI harvesters, while 30 percent indicated that only some of it was available. The prominent theme revealed by the OCLC RLG survey is an “inward focus” in current metadata practices, marked by the “use of local tools to reach a generally local audience.”14 In summary, recent studies show that the current practice of metadata creation is problematic due to the lack of a mechanism for integrating various types of metadata schemata, content standards, and controlled vocabularies in ways that promote an optimal level of interoperability across digital collections and repositories. The problems are exacerbated in an environment where many institutions lack local documentation delineating the metadata-creation process. At the same time, researchers have only recently begun studying these issues, and the body of literature is at an incipient stage. The research that was done often targeted different populations, and sample sizes were different (some very small). In some cases the literature exhibits contradictory findings about issues surrounding metadata practices, increasing the difficulty in understanding the current state of metadata creation. This points out the need for further research of current metadata-creation practice. Table 1. Electronic mailing lists for the survey Electronic Mailing Lists E-mail Address Autocat Dublin Core Listserv Metadata Librarians Listserv Library and Information Technology Association Listserv Online Audiovisual Catalogers Electronic Discussion List Subject Authority Cooperative Program Listserv Serialist Text Encoding Initiative Listserv Electronic Resources in Libraries Listserv Encoded Archival Description Listserv autocat@listserv.syr.edu dc-libraries@jiscmail.ac.uk metadatalibrarians@lists.monarchos.com lita-l@ala.org olac-list@listserv.acsu.buffalo.edu sacolist@listserv.loc.gov serialst@list.uvm.edu tei-l@listserv.brown.edu eril-l@listserv.binghamton.edu ead@listserv.loc.gov MetADAtA creAtioN prActices iN DiGitAl repositories AND collectioNs | pArK AND tosAKA 107 and job responsibilities (see table 3) clearly show that most of the individuals who completed the survey engage professionally in activities directly relevant to the research objectives, such as descriptive and subject cataloging, metadata creation and management, author- ity control, nonprint and special material cataloging, electronic resource and digital project management, and integrated library system (ILS) management. Although the largest number of participants (135, or 44.6 percent) chose the “Other” category regarding their job title (see table 2), it is reasonable to assume that the vast majority can be categorized as cataloging and meta- data professionals.15 Most job titles given as “Other” are associated with one of the professional activities listed in table 4. 
Thus it is reasonable to assume that the respondents are in an appropriate position to provide first-hand, accurate information about the current state of metadata creation in their institutions. Concerning the institutional background of partici- pants, of the 303 survey participants, fewer than half (121, or 39.9 percent) provided institutional information. We believe that this is mostly due to the fact that the question was optional, following a suggestion from the Institutional Review Board at Drexel University. Of those that provided their institutional background, the majority (75.2 percent) are from academic libraries, followed by participants from public libraries (17.4 percent) and from other institutions (7.4 percent). Table 3. Participants’ job responsibilities (multiple responses) Job Responsibilities Number of Participants General cataloging (e.g., descriptive and subject cataloging) 171 (56.4%) Metadata creation and management 153 (50.5%) Authority control 147 (48.5%) Nonprint cataloging (e.g., microform, music scores, photographs, video recordings) 133 (43.9%) Special material cataloging (e.g., rare books, foreign language materials, government documents) 126 (41.6%) Digital project management 101 (33.3%) Electronic resource management 62 (20.5%) ILS management 59 (19.5%) Other 51 (16.8%) Survey question: what are your primary job responsibilities? (please check all that apply) Table 2. Job titles of participants (multiple responses) Job Titles Number of Participants Other 135 (44.6%) Cataloger/cataloging librarian/ catalog librarian 99 (32.7%) Metadata librarian 29 (9.6%) Catalog & metadata librarian 26 (8.6%) Head, cataloging 26 (8.6%) Electronic resources cataloger 17 (5.6%) Cataloging coordinator 15 (5.0%) Head, cataloging & metadata services 15 (5.0%) N = 227. Survey question: what is your working job title? (please check all that apply) Table 4. Professional activities specified in “Other” category in table 2 Professional Activities Number of Participants Cataloging & metadata creation 31 (10.2%) Digital projects management 23 (7.6%) Technical services 17 (5.6%) Archiving 16 (5.3%) Electronic resources and serials management 6 (2.0%) Library system administration/ other 6 (2.0%) N = 99. Survey question: If you selected other, please specify. 108 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 It is noteworthy that use of Qualified DC was higher than that of Unqualified DC. This result is different from the ARL survey and a member survey conducted ■■ Results In this section, we will present the findings of this study in the following three areas: (1) metadata and controlled vocabulary schemata and metadata tools used, (2) criteria for selecting metadata and controlled vocabulary schemata, and (3) exposing metadata and metadata guidelines beyond local environments. Metadata and controlled Vocabulary schemata and Metadata tools used A great variety of digital objects were handled by the survey participants, as figure 1 shows. The most frequently han- dled object was text, cited by 86.5 percent of the respondents. About three-fourths of the respondents described audiovi- sual materials (75.2 percent), while 60.1 percent described images and 51.8 per- cent described archival materials. More than 65 percent of the respondents han- dled electronic resources (68.3 percent) and digitized resources (66.7 percent), while approximately half handled born- digital resources (52.5 percent). 
The types of materials described in digital collections were diverse, encompassing both digitized and born-digital materi- als; however, digitization accounted for a slightly greater percentage of meta- data creation. To handle these diverse digital objects, the respondents’ institutions employed a wide range of metadata schemata, as figure 2 shows. Yet there were a few schemata that were widely used by cataloging and metadata pro- fessionals. Specifically, 84.2 percent of the respondents’ institutions used MARC; DC was also popular, with 25.4 percent using Unqualified DC and 40.6 percent using Qualified DC to create metadata. EAD also was frequently cited (31.7 percent). In addition to these major types of metadata schemata, the respondents’ institutions also employed Metadata Object Description Schema (MODS) (17.8 percent), Visual Resource Association (VRA) Core (14.9 percent), and Text Encoding Initiative (TEI) (12.5 percent). Figure 1. Materials/resources handled (multiple responses) Survey question: what type of materials/resources do you and your fellow catalogers/metadata librar- ians handle? (please check all that apply) Figure 2. Metadata schemata used (multiple responses) Survey question: which metadata schema(s) do you and your fellow catalogers/metadata librarians use? (please check all that apply) MetADAtA creAtioN prActices iN DiGitAl repositories AND collectioNs | pArK AND tosAKA 109 custom metadata elements derives from the imperative to accommodate the perceived needs of local collec- tions and users, as indicated by the two most common responses: (1) “to reflect the nature of local collec- tions/resources” (76.9 percent) and (2) “to reflect the characteristics of target audience/community of local collections” (58.3 percent). Local conditions were also cited from institutional and technical standpoints. Many institutions (34.3 percent) follow existing local practices for cataloging and metadata creation while other insti- tutions (18.5 percent) are making homegrown metadata additions because of constraints imposed by their local systems. Table 6 summarizes the most frequently used con- trolled vocabulary schematas by resource type. By far the most widely used schema across all resource types was LCSH. The preeminence of LCSH evinces the criti- cal role that it plays as the de facto form of controlled vocabulary for subject description. Library of Congress Classification (LCC) was the second choice for all resource types other than images, cultural objects, and archives. For digital collections of these resource types and digitized resources, AAT was the second most used controlled vocabulary, a fact that reflects its purpose as a domain-specific terminology used for describing works of art, architecture, visual resources, material culture, and archival materials. While traditional metadata schemata, content stan- dards, and controlled vocabularies such as MARC, AACR2, and LCSH clearly were preeminent in the majority of the respondents’ institutions, current meta- data creation in digital repositories and collections faces new challenges from the enormous volume of online and digital resources.19 Approximately one-third of the respondents’ institutions (33.8 percent) were meeting this challenge with tools for semiautomatic metadata generation. Yet a majority of respondents (52.5 percent) indicated that their institutions did not use any such tools for metadata creation and management. 
This result seems to contrast with Ma’s finding that automatic meta- data generation was used in some capacity in nearly by OCLC RLG programs (as described in “Literature Review” on page 105).16 In these surveys, Unqualified DC was more frequently cited than Qualified DC. One pos- sible explanation of this less frequent use of Unqualified DC may lie in the limitations of Unqualified DC meta- data semantics. Survey respondents also reported on problems using DC metadata, which were mostly caused by semantic ambiguities and semantic overlaps of cer- tain DC metadata elements.17 Limitations and issues of Unqualified DC metadata semantics are discussed in depth in Park’s study.18 In light of these results, examin- ing trends of Qualified DC use in a future study would be interesting. Despite the wide variety of schemata reported in use, there seemed to be an inclination to use only one or two metadata schemata for resource description. As shown in table 5, the majority of the respondents’ institutions (53.6 percent) used only one schema for metadata creation, while approximately 37 percent used two or three sche- mata (26.2 percent and 10.3 percent, respectively). The institutions using more than three schemata during the metadata-creation processes comprised only 9.9 percent of the respondents. Turning to content standards (see figure 3), we found that AACR2 was the most widely used standard, indi- cated by 84.5 percent of respondents. This high percentage clearly reflects the continuing preeminence of MARC as the metadata schema of choice for digital collections. DC application profiles also showed a large user base, indicated by more than one-third of respondents (37.0 percent). More than one quarter of the respondents (28.4 percent) used EAD application guidelines as developed by the Society of American Archivists and the Library of Congress, while 10.6 percent used RLG Best Practice Guidelines for Encoded Archival Description (2002). About one quarter (25.7 percent) indicated DACS as their content standard. Homegrown standards and guidelines are local appli- cation profiles that clarify existing content standards and specify how values for metadata elements are selected and represented to meet the requirements of a particular context. As shown in the results on metadata schemata, it is noteworthy that homegrown content standards and guidelines constituted one of the major choices of partici- pants, indicated by more than one-fifth of the institutions (22.1 percent). Almost two-fifths of the survey partici- pants (38 percent) also reported that they add homegrown metadata elements to a given metadata schema. Slightly less than half of the participants (47.2 percent) indicated otherwise. The local practice of creating homegrown content guidelines and metadata elements during the metadata- creation process deserves a separate study; this study only briefly touches on the basis for locally added custom metadata elements. The motivation to create Table 5. Number of metadata schemata in use Number of Metadata Schemata in Use Number of Participants 1 141 (53.6%) 2 69 (26.2%) 3 27 (10.3%) 4 or more 26 (9.9%) N=263. Survey question: which metadata schema(s) do you and your fellow catalogers/metadata librarians use the most? (please check all that apply) 110 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 criteria for selecting Metadata and controlled Vocabulary schemata What are the factors that have shaped the current state of metadata-creation practices reported thus far? 
In this section, we turn our attention to constraints that affect decision making at institutions in the selection of meta- data and controlled vocabulary schemata for subject description. Figure 4 presents the percentage of different metadata schemata selection criteria described by survey par- ticipants. First, collection-specific considerations clearly played a major role in the selection. The most frequently cited reason was “types of resources” (60.4 percent). This response reflects the fact that a large number of metadata schemata have been developed, often with wide varia- tion in content and format, to better handle particular two-thirds of ARL libraries.20 Because semiautomatic metadata application is reported in-depth in a separate study, we only briefly sketch the topic here.21 The semiautomatic metadata application tools used in the respondents’ digital reposi- tories and collections can be classified into five categories of common characteristics: (1) metadata format conver- sion, (2) templates and editors for metadata creation, (3) automatic metadata creation, (4) library system for bibliographic and authority control, and (5) metadata harvesting and importing tools. As table 7 illustrates, among those institutions that have introduced semiautomatic metadata generation tools, “metadata format conversion” (38.6 percent) and “templates and editors for metadata creation” (27 per- cent) are the two most frequently cited tools. Figure 3. Content standards used (multiple responses) Survey question: what content standard(s) and/or guidelines do you and your fellow catalogers/metadata librarians use? (please check all that apply) MetADAtA creAtioN prActices iN DiGitAl repositories AND collectioNs | pArK AND tosAKA 111 job responsibility, “expertise of staff” (44.2 percent) and “integrated library system” (39.9 percent) appeared to highlight the key role that MARC continues to play in the metadata-creation process for digital collections (see fig- ure 2). “Budget” also appeared to be an important factor in metadata selection (17.2 percent), showing that funding levels played a considerable role in metadata decisions. types of information resources. The primary factor in selecting metadata schemata is their suit- ability for describing the most common type of resources han- dled by the survey participants. The second and third most common criteria, “target users/ audience” (49.8 percent) and “subject matters of resources” (46.9 percent), also seem to reflect how domain-specific metadata schemata are applied. In making decisions on meta- data schemata, respondents weighed materials in particular subject areas (e.g., art, education, and geography) and the needs of particu- lar communities of practice as their primary users and audiences. However, existing technological infrastructure and resource constraints also determine options. Given the prominence of general library cataloging as a primary Table 6. 
The most frequently used controlled vocabulary schema(s) by resource type (multiple responses) LCSH LCC DDC AAT TGM ULAN TGN Other Text 79.5% (241) 35.6% (108) 16.8% (51) 10.2% (31) 6.9% (21) 3.6% (11) 5.0% (15) 14.2% (43) Audiovisual materials 67.3% (204) 25.1% (76) 12.9% (39) 9.2% (28) 8.6% (26) 4.0% (12) 5.0% (15) 14.5% (44) Cartographic materials 44.9% (136) 17.5% (53) 7.3% (22) 5.0% (15) 4.3% (13) 1.3% (4) 4.3% (13) 6.3% (19) Images 43.2% (131) 11.9% (36) 5.6% (17) 25.7% (78) 20.1% (61) 9.9% (30) 10.6% (32) 11.2% (34) Cultural objects (e.g., museum objects) 20.1% (61) 7.3% (22) 4.3% (13) 13.2% (40) 6.3% (19) 4.6% (14) 3.0% (9) 7.9% (24) Archives 44.2% (134) 11.6% (35) 6.3% (19) 11.9% (36) 6.6% (20) 3.0% (9) 2.6% (8) 12.2% (37) Electronic resources 60.7% (184) 23.4% (71) 8.6% (26) 5.3% (16) 3.6% (11) 1.7% (5) 3.0% (9) 14.2% (43) Digitized resources 51.8% (157) 15.5% (47) 5.0% (15) 15.5% (47) 10.2% (31) 6.6% (20) 7.6% (23) 15.2% (46) Born-digital resources 43.9% (133) 13.5% (41) 5.6% (17) 8.3% (25) 7.3% (22) 4.3% (13) 4.6% (14) 13.9% (42) Survey question: which controlled vocabulary schema(s) do you and your fellow catalogers/metadata librarians use most? (Please check all that apply) Table 7. Types of semi-automatic metadata generation tools in use Types Response Rating Metadata format conversion 38 (38.6%) Templates and editors for metadata creation 26 (27.0%) Automatic metadata creation 16 (16.7%) Library system for bibliographic and authority control 15 (15.6%) Metadata harvesting and importing tools 8 (8.3%) N = 96. Survey question: Please describe the (semi)automatic metadata generation tools you use. 112 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 the software used by their insti- tutions—i.e., “integrated library system” (39.9 percent), “digital collection or asset management software” (25.4 percent), “institu- tional repository software” (19.8 percent), “union catalogs” (14.9 percent), and “archival manage- ment software” (5.6 percent)—as a reason for their selection of meta- data schemata. Metadata decisions thus seem to be driven by a vari- ety of local technology choices for developing digital repositories and collections. As shown in figure 5, similar patterns are observed with regard to selection criteria for controlled vocabulary schemata. Three of the four selection criteria receiv- ing majority responses—“target users/audience” (55.4 percent), “type of resources” (54.8 percent), and “nature of the collection” (50.2 percent)—suggest that controlled vocabulary decisions are influ- enced primarily by the substantive purpose and scope of controlled vocabularies for local collections. A major consideration seems to be whether particular controlled vocabularies are suitable for rep- resenting standard data values to improve access and retrieval for target audiences. “Metadata standards,” another selection criteria frequently cited in the survey (54.1 percent), reflects how some domain-spe- cific metadata schemata tend to dictate the use of particular con- trolled vocabularies. At the same time, the results also suggest that resources and technological infra- structure available to institutions were also important reasons for their selections. “Expertise of staff” (38.3 percent) seems to be a straightforward practical rea- son: the application of controlled vocabularies is highly dependent on the width and depth of staff expertise available. 
Likewise, when implementing controlled vocabularies in the digital environment, some institutions also took into account existing system features for authority control and controlled vocabulary searching, as exhibited by 17.2 percent of responses for "digital collection or asset management software." At the same time, it is noteworthy that while responses were not mutually exclusive, many respondents cited

Figure 4. Criteria for selecting metadata schemata (multiple responses). Question: Which criteria were applied in selecting metadata schemata? (Please check all that apply.)

Figure 5. Criteria for selecting controlled vocabulary schemata (multiple responses). Question: Which criteria are applied in selecting controlled vocabulary schemata? (Please check all that apply.)

Exposing Metadata and Metadata Guidelines beyond Local Environments

Metadata interoperability across distributed digital repositories and collections is fast becoming a major issue.22 The proliferation of open-source and commercial digital library platforms using a variety of metadata schemata has implications for librarians' ability to create shareable and interoperable metadata beyond the local environment. To what extent are mechanisms for sharing metadata integrated into the current metadata-creation practices described by the respondents?

Figure 6 summarizes the responses concerning the uses of three major mechanisms for metadata exposure. Approximately half of respondents exposed at least some of their metadata to search engines (52.8 percent) and union catalogs such as OCLC WorldCat (50.6 percent). More than one-third of the respondents exposed all or some of their metadata through OAI harvesters (36.8 percent). About half or more of the respondents either did not expose their metadata or were not sure about the current operations at their institutions (e.g., 47.2 percent for search engines and 63.2 percent for OAI harvesters), a result that may be interpreted as a tendency to create metadata primarily for local audiences.

Figure 6. Mechanism to expose metadata (multiple responses). Survey question: Do you/your organization expose your metadata to OAI (Open Archives Initiative) harvesters, union catalogs, or search engines?

Why do many institutions fail to make their locally created metadata available to other institutions despite wide consensus on the importance of metadata sharing in a networked world? Responses from those institutions exposing none or not all of their metadata (see table 8) reveal that financial, personnel, and technical issues are major hindrances in promoting the exposure of metadata outside the immediate local environment. Some institutions are not confident that their current metadata practices are able to satisfy the technical requirements for producing standards-based interoperable metadata. Another reason frequently mentioned is copyright concerns about limited-access materials. Yet some respondents simply do not see any merit to exposing their item-level metadata, citing its relative uselessness for resource discovery outside their institutions.

Table 8. Sample reasons for not exposing metadata
- Not all our metadata conforms to standards required
- Not all our metadata is OAI compliant
- Lack of expertise and time and money to develop it
- IT restrictions
- Security concerns on the part of our information technology department
- Some collections/records are limited access and not open to the general public
- We think that having WorldCat available for traditional library materials that many libraries have is a better service to people than having each library dump our catalog out on the web
- Varies by tool and collection, but usually a restriction on the material, a technical barrier, or a feeling that for some collections the data is not yet sufficiently robust ("still a work in progress")
Survey question: If you selected "some, but not all" or "no" in question 13 [see figure 6], please tell why you do not expose your metadata.

As stated earlier, the practice of adding homegrown metadata elements seems common among many institutions. While locally created metadata elements accommodate local needs and requirements, they may also hinder metadata interoperability across digital repositories and collections if mechanisms for finding information about such locally defined extensions and variants are absent. Homegrown metadata guidelines document local data models and function as an essential mechanism for metadata creation and quality assurance within and across digital repositories and collections.23 In this regard, it is essential to examine locally created metadata guidelines and best practices.24 However, the results of the survey analysis evince that the vast majority of institutions (72.0 percent) provided no public access to local application profiles on their websites, while only 19.6 percent of respondents' institutions made them available online to the public.

■■ Conclusion

Metadata plays an essential role in managing, organizing, and searching for information resources. In the networked environment, the enormous volume of online and digital resources creates an impending research need to evaluate the issues surrounding the metadata-creation process and the employment of controlled vocabulary schemata across ever-growing distributed digital repositories and collections. In this paper we explored the current status of metadata-creation practices through an examination of survey responses drawn mostly from cataloging and metadata professionals (see tables 2, 3, and 4). The results of the study indicate that current metadata practices still do not create conditions for interoperability.

Despite the proliferation of newer metadata schemata, the survey responses showed that MARC currently remains the most widely used schema for providing resource description and access in digital repositories, collections, and libraries. The continuing predominance of MARC goes hand-in-hand with the use of AACR2 as the primary content standard for selecting and representing data values for descriptive metadata elements. LCSH is used as the de facto controlled vocabulary schema for providing subject access in all types of digital repositories and collections, while domain-specific subject terminologies such as AAT are applied at significantly higher rates in digital repositories handling nonprint resources such as images, cultural objects, and archival materials.

The DC metadata schema is the second most widely employed according to this study, with Qualified DC used by 40.6 percent of responding institutions and Unqualified DC used by 25.4 percent. EAD is another frequently cited schema (31.7 percent), followed by MODS (17.8 percent), VRA (14.9 percent), and TEI (12.5 percent). A trend of Qualified DC being used (40.6 percent) more often than Unqualified DC (25.4 percent) is noteworthy. One possible explanation of this trend may be derived from the fact that semantic ambiguities and overlaps in some of the Unqualified DC elements interfere with use in resource description.25 Given the earlier surveys reporting the higher use of Unqualified DC over Qualified DC, more in-depth examination of their use trends may be an important avenue for future studies.

Despite active research and promising results obtained from some experimental tools, practical applications of semiautomatic metadata generation have been incorporated into the metadata-creation processes by only one-third of survey participants.

The leading criteria in selecting metadata and controlled vocabulary schemata are derived from collection-specific considerations of the type of resources, the nature of the collections, and the needs of primary users and communities. Existing technological infrastructure, encompassing digital collection or asset management software, archival management software, institutional repository software, integrated library systems, and union catalogs, also greatly influences the selection process. The skills and knowledge of metadata professionals and the expertise of staff also are significant factors in understanding current practices in the use of metadata schemata and controlled vocabularies for subject access across distributed digital repositories and collections.

The survey responses reveal that metadata interoperability remains a challenge in the current networked environment despite growing awareness of its importance. For half of the survey respondents, exposing metadata to service providers, such as OAI harvesters, union catalogs, and search engines, does not seem to be a high priority because of local financial, personnel, and technical constraints. Locally created metadata elements are added in many digital repositories and collections in large part to meet local descriptive needs and serve the target user community. While locally created metadata elements accommodate local needs, they may also hinder metadata interoperability across digital repositories and collections when shareable mechanisms are not in place for such locally defined extensions and variants.

Locally created metadata guidelines and application profiles are essential for metadata creation and quality assurance; however, most custom content guidelines and best practices (72 percent) are not made publicly available. The lack of a mechanism to facilitate public access to local application profiles and metadata guidelines may hinder cross-checking for quality metadata and creating shareable metadata that can be harvested for a high level of consistency and interoperability across distributed digital collections and repositories. Development of a searchable registry for publicly available metadata guidelines has the potential to enhance metadata interoperability.

A constraining factor of this study derives from the participant population; thus we have not attempted to generalize the findings of the study. However, results indicate a pressing need for a common data model that is shareable and interoperable across ever-growing distributed digital repositories and collections. Development of such a common data model demands future research into a practical and interoperable mediation mechanism underlying local implementation of metadata elements, semantics, content standards, and controlled vocabularies in a world where metadata can be distributed and shared widely beyond the immediate local environment and user community. (Other issues such as semiautomatic metadata application, DC metadata semantics, custom metadata elements, and the professional development of cataloging and metadata professionals are explained in depth in separate studies.)26 For future studies, incorporation of other research methods (such as follow-up telephone surveys and face-to-face focus group interviews) could be used to better understand the current status of metadata-creation practices. Institutional variation also needs to be taken into account in the design of future studies.

■■ Acknowledgments

This study is supported through an early career development research award from the Institute of Museum and Library Services. We would like to express our appreciation to the reviewers for their invaluable comments.

References

1. Jung-ran Park, "Semantic Interoperability and Metadata Quality: An Analysis of Metadata Item Records of Digital Image Collections," Knowledge Organization 33 (2006): 20–34; Rachel Heery, "Metadata Futures: Steps toward Semantic Interoperability," in Metadata in Practice, ed. Diane I. Hillman and Elaine L. Westbrooks, 257–71 (Chicago: ALA, 2004); Jung-ran Park, "Semantic Interoperability across Digital Image Collections: A Pilot Study on Metadata Mapping" (paper presented at the Canadian Association for Information Science 2005 Annual Conference, London, Ontario, June 2–4, 2005), http://www.cais-acsi.ca/proceedings/2005/park_J_2005.pdf (accessed Mar. 24, 2009).

2. Jane Barton, Sarah Currier, and Jessie M. N. Hey, "Building Quality Assurance into Metadata Creation: An Analysis Based on the Learning Objects and E-Prints Communities of Practice" (paper presented at 2003 Dublin Core Conference: Supporting Communities of Discourse and Practice—Metadata Research & Applications, Seattle, Wash., Sept. 28–Oct. 2, 2003), http://dcpapers.dublincore.org/ojs/pubs/article/view/732/728 (accessed Mar. 24, 2009); Sarah Currier et al., "Quality Assurance for Digital Learning Object Repositories: Issues for the Metadata-Creation Process," ALT-J 12 (2004): 5–20.

3. Jin Ma, Metadata, SPEC Kit 298 (Washington, D.C.: Association of Research Libraries, 2007): 13, 28.

4. Ibid., 12, 21–22.

5. Karen Smith-Yoshimura, RLG Programs Descriptive Metadata Practices Survey Results (Dublin, Ohio: OCLC, 2007): 6–7, http://www.oclc.org/programs/publications/reports/2007-03.pdf (accessed Mar. 24, 2009); Karen Smith-Yoshimura and Diane Cellentani, RLG Programs Descriptive Metadata Practices Survey Results: Data Supplement (Dublin, Ohio: OCLC, 2007): 16, http://www.oclc.org/programs/publications/reports/2007-04.pdf (accessed Mar. 24, 2009).

6. Carole Palmer, Oksana Zavalina, and Megan Mustafoff, "Trends in Metadata Practices: A Longitudinal Study of Collection Federation" (paper presented at the Seventh ACM/IEEE-CS Joint Conference on Digital Libraries, Vancouver, British Columbia, Canada, June 18–23, 2007), http://hdl.handle.net/2142/8984 (accessed Mar. 24, 2009).

7. Smith-Yoshimura, RLG Programs Descriptive Metadata Practices Survey Results, 7; Smith-Yoshimura and Cellentani, RLG Programs Descriptive Metadata Practices Survey Results, 17.

8. Ma, Metadata, 12, 22–23.

9. Smith-Yoshimura, RLG Programs Descriptive Metadata Practices Survey Results, 7; Smith-Yoshimura and Cellentani, RLG Programs Descriptive Metadata Practices Survey Results, 18–21.

10. Karen Markey et al., Census of Institutional Repositories in the United States: MIRACLE Project Research Findings (Washington, D.C.: Council on Library & Information Resources, 2007): 3, 46–50, http://www.clir.org/pubs/reports/pub140/pub140.pdf (accessed Mar. 24, 2009).

11. Smith-Yoshimura and Cellentani, RLG Programs Descriptive Metadata Practices Survey Results, 24.

12. University of Houston Libraries Institutional Repository Task Force, Institutional Repositories, SPEC Kit 292 (Washington, D.C.: Association of Research Libraries, 2006): 18, 78.

13. Ma, Metadata, 13, 28.

14. Smith-Yoshimura, RLG Programs Descriptive Metadata Practices Survey Results, 9, 11; Smith-Yoshimura and Cellentani, RLG Programs Descriptive Metadata Practices Survey Results, 27–29.

15. For the metrics of job responsibilities used to analyze job descriptions and competencies of cataloging and metadata professionals, see Jung-ran Park, Caimei Lu, and Linda Marion, "Cataloging Professionals in the Digital Environment: A Content Analysis of Job Descriptions," Journal of the American Society for Information Science & Technology 60 (2009): 844–57; Jung-ran Park and Caimei Lu, "Metadata Professionals: Roles and Competencies as Reflected in Job Announcements, 2003–2006," Cataloging & Classification Quarterly 47 (2009): 145–60.

16. Ma, Metadata; Smith-Yoshimura, RLG Programs Descriptive Metadata Practices Survey Results.

17. Jung-ran Park and Eric Childress, "Dublin Core Metadata Semantics: An Analysis of the Perspectives of Information Professionals," Journal of Information Science 35, no. 6 (2009): 727–39.

18. Park, "Semantic Interoperability."

19. Jung-ran Park, "Metadata Quality in Digital Repositories: A Survey of the Current State of the Art," in "Metadata and Open Access Repositories," ed. Michael S. Babinec and Holly Mercer, special issue, Cataloging & Classification Quarterly 47, no. 3/4 (2009): 213–38.

20. Ma, Metadata, 12, 24. The OCLC RLG survey found that about 40 percent of the respondents were able to generate some metadata automatically. See Smith-Yoshimura, RLG Programs Descriptive Metadata Practices Survey Results, 6; Smith-Yoshimura and Cellentani, RLG Programs Descriptive Metadata Practices Survey Results, 35.

21. Jung-ran Park and Caimei Lu, "Application of Semi-Automatic Metadata Generation in Libraries: Types, Tools, and Techniques," Library & Information Science Research 31, no. 4 (2009): 225–31.

22. Park, "Semantic Interoperability"; Sarah L. Shreeves et al., "Is 'Quality' Metadata 'Shareable' Metadata? The Implications of Local Metadata Practices for Federated Collections" (paper presented at the 12th National Conference of the Association of College and Research Libraries, Apr. 7–10, 2005, Minneapolis, Minnesota), https://www.ideals.uiuc.edu/handle/2142/145 (accessed Mar. 24, 2009); Amy S. Jackson et al., "Dublin Core Metadata Harvested through OAI-PMH," Journal of Library Metadata 8, no. 1 (2008): 5–21; Lois Mai Chan and Marcia Lei Zeng, "Metadata Interoperability and Standardization—A Study of Methodology Part I: Achieving Interoperability at the Schema Level," D-Lib Magazine 12, no.
6 (2006), http://www.dlib.org/ dlib/june06/chan/06chan.html (accessed Mar. 24, 2009); Marcia Lei Zeng and Lois Mai Chan, “Metadata Interoperability and Standardization—A Study of Methodology Part II: Achieving Interoperability at the Record and Repository Levels,” D-Lib Magazine 12, no. 6 (2006), http://www.dlib.org/dlib/june06/ zeng/06zeng.html (accessed Mar. 24, 2009). 23. Thomas R. Bruce and Diane I. Hillmann, “The Con- tinuum of Metadata Quality: Defining, Expressing, Exploiting,” in Metadata in Practice, ed. Hillman and Westbrooks, 238–56; Heery, “Metadata Futures”; Park, “Metadata Quality in Digital Repositories.” 24. Jung-ran Park, ed., “Metadata Best Practices: Current Issues and Future Trends,” special issue, Journal of Library Meta- data 9, no. 3/4 (2009). 25. See Park, “Semantic Interoperability”; Park and Childress, “Dublin Core Metadata Semantics.” 26. Park and Childress, “Dublin Core Metadata Semantics”; Park and Lu, “Application of Semi-Automatic Metadata Genera- tion in Libraries.” 3137 ---- Batch Loading coLLections into dspace | WaLsh 117 Maureen P. Walsh Batch Loading Collections into DSpace: Using Perl Scripts for Automation and Quality Control colleagues briefly described batch loading MARC meta- data crosswalked to DSpace Dublin Core (DC) in a poster session.2 Mishra and others developed a Perl script to create the DSpace archive directory for batch import of electronic theses and dissertations (ETDs) extracted with a Java program from an in-house bibliographic database.3 Mundle used Perl scripts to batch process ETDs for import into DSpace with MARC catalog records or Excel spreadsheets as the source metadata.4 Brownlee used Python scripts to batch process comma-separated values (CSV) files exported from Filemaker database software for ingest via the DSpace item importer.5 More in-depth descriptions of batch loading are pro- vided by Thomas; Kim, Dong, and Durden; Proudfoot et al.; Witt and Newton; Drysdale; Ribaric; Floyd; and Averkamp and Lee. However, irrespective of reposi- tory software, each describes a process to populate their repositories dissimilar to the workflows developed for the Knowledge Bank in approach or source data. Thomas describes the Perl scripts used to convert MARC catalog records into DC and to create the archive directory for DSpace batch import.6 Kim, Dong, and Durden used Perl scripts to semiauto- mate the preparation of files for batch loading a University of Texas Harry Ransom Humanities Research Center (HRC) collection into DSpace. The XML source metadata they used was generated by the National Library of New Zealand Metadata Extraction Tool.7 Two subsequent proj- ects for the HRC revisited the workflow described by Kim, Dong, and Durden.8 Proudfoot and her colleagues discuss importing meta- data-only records from departmental RefBase, Thomson Reuters EndNote, and Microsoft Access databases into ePrints. They also describe an experimental Perl script written to scrape lists of publications from personal web- sites to populate ePrints.9 Two additional workflow examples used citation databases as the data source for batch loading into repositories. 
Witt and Newton provide a tutorial on trans- forming EndNote metadata for Digital Commons with XSLT (Extensible Stylesheet Language Transformations).10 Drysdale describes the Perl scripts used to convert Thomson Reuters Reference Manager files into XML for the batch loading of metadata-only records into the University of Glascow’s ePrints repository.11 The Glascow ePrints batch workflow is additionally described by Robertson and Nixon and Greig.12 Several workflows were designed for batch loading ETDs into repositories. Ribaric describes the automatic This paper describes batch loading workflows developed for the Knowledge Bank, The Ohio State University’s institutional repository. In the five years since the incep- tion of the repository approximately 80 percent of the items added to the Knowledge Bank, a DSpace repository, have been batch loaded. Most of the batch loads utilized Perl scripts to automate the process of importing meta- data and content files. Custom Perl scripts were used to migrate data from spreadsheets or comma-separated values files into the DSpace archive directory format, to build collections and tables of contents, and to provide data quality control. Two projects are described to illus- trate the process and workflows. T he mission of the Knowledge Bank, The Ohio State University’s (OSU) institutional repository, is to col- lect, preserve, and distribute the digital intellectual output of OSU’s faculty, staff, and students.1 The staff working with the Knowledge Bank have sought from its inception to be as efficient as possible in adding content to DSpace. Using batch loading workflows to populate the repository has been integral to that efficiency. The first batch load into the Knowledge Bank was August 29, 2005. Over the next four years, 698 collections con- taining 32,188 items were batch loaded, representing 79 percent of the items and 58 percent of the collections in the Knowledge Bank. These batch loaded collections vary from journal issues to photo albums. The items include articles, images, abstracts, and transcripts. The majority of the batch loads, including the first, used custom Perl scripts to migrate data from Microsoft Excel spreadsheets into the DSpace batch import format for descriptive meta- data and content files. Perl scripts have been used for data cleanup and quality control as part of the batch load pro- cess. Perl scripts, in combination with shell scripts, have also been used to build collections and tables of contents in the Knowledge Bank. The workflows using Perl scripts to automate batch import into DSpace have evolved through an iterative process of continual refinement and improvement. Two Knowledge Bank projects are pre- sented as case studies to illustrate a successful approach that may be applicable to other institutional repositories. ■■ Literature Review Batch ingesting is acknowledged in the literature as a means of populating institutional repositories. There are examples of specific batch loading processes mini- mally discussed in the literature. Branschofsky and her Maureen p. Walsh (walsh.260@osu.edu) is Metadata Librarian/ Assistant Professor, The Ohio State University Libraries, Colum- bus, Ohio. 118 inFoRMation technoLogY and LiBRaRies | septeMBeR 2010 relational database PostgreSQL 8.1.11 on the Red Hat Enterprise Linux 5 operating system. The structure of the Knowledge Bank follows the hierarchical arrangement of DSpace. Communities are at the highest level and can be divided into subcommunities. 
Each community or subcommunity contains one or more collections. All items—the basic archival elements in DSpace—are con- tained within collections. Items consist of metadata and bundles of bitstreams (files). DSpace supports two user interfaces: the original interface based on JavaServer Pages (JSPUI) and the newer Manakin (XMLUI) interface based on the Apache Cocoon framework. At this writing, the Knowledge Bank continues to use the JSPUI interface. The default metadata used by DSpace is a Qualified DC schema derived from the DC library application profile.18 The Knowledge Bank uses a locally defined extended version of the default DSpace Qualified DC schema, which includes several additional element quali- fiers. The metadata management for the Knowledge Bank is guided by a Knowledge Bank application profile and a core element set for each collection within the reposi- tory derived from the application profile.19 The metadata librarians at OSUL create the collection core element sets in consultation with the community representatives. The core element sets serve as metadata guidelines for sub- mitting items to the Knowledge Bank regardless of the method of ingest. The primary means of adding items to collections in DSpace, and the two ways used for Knowledge Bank ingest, are (1) direct (or intermediated) author entry via the DSpace Web item submission user inter- face and (2) in batch via the DSpace item importer. Recent enhancements to DSpace, not yet fully explored for use with the Knowledge Bank, include new ingest options using Simple Web-service Offering Repository Deposit (SWORD), Open Archives Initiative Object Reuse and Exchange (OAI-ORE), and DSpace package import- ers such as the Metadata Encoding and Transmission Standard Submission Information Package (METS SIP) preparation of ETDs from the Internet Archive (http:// www.archive.org/) for ingest into DSpace using PHP utilities.13 Floyd describes the processor developed to automate the ingest of ProQuest ETDs via the DSpace item importer.14 Also using ProQuest ETDs as the source data, Averkamp and Lee described using XSLT to transform the ProQuest data to Bepress’ (The Berkeley Electronic Press) schema for batch loading into a Digital Commons repository.15 The Knowledge Bank workflows described in this paper use Perl scripts to generate DC XML and create the archive directory for batch loading metadata records and content files into DSpace using Excel spreadsheets or CSV files as the source metadata. ■■ Background The Knowledge Bank, a joint initiative of the OSU Libraries (OSUL) and the OSU Office of the Chief Information Officer, was first registered in the Registry of Open Access Repositories (ROAR) on September 28, 2004.16 As of December 2009 the repository held 40,686 items in 1,192 collections. The Knowledge Bank uses DSpace, the open-source Java-based repository software jointly developed by the Massachusetts Institute of Technology Libraries and Hewlett-Packard.17 As a DSpace reposi- tory, the Knowledge Bank is organized by communities. The fifty-two communities currently in the Knowledge Bank include administrative units, colleges, departments, journals, library special collections, research centers, symposiums, and undergraduate honors theses. The com- monality of the varied Knowledge Bank communities is their affiliation with OSU and their production of knowl- edge in a digital format that they wish to store, preserve, and distribute. 
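As the literature review above notes, the Knowledge Bank workflows use Perl scripts to generate DC XML and create the archive directory from Excel or CSV source metadata. The sketch below illustrates the general shape of such a conversion; it is a minimal illustration only, not the Knowledge Bank's production code, and the script name, CSV column names (title, creator, date, file), item-directory naming, and Dublin Core element qualifiers are assumptions made for the example.

-- csv2archive.pl (illustrative sketch) --

#!/usr/bin/perl
# Sketch: convert a CSV of item metadata into the DSpace simple archive
# format (one subdirectory per item containing dublin_core.xml, a
# contents file, and the bitstream to be loaded).
use strict;
use warnings;
use Text::CSV;
use File::Copy qw(copy);
use File::Path qw(make_path);

my ($csv_file, $file_dir, $archive_dir) = ("items.csv", "files", "archive");
my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
open my $in, "<:encoding(UTF-8)", $csv_file or die "Cannot open $csv_file: $!";
my $header = $csv->getline($in);          # first row holds the column names
$csv->column_names(@$header);             # e.g., title,creator,date,file

my $n = 0;
while (my $row = $csv->getline_hr($in)) {
    my $item = sprintf("%s/item_%03d", $archive_dir, $n++);
    make_path($item);

    # Escape characters that are illegal in XML element content.
    my %dc = map { $_ => escape($row->{$_}) } qw(title creator date);

    open my $xml, ">:encoding(UTF-8)", "$item/dublin_core.xml" or die $!;
    print $xml qq{<?xml version="1.0" encoding="UTF-8"?>\n<dublin_core>\n};
    print $xml qq{   <dcvalue element="title" qualifier="none">$dc{title}</dcvalue>\n};
    print $xml qq{   <dcvalue element="creator" qualifier="none">$dc{creator}</dcvalue>\n};
    print $xml qq{   <dcvalue element="date" qualifier="issued">$dc{date}</dcvalue>\n};
    print $xml qq{</dublin_core>\n};
    close $xml;

    # Copy the bitstream into the item directory and list it in 'contents'.
    copy("$file_dir/$row->{file}", "$item/$row->{file}") or die "copy failed: $!";
    open my $contents, ">", "$item/contents" or die $!;
    print $contents "$row->{file}\n";
    close $contents;
}
close $in;

sub escape {
    my $s = shift // "";
    $s =~ s/&/&amp;/g;  $s =~ s/</&lt;/g;  $s =~ s/>/&gt;/g;
    return $s;
}

Run against a small CSV, a sketch of this kind produces one item directory per row, ready to be validated with the item importer's test mode described below.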
The staff working with the Knowledge Bank includes a team of people from three OSUL areas—Technical Services, Information Technology, and Preservation—and the contracted hours of one systems developer from the OSU Office of Information Technology (OIT). The OSUL team members are not individually assigned full-time to the repository. The current OSUL team includes a librarian reposi- tory manager, two metadata librarians, one systems librarian, one systems developer, two technical services staff members, one preservation staff mem- ber, and one graduate assistant. The Knowledge Bank is cur- rently running DSpace 1.5.2 and the Figure 1. DSpace simple archive format archive_directory/ item_000/ dublin_core.xml--qualified Dublin Core metadata contents --text file containing one line per filename file_l.pdf --files to be added as bitstreams to the item file_2.pdf item_001/ dublin_core.xml file_1.pdf ... Batch Loading coLLections into dspace | WaLsh 119 ■■ Case Studies the issues of the Ohio Journal of Science OJS was jointly published by OSU and the Ohio Academy of Science (OAS) until 1974, when OAS took over sole control of the journal. The issues of OJS are archived in the Knowledge Bank with a two year rolling wall embargo. The issues for 1900 through 2003, a total of 639 issues containing 6,429 articles, were batch loaded into the Knowledge Bank. Due to rights issues, the retrospec- tive batch loading project had two phases. The project to digitize OJS began with the 1900–1972 issues that OSU had the rights to digitize and make publicly available. OSU later acquired the rights for 1973–present, and (accounting for the embargo period) 1973–2003 became phase 2 of the project. The two phases of batch loads were the most complicated automated batch loading processes developed to date for the Knowledge Bank. To batch load phase 1 in 2005 and phase 2 in 2006, the systems devel- opers working with the Knowledge Bank wrote scripts to build collections, generate DC XML from the source metadata, create the archive directory, load the metadata and content files, create tables of contents, and load the tables of contents into DSpace. The OJS community in the Knowledge Bank is orga- nized by collections representing each issue of the journal. The systems developers used scripts to automate the building of the collections in DSpace because of the number needed as part of the retrospective project. The individual articles within the issues are items within the collections. There is a table of contents for the articles in each issue as part of the collection homepages.21 Again, due to the number required for the retrospective project, the systems developers used scripts to automate the cre- ation and loading of the tables of contents. The tables of contents are contained in the HTML introductory text sec- tion of the collection pages. The tables of contents list title, authors, and pages. They also include a link to the item record and a direct link to the article PDF that includes the file size. For each phase of the OJS project, a vendor con- tracted by OSUL supplied the article PDFs and an Excel spreadsheet with the article-level metadata. The metadata format. This paper describes ingest via the DSpace batch item importer. The DSpace item importer is a command-line tool for batch ingesting items. The importer uses a simple archive format diagramed in figure 1. 
The archive is a directory of items that contain a subdirectory of item metadata, item files, and a contents file listing the bitstream file names. Each item's descriptive metadata is contained in a DC XML file. The format used by DSpace for the DC XML files is illustrated in figure 2. Automating the process of creating the Unix archive directory has been the main function of the Perl scripts written for the Knowledge Bank batch loading workflows. A systems developer uses the test mode of the DSpace item importer tool to validate the item directories before doing a batch load. Any significant errors are corrected and the process is repeated. After a successful test, the batch is loaded into the staging instance of the Knowledge Bank and quality checked by a metadata librarian to identify any unexpected results and script or data problems that need to be corrected. After a successful load into the staging instance the batch is loaded into the production instance of the Knowledge Bank.

Most of the Knowledge Bank batch loading workflows use Excel spreadsheets or CSV files as the source for the descriptive item metadata. The creation of the metadata contained in the spreadsheets or files has varied by project. In some cases the metadata is created by OSUL staff. In other cases the metadata is supplied by Knowledge Bank communities in consultation with a metadata librarian or by a vendor contracted by OSUL. Whether the source metadata is created in-house or externally supplied, OSUL staff are involved in the quality control of the metadata.

Several of the first communities to join the Knowledge Bank had very large retrospective collection sets to archive. The collection sets of two of those early adopters, the journal issues of the Ohio Journal of Science (OJS) and the abstracts of the OSU International Symposium on Molecular Spectroscopy, currently account for 59 percent of the items in the Knowledge Bank.20 The successful batch loading workflows developed for these two communities—which continue to be active content suppliers to the repository—are presented as case studies.

Figure 2. DSpace Qualified Dublin Core XML
<?xml version="1.0" encoding="UTF-8"?>
<dublin_core>
   <dcvalue element="title" qualifier="none">Notes on the Bird Life of Cedar Point</dcvalue>
   <dcvalue element="date" qualifier="issued">1901-04</dcvalue>
   <dcvalue element="creator" qualifier="none">Griggs, Robert F.</dcvalue>
</dublin_core>

article-level metadata to Knowledge Bank DC, as illustrated in table 1. The systems developers used the mapping as a guide to write Perl scripts to transform the vendor metadata into the DSpace schema of DC. The workflow for the two phases was nearly identical, except each phase had its own batch loading scripts. Due to a staff change between the two phases of the project, a former OSUL systems developer was responsible for batch loading phase 1 and the OIT systems developer was responsible for phase 2. The phase 1 scripts were all written in Perl. The four scripts written for phase 1 created the archive directory, performed database operations to build the collections, generated the HTML introduction table of contents for each collection, and loaded the tables of contents into DSpace via the database. For phase 2, the OIT systems developer modified and added to the phase 1 batch processing scripts. This case study focuses on phase 2 of the project.

Batch Processing for Phase 2 of OJS

The annotated scripts the OIT systems developer used for phase 2 of the OJS project are included in appendix A, available on the ITALica weblog (http://ital-ica.blogspot.com/). A shell script (mkcol.sh) added collections based on a listing of the journal issues.
The script performed a login as a selected user ID to the DSpace Web interface using the Web access tool Curl. A subsequent simple looping Perl script (mkallcol.pl) used the stored credentials to submit data via this channel to build the collections in the Knowledge Bank. The metadata.pl script created the archive directory for each collection. The OIT systems developer added the PDF file for each item to Unix. The vendor-supplied meta- data was saved as Unicode text format and transferred to Unix for further processing. The developer used vi com- mands to manually modify metadata for characters illegal in XML (e.g., “<” and “&”). (Although manual steps were used for this project, the OIT systems developer improved the Perl scripts for subsequent projects by add- ing code for automated transformation of the input data to help ensure XML validity.) The metadata.pl script then processed each line of the metadata along with the cor- responding data file. For each item, the script created the DC XML file and the contents file and moved them and the PDF file to the proper directory. Load sets for each col- lection (issue) were placed in their own subdirectory, and a load was done for each subdirectory. The items for each collection were loaded by a small Perl script (loaditems. pl) that used the list of issues and their collection IDs and called a shell script (import.sh) for the actual load. The tables of contents for the issues were added to the Knowledge Bank after the items were loaded. A Perl script (intro.pl) created the tables of contents using the meta- data and the DSpace map file, a stored mapping of item received from the vendor had not been customized for the Knowledge Bank. The OJS issues were sent to a vendor for digitization and metadata creation before the Knowledge Bank was chosen as the hosting site of the digitized jour- nal. The OSU Digital Initiatives Steering Committee 2002 proposal for the OJS digitization project had predated the Knowledge Bank DSpace instance. OSUL staff performed quality-control checks of the vendor-supplied metadata and standardized the author names. The vendor supplied the author names as they appeared in the articles—in direct order, comma separated, and including any “and” that appeared. In addition to other quality checks per- formed, OSUL staff edited the author names in the spreadsheet to conform to DSpace author-entry conven- tion (surname first). Semicolons were added to separate author names, and the extraneous ands were removed. A former metadata librarian mapped the vendor-supplied Table 1. Mapping of vendor metadata to Qualified Dublin Core Vendor-Supplied Metadata Knowledge Bank Dublin Core File [n/a: PDF file name] Cover Title dc.identifier.citation* ISSN dc.identifier.issn Vol. dc.identifier.citation* Iss. dc.identifier.citation* Cover Date dc.identifier.citation* Year dc.date.issued Month dc.date.issued Fpage dc.identifier.citation* Lpage dc.identifier.citation* Article Title dc.title Author Names dc.creator Institution dc.description Abstract dc.description.abstract n/a dc.language.iso n/a dc.rights n/a dc.type *format: [Cover Title]. v[Vol.], n[Iss.] ([Cover Date]), [Fpage]-[Lpage] Batch Loading coLLections into dspace | WaLsh 121 directories to item handles created during the load. The tables of contents were added to the Knowledge Bank using a shell script (installintro.sh) similar to what was used to create the collections. 
Installintro.sh used Curl to simulate a user adding the data to DSpace by performing a login as a selected user ID to the DSpace Web interface. A simple looping Perl script (ldallintro.pl) called installintro.sh and used the stored credentials to submit the data for the tables of contents. the abstracts of the osU international symposium on Molecular spectroscopy The Knowledge Bank contains the abstracts of the papers presented at the OSU International Symposium on Molecular Spectroscopy (MSS), which has met annually since 1946. Beginning with the 2005 Symposium, the complete presentations from authors who have autho- rized their inclusion are archived along with the abstracts. The MSS community in the Knowledge Bank currently contains 17,714 items grouped by decade into six col- lections. The six collections were created “manually” via the DSpace Web interface prior to the batch loading of the items. The retrospective years of the Symposium (1946–2004) were batch loaded in three phases in 2006. Each Symposium year following the retrospective loads was batch loaded individually. Retrospective Mss Batch Loads The majority of the abstracts for the retrospective loads were digitized by OSUL. A vendor was contracted by OSUL to digitize the remainder and to supply the meta- data for the retrospective batch loads. The files digitized by OSUL were sent to the vendor for metadata capture. OSUL provided the vendor a metadata template derived from the MSS core element set. The metadata taken from the abstracts comprised author, affiliation, title, year, session number, sponsorship (if applicable), and a full transcription of the abstract. To facilitate searching, the formulas and special characters appearing in the titles and abstracts were encoded using LaTeX, a document prepara- tion system used for scientific data. The vendor delivered the metadata in Excel spreadsheets as per the spreadsheet template provided by OSUL. Quality-checking the meta- data was an essential step in the workflow for OSUL. The metadata received for the project required revisions and data cleanup. The vendor originally supplied incomplete files and spreadsheets that contained data errors, includ- ing incorrect numbering, data in the wrong fields, and inconsistency with the LaTeX encoding. The three Knowledge Bank batch load phases for the retrospective MSS project corresponded to the staged receipt of metadata and digitized files from the vendor. The annotated scripts used for phase 2 of the project, which included twenty years of the OSU International Symposium between 1951 and 1999, are included in appendix B, available on the ITALica weblog. The OIT systems developer saved the metadata as a tab-separated file and added it to Unix along with the abstract files. A Perl script (mkxml2.pl) transformed the metadata into DC XML and created the archive directories for load- ing the metadata and abstract files into the Knowledge Bank. The script divided the directories into separate load sets for each of the six collections and accounted for the inconsistent naming of the abstract files. The script added the constant data for type and language that was not included in the vendor-supplied metadata. Unlike the OJS project, where multiple authors were on the same line of the metadata file, the MSS phase 2 script had to code for authors and their affiliations on separate lines. Once the load sets were made, the OIT systems devel- oper ran a shell script to load them. 
The script (import_ collections.sh) was used to run the load for each set so that the DSpace item import command did not need to be constructed each time. annual Mss Batch Loads A new workflow was developed for batch loading the annual MSS collection additions. The metadata and item files for the annual collection additions are supplied by the MSS community. The community provides the Symposium metadata in a CSV file and the item files in a Tar archive file. The Symposium uses a Web form for LaTeX–formatted abstract submissions. The community processes the electronic Symposium submissions with a Perl script to create the CSV file. The metadata delivered in the CSV file is based on the template created by the author, which details the metadata requirements for the project. The OIT systems developer borrowed from and modi- fied earlier Perl scripts to create a new script for batch processing the metadata and files for the annual Symposium collection additions. To assist with the development of the new script, I provided the developer a mapping of the community CSV headings to the Knowledge Bank DC fields. I also provided a sample DC XML file to illustrate the desired result of the Perl transformation of the com- munity metadata into DC XML. For each new year of the Symposium, I create a sample DC XML result for an item to check the accuracy of the script. A DC XML example from a 2009 MSS item is included in appendix C, available on the ITALica weblog. Unlike the previous retrospective MSS loads in which the script processed multiple years of the Symposium, the new script processes one year at a time. The annual Symposiums are batch loaded indi- vidually into one existing MSS decade collection. The new script for the annual loads was tested and refined by load- ing the 2005 Symposium into the staging instance of the 122 inFoRMation technoLogY and LiBRaRies | septeMBeR 2010 ■■ Summary and Conclusion Each of the batch loads that used Perl scripts had its own unique features. The format of content and associ- ated metadata varied considerably, and custom scripts to convert the content and metadata into the DSpace import format were created on a case-by-case basis. The differ- ences between batch loads included the delivery format of the metadata, the fields of metadata supplied, how metadata values were delimited, the character set used for the metadata, the data used to uniquely identify the files to be loaded, and how repeating metadata fields were identi- fied. Because of the differences in supplied metadata, a separate Perl script for generating the DC XML and archive directory for batch loading was written for each project. Each new Perl script borrowed from and modified earlier scripts. Many of the early batch loads were firsts for the Knowledge Bank and the staff working with the reposi- tory, both in terms of content and in terms of metadata. Dealing with community- and vendor-supplied metadata and various encodings (including LaTeX), each of the early loads encountered different data obstacles, and in each case solutions were written in Perl. The batch loading code has matured over time, and the progression of improvements is evident in the example scripts included in the appendixes. Batch loading can greatly reduce the time it takes to add content and metadata to a repository, but successful Knowledge Bank. Problems encountered with character encoding and file types were resolved by modifying the script. 
The metadata and files for the Symposium years 2005, 2006, and 2007 were made available to OSUL in 2007, and each year was individually loaded into the existing Knowledge Bank col- lection for that decade. These first three years of community-supplied CSV files contained author metadata inconsistent with Knowledge Bank author entries. The names were in direct order, upper- case, split by either a semicolon or “and,” and included extraneous data, such as an address. The OIT systems developer wrote a Perl script to correct the author metadata as part of the batch loading workflow. An annotated section of that script illustrating the author modifica- tions is included in appendix D, available on the ITALica weblog. The MSS com- munity revised the Perl script they used to generate the CSV files by including an edited version of this author entry cor- rection script and were able to provide the expected author data for 2008 and 2009. The author entries received for these years were in inverted order (surname first) and mixed case, were semicolon separated, and included no extraneous data. The receipt of consistent data from the community for the last two years has facilitated the stan- dardized workflow for the annual MSS loads. The scripts used to batch load the 2009 Symposium year are included in appendix E, which appears at the end of this text. The OIT systems developer unpacked the Tar file of abstracts and presentations into a directory named for the year of the Symposium on Unix. The Perl script written for the annual MSS loads (mkxml. pl) was saved on Unix and renamed mkxml2009.pl. The script was edited for 2009 (including the name of the CSV file and the location of the directories for the unpacked files and generated XML). The CSV headings used by the community in the new file were checked and verified against the extract list in the script. Once the Perl script was up-to-date and the base directory was created, the OIT systems developer ran the Perl script to gener- ate the archive directory set for import. The import.sh script was then edited for 2009 and run to import the new Symposium year into the staging instance of the Knowledge Bank as a quality check prior to loading into the live repository. The brief item view of an example MSS 2009 item archived in the Knowledge Bank is shown in figure 3. Figure 3. MSS 2009 archived item example Batch Loading coLLections into dspace | WaLsh 123 Proceedings of the 2003 International Conference on Dublin Core and Metadata Applications: Supporting Com- munities of Discourse and Practice—Metadata Research & Applications, Seattle, Washington, 2003, http://dcpapers .dublincore.org/ojs/pubs/article/view/753/749 (accessed Dec. 21, 2009). 3. R. Mishra et al., “Development of ETD Repository at IITK Library using DSpace,” in International Conference on Semantic Web and Digital Libraries (ICSD-2007), ed. A. R. D. Prasad and Devika P. Madalli (2007), 249–59. http://hdl.handle .net/1849/321 (accessed Dec. 21, 2009). 4. Todd M. Mundle, “Digital Retrospective Conversion of Theses and Dissertations: An In House Project” (paper presented to the 8th International Symposium on Electronic Theses & Dis- sertations, Sydney, Australia, Sept. 28–30, 2005), http://adt.caul .edu.au/etd2005/papers/080Mundle.pdf (accessed Dec. 21, 2009). 5. Rowan Brownlee, “Research Data and Repository Meta- data: Policy and Technical Issues at the University of Sydney Library,” Cataloging & Classification Quarterly 47, no. 3/4 (2009): 370–79. 6. 
Steve Thomas, “Importing MARC Data into DSpace,” 2006, http://hdl.handle.net/2440/14784 (accessed Dec. 21, 2009). 7. Sarah Kim, Lorraine A. Dong, and Megan Durden, “Auto- mated Batch Archival Processing: Preserving Arnold Wesker’s Digital Manuscripts,” Archival Issues 30, no. 2 (2006): 91–106. 8. Elspeth Healey, Samantha Mueller, and Sarah Ticer, “The Paul N. Banks Papers: Archiving the Electronic Records of a Digitally-Adventurous Conservator,” 2009, https://pacer .ischool.utexas.edu/bitstream/2081/20150/1/Paul_Banks_ Final_Report.pdf (accessed Dec. 21, 2009); Lisa Schmidt, “Pres- ervation of a Born Digital Literary Genre: Archiving Legacy Macintosh Hypertext Files in DSpace,” 2007, https://pacer .ischool.utexas.edu/bitstream/2081/9007/1/MJ%20WBO%20 Capstone%20Report.pdf (accessed Dec. 21, 2009). 9. Rachel E. Proudfoot et al., “JISC Final Report: IncReASe (Increasing Repository Content through Automation and Ser- vices),” 2009, http://eprints.whiterose.ac.uk/9160/ (accessed Dec. 21, 2009). 10. Michael Witt and Mark P. Newton, “Preparing Batch Deposits for Digital Commons Repositories,” 2008, http://docs .lib.purdue.edu/lib_research/96/ (accessed Dec. 21, 2009). 11. Lesley Drysdale, “Importing Records from Reference Man- ager into GNU EPrints,” 2004, http://hdl.handle.net/1905/175 (accessed Dec. 21, 2009). 12. R. John Robertson, “Evaluation of Metadata Workflows for the Glasgow ePrints and DSpace Services,” 2006, http://hdl .handle.net/1905/615 (accessed Dec. 21, 2009); William J. Nixon and Morag Greig, “Populating the Glasgow ePrints Service: A Mediated Model and Workflow,” 2005, http://hdl.handle .net/1905/387 (accessed Dec. 21, 2009). 13. Tim Ribaric, “Automatic Preparation of ETD Material from the Internet Archive for the DSpace Repository Platform,” Code4Lib Journal no. 8 (Nov. 23, 2009), http://journal.code4lib.org/ articles/2152 (accessed Dec. 21, 2009). 14. Randall Floyd, “Automated Electronic Thesis and Disser- tations Ingest,” (Mar. 30, 2009), http://wiki.dlib.indiana.edu/ confluence/x/01Y (accessed Dec. 21, 2009). 15. Shawn Averkamp and Joanna Lee, “Repurposing Pro- batch loading workflows are dependent upon the quality of data and metadata loaded. Along with testing scripts and checking imported metadata by first batch loading to a development or staging environment, quality control of the supplied metadata is an integral step. The flexibility of Perl allowed testing and revising to accommodate prob- lems encountered with how the metadata was supplied for the heterogeneous collections batch loaded into the Knowledge Bank. However, toward the goal of standard- izing batch loading workflows, the staff working with the Knowledge Bank iteratively refined not only the scripts but also the metadata requirements for each project and how those were communicated to the data suppliers with mappings, explicit metadata examples, and sample desired results. The efficiency of batch loading workflows is greatly enhanced by consistent data and basic stan- dards for how metadata is supplied. Batch loading is not only an extremely efficient means of populating an institutional repository, it is also a value- added service that can increase buy-in from the wider campus community. It is hoped that by openly sharing examples of our batch loading scripts we are contributing to the development of an open library of code that can be borrowed and adapted by the library community toward future institutional repository success stories. 
■■ Acknowledgments I would like to thank Conrad Gratz, of OSU OIT, and Andrew Wang, formerly of OSUL. Gratz wrote the shell scripts and the majority of the Perl scripts used for auto- mating the Knowledge Bank item import process and ran the corresponding batch loads. The early Perl scripts used for batch loading into the Knowledge Bank, including the first phase of OJS and MSS, were written by Wang. Parts of those early Perl scripts written by Wang were borrowed for subsequent scripts written by Gratz. Gratz provided the annotated scripts appearing in the appendixes and consulted with the author regarding the description of the scripts. I would also like to thank Amanda J. Wilson, a for- mer metadata librarian for OSUL, who was instrumental to the success of many of the batch loading workflows created for the Knowledge Bank. References and Notes 1. The Ohio State University Knowledge Bank, “Institu- tional Repository Policies,” 2007, http://library.osu.edu/sites/ kbinfo/policies.html (accessed Dec. 21, 2009). The Knowledge Bank homepage can be found at https://kb.osu.edu/dspace/ (accessed Dec. 21, 2009). 2. Margret Branschofsky et al., “Evolving Meta- data Needs for an Institutional Repository: MIT’s DSpace,” 124 inFoRMation technoLogY and LiBRaRies | septeMBeR 2010 Appendix E. MSS 2009 Batch Loading Scripts -- mkxml2009.pl -- #!/usr/bin/perl use Encode; # Routines for UTF encoding use Text::xSV; # Routines to process CSV files. use File::Basename; # Open and read the comma separated metadata file. my $csv = new Text::xSV; #$csv->set_sep(' '); # Use for tab separated files. $csv->open_file("MSS2009.csv"); $csv->read_header(); # Process the CSV column headers. # Constants for file and directory names. $basedir = "/common/batch/input/mss/"; $indir = "$basedir/2009"; $xmldir= "./2009xml"; $imagesubdir= "processed_images”; $filename = "dublin_core.xml"; # Process each line of metadata, one line per item. $linenum = 1; while ($csv->get_row()) { # This divides the item's metadata into fields, each in its own variable. my ( $identifier, $title, $creators, $description_abstract, $issuedate, $description, $description2, Appendixes A–D available at http://ital-ica.blogspot.com/ Quest Metadata for Batch Ingesting ETDs into an Institutional Repository,” Code4Lib Journal no. 7 (June 26, 2009), http://journal .code4lib.org/articles/1647 (accessed Dec. 21, 2009). 16. Tim Brody, Registry of Open Access Repositories (ROAR), http://roar.eprints.org/ (accessed Dec. 21, 2009). 17. DuraSpace, DSpace, http://www.dspace.org/ (accessed Dec. 21, 2009). 18. Dublin Core Metadata Initiative Libraries Working Group, “DC-Library Application Profile (DC-Lib),” http://dublincore .org/documents/2004/09/10/library-application-profile/ (accessed Dec. 21, 2009). 19. The Ohio State University Knowledge Bank Policy Com- mittee, “OSU Knowledge Bank Metadata Application Profile,” http://library.osu.edu/sites/techservices/KBAppProfile.php (accessed Dec. 21, 2009). 20. Ohio Journal of Science (Ohio Academy of Sci- ence), Knowledge Bank community, http://hdl.handle .net/1811/686 (accessed Dec. 21, 2009); OSU International Sym- posium on Molecular Spectroscopy, Knowledge Bank commu- nity, http://hdl.handle.net/1811/5850 (accessed Dec. 21, 2009). 21. Ohio Journal of Science (Ohio Academy of Science), Ohio Journal of Science: Volume 74, Issue 3 (May, 1974), Knowledge Bank collection, http://hdl.handle.net/1811/22017 (accessed Dec. 21, 2009). 
    $abstract,
    $gif,
    $ppt,
) = $csv->extract(
    "Talk_id",
    "Title",
    "Creators",
    "Abstract",
    "IssueDate",
    "Description",
    "AuthorInstitution",
    "Image_file_name",
    "Talk_gifs_file",
    "Talk_ppt_file"
);

$creatorxml = "";
# Multiple creators are separated by ';' in the metadata.
if (length($creators) > 0) {
    # Create XML for each creator.
    @creatorlist = split(/;/,$creators);
    foreach $creator (@creatorlist) {
        if (length($creator) > 0) {
            $creatorxml .= '<dcvalue element="creator" qualifier="none">'
                .$creator.'</dcvalue>'."\n    ";
        }
    }
} # Done processing creators for this item.

# Create the XML string for the Abstract.
$abstractxml = "";
if (length($description_abstract) > 0) {
    # Convert special metadata characters for use in xml/html.
    $description_abstract =~ s/\&/&amp;/g;
    $description_abstract =~ s/\>/&gt;/g;
    $description_abstract =~ s/\</&lt;/g;
    $abstractxml = '<dcvalue element="description" qualifier="abstract">'
        .$description_abstract.'</dcvalue>';
}

# Create the XML string for the Description.
$descriptionxml = "";
if (length($description) > 0) {
    # Convert special metadata characters for use in xml/html.
    $description =~ s/\&/&amp;/g;
    $description =~ s/\>/&gt;/g;
    $description =~ s/\</&lt;/g;
    $descriptionxml = '<dcvalue element="description" qualifier="none">'
        .$description.'</dcvalue>';
}

# Create the XML string for the Author Institution.
$description2xml = "";
if (length($description2) > 0) {
    # Convert special metadata characters for use in xml/html.
    $description2 =~ s/\&/&amp;/g;
    $description2 =~ s/\>/&gt;/g;
    $description2 =~ s/\</&lt;/g;
    $description2xml = '<dcvalue element="description" qualifier="none">'
        .'Author Institution: '.$description2.'</dcvalue>';
}

# Convert special characters in title.
$title =~ s/\&/&amp;/g;
$title =~ s/\>/&gt;/g;
$title =~ s/\</&lt;/g;

# Create the item subdirectory and write its dublin_core.xml file.
$subdir = sprintf("item_%03d", $linenum);
mkdir("$basedir/$subdir");
open($fh, ">:encoding(UTF-8)", "$basedir/$subdir/$filename");
print $fh <<"XML";
<?xml version="1.0" encoding="UTF-8"?>
<dublin_core>
    <dcvalue element="identifier" qualifier="none">$identifier</dcvalue>
    <dcvalue element="title" qualifier="none">$title</dcvalue>
    <dcvalue element="date" qualifier="issued">$issuedate</dcvalue>
    $abstractxml
    $descriptionxml
    $description2xml
    <dcvalue element="type" qualifier="none">Article</dcvalue>
    <dcvalue element="language" qualifier="iso">en</dcvalue>
    $creatorxml
</dublin_core>
XML
close($fh);

# Create contents file and move files to the load set.
# Copy item files into the load set.
if (defined($abstract) && length($abstract) > 0) {
    system "cp $indir/$abstract $basedir/$subdir";
}
$sourcedir = substr($abstract, 0, 5);
if (defined($ppt) && length($ppt) > 0 ) {
    system "cp $indir/$sourcedir/$sourcedir/*.* $basedir/$subdir/";
}
if (defined($gif) && length($gif) > 0 ) {
    system "cp $indir/$sourcedir/$imagesubdir/*.* $basedir/$subdir/";
}

# Make the 'contents' file and fill it with the file names.
system "touch $basedir/$subdir/contents";
if (defined($gif) && length($gif) > 0
    && -d "$indir/$sourcedir/$imagesubdir" ) {
    # Sort items in reverse order so they show up right in DSpace.
    # This is a hack that depends on how the DB returns items
    # in unsorted (physical) order. There are better ways to do this.
    system "cd $indir/$sourcedir/$imagesubdir/;"
        . " ls *[0-9][0-9].* | sort -r >> $basedir/$subdir/contents";
    system "cd $indir/$sourcedir/$imagesubdir/;"
        . " ls *[a-zA-Z][0-9].* | sort -r >> $basedir/$subdir/contents";
}
if (defined($ppt) && length($ppt) > 0
    && -d "$indir/$sourcedir/$sourcedir" ) {
    system "cd $indir/$sourcedir/$sourcedir/;"
        . " ls *.* >> $basedir/$subdir/contents";
}

# Put the Abstract in last, so it displays first.
system "cd $basedir/$subdir; basename $abstract >>"
    . " $basedir/$subdir/contents";

$linenum++;
} # Done processing an item.
-------------------------------------------------------------------------------------------------- -- import.sh –- #!/bin/sh # # Import a collection from files generated on dspace # COLLECTION_ID=1811/6635 EPERSON=[name removed]@osu.edu SOURCE_DIR=./2009xml BASE_ID=`basename $COLLECTION_ID` MAPFILE=./map-dspace03-mss2009.$BASE_ID /dspace/bin/dsrun org.dspace.app.itemimport.ItemImport --add --eperson=$EPERSON --collection=$COLLECTION_ID --source=$SOURCE_DIR --mapfile=$MAPFILE Appendix E. MSS 2009 Batch Loading Scripts (cont.) 3139 ---- tHe Next GeNerAtioN liBrArY cAtAloG | YANG AND HoFMANN 141 Sharon Q. Yang and Melissa A. Hofmann The Next Generation Library Catalog: A Comparative Study of the OPACs of Koha, Evergreen, and Voyager Open source has been the center of attention in the library world for the past several years. Koha and Evergreen are the two major open-source integrated library sys- tems (ILSs), and they continue to grow in maturity and popularity. The question remains as to how much we have achieved in open-source development toward the next-generation catalog compared to commercial systems. Little has been written in the library literature to answer this question. This paper intends to answer this question by comparing the next-generation features of the OPACs of two open-source ILSs (Koha and Evergreen) and one proprietary ILS (Voyager’s WebVoyage). M uch discussion has occurred lately on the next- generation library catalog, sometimes referred to as the Library 2.0 catalog or “the third generation catalog.”1 Different and even conflicting expectations exist as to what the next-generation library catalog comprises: In two sentences, this catalog is not really a catalog at all but more like a tool designed to make it easier for students to learn, teachers to instruct, and scholars to do research. It provides its intended audience with a more effective means for finding and using data and information.2 Such expectations, despite their vagueness, eventually took concrete form in 2007.3 Among the most prominent features of the next-generation catalog are a simple keyword search box, enhanced browsing possibilities, spelling corrections, relevance ranking, faceted naviga- tion, federated search, user contribution, and enriched content, just to mention a few. Over the past three years, libraries, vendors, and open-source communities have intensified their efforts to develop OPACs with advanced features. The next-generation catalog is becoming the cur- rent catalog. The library community welcomes open-source integrated library systems (ILSs) with open arms, as evi- denced by the increasing number of libraries and library consortia that have adopted or are considering open- source options, such as Koha, Evergreen, and the Open Library Environment Project (OLE Project). Librarians see a golden opportunity to add features to a system that will take years for a proprietary vendor to develop. Open-source OPACs, especially that of Koha, seem to be more innovative than their long-established propri- etary counterparts, as our investigation shows in this paper. Threatened by this phenomenon, ILS vendors have rushed to improve their OPACs, modeling them after the next-generation catalog. For example, Ex Libris pushed out its new OPAC, WebVoyage 7.0, in August of 2008 to give its OPAC a modern touch. One interesting question remains. In a competition for a modernized OPAC, which OPAC is closest to our visions for the next-generation library catalog: open- source or proprietary? 
The comparative study described in this article was conducted in the hope of yielding some information on this topic. For libraries facing options between open-source and proprietary systems, "a thorough process of evaluating an integrated library system (ILS) today would not be complete without also weighing the open source ILS products against their proprietary counterparts."3

Sharon Q. Yang (yangs@rider.edu) is Systems Librarian and Melissa A. Hofmann (mhofmann@rider.edu) is Bibliographic Control Librarian, Rider University.

■■ Scope and Purpose of the Study

The purpose of the study is to determine which OPAC of the three ILSs—Koha, Evergreen, or WebVoyage—offers more in terms of services and is more comparable to the next-generation library catalog. The three systems include two open-source ILSs and one proprietary ILS. Koha and Evergreen were chosen because they are the two most popular and fully developed open-source ILSs in North America. At the time of the study, Koha had 936 implementations worldwide; Evergreen had 543 library users.4 We chose WebVoyage for comparison because it is the OPAC of the Voyager ILS by Ex Libris, the biggest ILS vendor in terms of personnel and marketplace.5 It also is one of the more popular ILSs in North America, with a customer base of 1,424 libraries, most of which are academic.6 As the sample only includes three ILSs, the study is very limited in scope, and the findings cannot be extrapolated to all open-source and proprietary catalogs. But, hopefully, readers will gain some insight into how much progress libraries, vendors, and open-source communities have achieved toward the next-generation catalog.

■■ Literature Review

A review of the library literature found two relevant studies on the comparison of OPACs in recent years. The first study was conducted by two librarians in Slovenia investigating how much progress libraries had made toward the next-generation catalog.7 Six online catalogs were examined and evaluated, including WorldCat, the Slovene union catalog COBISS, and those of four public libraries in the United States. The study also compared services provided by the library catalogs in the sample with those offered by Amazon. The comparison took place primarily in six areas: search, presentation of results, enriched content, user participation, personalization, and Web 2.0 technologies applied in OPACs. The authors gave a detailed description of the research results supplemented by tables and snapshots of the catalogs in comparison. The findings indicated that "the progress of library catalogues has really been substantial in the last few years." Specifically, the library catalogues have made "the best progress on the content field and the least in user participation and personalization." When compared to services offered by Amazon, the authors concluded that "none of the six chosen catalogues offers the complete package of examined options that Amazon does."8 In other words, library catalogs in the sample still lacked features compared to Amazon.

The other comparative study was conducted by Linda Riewe, a library school student, in fulfillment of her master's degree from San José State University. The research described in her thesis is a questionnaire survey targeted at 361 libraries that compares open-source (specifically, Koha and Evergreen) and proprietary ILSs in North America.
More than twenty proprietary systems were covered, including Horizon, Voyager, Millennium, Polaris, Innopac, and Unicorn.9 Only a small part of her study was related to OPACs. It involved three questions about OPACs and asked librarians to evaluate the ease of use of their ILS OPAC’s search engines, their OPAC search engine’s completeness of features, and their per- ception of how easy it is for patrons to make self-service requests online for renewals and holds. A scale of 1 to 5 was used (1 = least satisfied; 5= very satisfied) regarding the three aspects of OPACs. The mean and medium satis- faction ratings for open-source OPACs were higher than those of proprietary ones. Koha’s OPAC was ranked 4.3, 3.9, and 3.9, respectively in mean, the highest on the scale in all three categories, while the proprietary OPACs were ranked 3.9, 3.6, and 3.6.10 Evergreen fell in the middle, still ahead of proprietary OPACs. The findings reinforced the perception that open-source catalogs, especially Koha, offer more advanced features than proprietary ones. As Riewe’s study focused more on the cost and user satisfac- tion with ILSs, it yielded limited information about the connected OPACs. No comparative research has measured the progress of open-source versus proprietary catalogs toward the next-generation library catalog. Therefore the comparison described in this paper is the first of its kind. As only Koha, Everygreen, and Voyager’s OPACs are examined in this paper, the results cannot be extrapolated. Studies on a larger scale are needed to shed light on the progress librarians have made toward the next-generation catalog. ■■ Method The first step of the study was identifing and defin- ing of a set of measurements by which to compare the three OPACs. A review of library literature on the next-generation library catalog revealed different and somewhat conflicting points of views as to what the next- generation catalog should be. As Marshall Breeding put it, “There isn’t one single answer. We will see a number of approaches, each attacking the problem somewhat dif- ferently.”11 This study decided to use the most commonly held visions, which are summarized well by Breeding and by Morgan’s LITA executive summary.12 The ten parameters identified and used in the comparison were taken primarily from Breeding’s introduction to the July/ August 2007 issue of Library Technology Reports, “Next- Generation Library Catalogs.”13 The ten features reflect some librarians’ visions for a modern catalog. They serve as additions to, rather than replacements of, the feature sets commonly found in legacy catalogs. The following are the definitions of each measurement: ■■ A single point of entry to all library information: “Information” refers to all library resources. The next-generation catalog contains not only biblio- graphical information about printed books, video tapes, and journal titles but also leads to the full text of all electronic databases, digital archives, and any other library resources. It is a federated search engine for one-stop searching. It not only allows for one search leading to a federation of results, it also links to full-text electronic books and journal articles and directs users to printed materials. ■■ State-of-the-art Web interface: Library catalogs should be “intuitive interfaces” and “visually appealing sites” that compare well with other Internet search engines.14 A library’s OPAC can be intimidating and complex. 
To attract users, the next-generation catalog looks and feels similar to Google, Amazon, and other popular websites. This criterion is highly subjective, however, because some users may find Google and Amazon anything but intuitive or appealing. The underlying assumption is that some Internet search engines are popular, and a library catalog should be similar to be popular themselves. ■■ Enriched content: Breeding writes, “Legacy catalogs tend to offer text-only displays, drawing only on the MARC record. A next-generation catalog might bring in content from different sources to strengthen the visual appeal and increase the amount of informa- tion presented to the user.”15 The enriched content tHe Next GeNerAtioN liBrArY cAtAloG | YANG AND HoFMANN 143 includes images of book covers, CD and movie cases, tables of contents, summaries, reviews, and photos of items that traditionally are not present in legacy catalogs. ■■ Faceted navigation: Faceted navigation allows users to narrow their search results by facets. The types of facets may include subjects, authors, dates, types of materials, locations, series, and more. Many dis- covery tools and federated search engines, such as Villanova University’s VuFind and Innovative Interface’s Encore, have used this technology in searches.16 Auto-Graphics also applied this feature in their OPAC, AGent Iluminar.17 ■■ Simple keyword search box: The next-generation catalog looks and feels like popular Internet search engines. The best example is Google’s simple user interface. That means that a simple keyword search box, instead of a controlled vocabulary or specific-field search box, should be presented to the user on the opening page with a link to an advanced search for user in need of more complex searching options. ■■ Relevancy: Traditional ranking of search results is based on the frequency and positions of terms in bibliographical records during keyword searches. Relevancy has not worked well in OPACs. In addi- tion, popularity is another factor that has not been taken into consideration in relevancy ranking. For instance, “When ranking results from the library’s book collection, the number of times that an item has been checked out could be considered an indicator of popularity.”18 By the same token, the size and font of tags in a tag cloud or the number of comments users attach to an item may also be considered relevant in ranking search results. So far, almost no OPACs are capable of incorporating circulation statistics into relevancy ranking. ■■ “Did you mean . . . ?”: When a search term is not spelled correctly or nothing is found in the OPAC in a keyword search, the spell checker will kick in and suggest the correct spelling or recommend a term that may match the user’s intended search term. For exam- ple, a modern catalog may generate a statement such as “Did you mean . . . ?” or “Maybe you meant . . . .” This may be a very popular and useful service in modern OPACs. ■■ Recommendations and related materials: The next- generation catalog is envisioned as promoting read- ing and learning by making recommendations of additional related materials to patrons. This feature is an imitation of Amazon and websites that promote selling by stating “Customers who bought this item also bought . . . .” Likewise, after a search in the OPAC, a statement such as “Patrons who borrowed this book also borrowed the following books . . .” may appear. 
■■ User contribution—ratings, reviews, comments, and tag- ging: Legacy catalogs only allow catalogers to add content. In the next-generation catalog, users can be active contributors to the content of the OPAC. They can rate, write reviews, tag, and comment on items. User contribution is an important indicator for use and can be used in relevancy ranking. ■■ RSS feeds: The next-generation catalog is dynamic because it delivers lists of new acquisitions and search updates to users through RSS feeds. Modern catalogs are service-oriented; they do more than pro- vide a simple display search results. The second step is to apply these ten visions to the OPACs of Koha, Evergreen, and WebVoyage to determine if they are present or absent. The OPACs used in this study included three examples from each system. They may have been product demos and live catalogs ran- domly chosen from the user list on the product websites. The latest releases at the time of the study was Koha 3.0, Evergreen 2.0, WebVoyage 7.1. In case of discrepancies between product descriptions and reality, we gave pre- cedence to reality over claims. In other words, even if the product documentation lists and describes a feature, this study does not include it if the feature is not in action either in the demo or live catalogs. Despite the fact that a planned future release of one of those investigated OPACs may add a feature, this study only recorded what existed at the time of the comparison. The following are the OPACs examined in this paper. Koha ■■ Koho Demo for Academic Libraries: http://academic .demo.kohalibrary.com/ ■■ Wagner College: http://wagner.waldo.kohalibrary .com/ ■■ Clearwater Christian College: http://ccc.kohalibrary .com/ evergreen ■■ Evergreen Demo: http://demo.gapines.org/opac/ en-US/skin/default/xml/index.xml ■■ Georgia PINES: http://gapines.org/opac/en-US/ skin/default/xml/index.xml ■■ Columbia Bible College at http://columbiabc .evergreencatalog.com/opac/en-CA/skin/default/ xml/index.xml webVoyage ■■ Rider University Libraries: http://voyager.rider.edu ■■ Renton College library: http://renton.library.ctc .edu/vwebv/searchBasic 144 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 ■■ Shoreline College library: http://shoreline.library .ctc.edu/vwebv/searchBasic The final step includes data collection and compila- tion. A discussion of findings follows. The study draws conclusions about which OPAC is more advanced and has more features of the next-generation library catalog. ■■ Findings Each of the OPACs of Koha, Evergreen, and WebVoyage are examined for the presence of the ten features of the next-generation catalog. single point of entry for All library information None of the OPACs of the three ILSs provides true fed- erated searching. To varying degrees, each is limited in access, showing an absence of contents from elec- tronic databases, digital archives, and other sources that generally are not located in the legacy catalog. Of the three, Koha is more advanced. While WebVoyage and Evergreen only display journal-holdings information in their OPACs, Koha links journal titles from its catalog to ProQuest’s Serials Solutions, thus leading users to full- text journals in the electronic databases. The example in figure 1 (Koha demo) shows the journal title Unix Update with an active link to the full-text journal in the availabil- ity field. The link takes patrons to Serials Solutions, where full text at the journal-title level is listed for each database (see figure 2). 
Each link will take you into the full text in each database. state-of-the-Art web interface As beauty is in the eye of the beholder, the interface of a catalog can be appealing to one user but prohibitive to another. With this limitation in mind, the out-of-the- box user interface at the demo sites was considered for each OPAC. All the three catalogs have the Google-like simplicity in presentation. All of the user interfaces are highly customizable. It largely depends on the library to make the user interface appealing and welcoming to users. Figures 3–5 show snapshots from each ILSs demo sites and have not been customized. However, there are a few differences in the “state of the art.” For one, Koha’s navigation between screens relies solely on the browser’s Forward and Back buttons, while WebVoyage and Evergreen have internal naviga- tion buttons that more efficiently take the user between title lists, headings lists, and record displays, and between records in a result set. While all three OPACs offer an advanced search page with multiple boxes for entering search terms, only WebVoyage makes the relationship between the terms in different boxes clear. By the use of a drop-down box, it makes explicit that the search terms are by default ANDed and also allows for the selection of OR and NOT. In Koha’s and Evergreen’s advanced search, however, the terms are ANDed only, a fact that is not at all obvious to the user. In the demo OPACs examined, there is no option to choose OR or NOT between rows, nor is there any indication that the search is ANDed. The point of providing multiple search boxes is to guide users in constructing a Boolean search without their having to worry about operators and syntax. In Koha, however, users have to type an OR or NOT statement themselves within the text box, thus defeating the purpose of hav- ing multiple boxes. While Evergreen allows for a NOT construction within a row (“does not contain”), it does not provide an option for OR (“contains” and “matches exactly” are the other two options available). See figures Figure 1. Link to full-text journals in Serials Solutions in Koha Figure 2. Links to Serials Solutions from Koha tHe Next GeNerAtioN liBrArY cAtAloG | YANG AND HoFMANN 145 6–8. Thus Koha’s and Evergreen’s advanced search is less than intuitive for users and certainly less functional than WebVoyage’s. enriched content To varying degrees, enriched content is present in all three catalogs, with Koha providing the most. While all three catalogs have book covers and movie-container art, Koha has much more in its catalog. For instance, it displays tags, descriptions, comments, and Amazon reviews. WebVoyage displays links to Google Books for book reviews and content summaries but does not have tags, descriptions, and comments in the catalog. See fig- ures 9–11. Faceted Navigation The Koha OPAC is the only catalog of the three to offer faceted navigation. The “Refine your search” feature allows users to narrow search results by availability, places, libraries, authors, topics, and series. Clicking on a term within a facet adds that term to the search query and generates a narrower list of results. The user may then choose another facet to further refine the search. While Evergreen appears to have faceted navigation upon first glance, it actually does not possess this feature. 
The following facets appear after a search generates hits: “Relevant subjects,” “Relevant authors,” and “Relevant series.” But choosing a term within a facet does not nar- row down the previous search. Instead, it generates an entirely new search with the selected term; it does not add the new term to the previous query. Users must manually combine the terms in the simple search box or through the advanced search page. WebVoyage also does not offer faceted navigation—it only provides an option to “Filter your search” by format, language, and date when a set of results is returned. See figures 12–14. Keyword searching Koha, Evergreen, and WebVoyage all present a simple keyword search box with a link to the advanced search (see figures 3–5). relevancy Neither Koha, Evergreen, nor WebVoyage provide any evidence for meeting the criteria of the next-gener- ation catalog’s more inclusive vision of relevancy ranking, such as accounting for an item’s popularity or allowing user tags. Koha uses Index Data’s Zebra program for its relevance ranking, which “reads structured records in a variety of input formats . . . and allows access to them through exact boolean search Figure 3. Koha: state-of-the-art user interface Figure 5. Voyager: state-of-the-art user interface Figure 4. Evergreen: state-of-the-art user interface 146 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 user contributions Koha is the only system of the three that allows users to add tags, comments, descriptions, and reviews. In Koha’s OPAC, user-added tags form tag clouds, and the font and size of each keyword or tag indicate that keyword or Figure 6. Voyager advanced search Figure 7. Koha advanced search Figure 8. Evergreen advanced search expressions and relevance-ranked free-text queries.19 Evergreen’s DokuWiki states that the base relevancy score is determined by the cover density of the searched terms. After this base score is determined, items may receive score bumps based on word order, matching on the first word, and exact matches depending on the type of search performed.20 These statements do not indicate that either Koha or Evergreen go beyond the traditional relevancy-ranking methods of legacy systems, such as WebVoyage. Did You Mean . . . ? Only Evergreen has a true “Did you mean . . . ?” feature. When no hits are returned, Evergreen provides a sug- gested alternate spelling (“Maybe you meant . . . ?”) as well as a suggested additional search (“You may also like to try these related searches . . .”). Koha has a spell-check feature, but it automatically normalizes the search term and does not give the option of choosing different one. This is not the same as a “Did you mean . . . ?” feature as defined above. While the normalizing process may be seamless, it takes the power of choice away from the user and may be problematic if a particular alternative spelling or misspelling is searched purposefully, such as “womyn.” (When “womyn” is searched as a keyword in the Koha demo OPAC, 16,230 hits are returned. This catalog does not appear to contain the term as spelled, which is why it is normalized to women. The fact that the term does not appear as is may not be transparent to the searcher.) With normalization, the user may also be unaware that any mistake in spelling has occurred, and the number of hits may differ between the correct spelling and the normalized spelling, potentially affect- ing discovery. 
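For illustration only, the sketch below (not drawn from Koha, Evergreen, or WebVoyage; the vocabulary list is a stand-in for an index's indexed terms) shows how a "Did you mean . . . ?" prompt can offer a close match while leaving the original term untouched, avoiding the silent substitution just described:

# Illustrative sketch of a "Did you mean . . . ?" prompt: offer a close match as a
# suggestion but leave the user's original term (and the choice) intact, rather than
# silently normalizing the query.
import difflib

INDEXED_TERMS = ["women", "womyn", "homosexuality", "library", "catalog"]  # stand-in vocabulary

def did_you_mean(term, vocabulary=INDEXED_TERMS):
    if term.lower() in vocabulary:
        return None  # the term exists as spelled, so it is searched as entered
    matches = difflib.get_close_matches(term.lower(), vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else None

print(did_you_mean("homoexuality"))  # suggests 'homosexuality' instead of substituting it
print(did_you_mean("womyn"))         # None: a purposeful spelling is left alone

A suggestion of this kind leaves the decision with the searcher, so a purposeful spelling such as "womyn" is searched as entered.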
The normalization feature also only works with particular combinations of misspellings, where let- ter order affects whether a match is found. Otherwise the system returns a “No result found!” message with no suggestions offered. (Try “homoexuality” vs. “homo- exsuality.” In Koha’s demo OPAC, the former, with a missing “s,” yields 553 hits, while the latter, with a mis- placed “s,” yields none.) However, Koha is a step ahead of WebVoyage, which has no built-in spell checker at all. If a search fails, the system returns the message “Search Resulted in No Hits.” See figures 15–17. recommendations/related Materials None of the three online catalogs can recommend materi- als for users. tHe Next GeNerAtioN liBrArY cAtAloG | YANG AND HoFMANN 147 Figure 9. Koha enriched content Figure 10. Evergreen enriched content Figure 11. Voyager enriched content Figure 12. Koha faceted navigation Figure 13. Evergreen faceted navigation Figure 14. Voyager faceted navigation 148 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 Nevertheless, the user contribution in the Koha OPAC is not easy to use. It may take many clicks before a user can figure out how to add or edit text. It requires user login, and the system cannot keep track of the search hits after a login takes place. Therefore the user contribution features of Koha need improvement. See figure 18. rss feeds Koha provides RSS feeds, while Evergreen and WebVoyage do not. ■■ Conclusion Table 1 is a summary of the comparisons in this paper. These comparisons show that the Koha OPAC has six out of the ten compared features for the next-generation catalog, plus two halves. Its full-fledged features include state-of-the-art Web interface, enriched content, faceted navigation, a simple keyword search box, user con- tribution, and RSS feeds. The two halves indicate the existence of a feature that is not fully developed. For instance, “Did you mean . . . ?” in Koha does not work the way the next-generation catalog is envisioned. In addition, Koha has the capability of linking journal titles to full text via Serials Solutions, while the other two OPACs only display holdings information. Evergreen falls into second place, providing four out of the ten compared features: state-of-the-art interface, enriched content, a keyword search box, and “Did you mean . . . ?” WebVoyage, the Voyager OPAC from Ex Libris, comes in third, providing only three out of the ten features for Figure 15. Evergreen: Did you mean . . . ? Figure 16. Koha: Did you mean . . . ? Figure 17. Voyager: Did you mean . . . ? Figure 18. Koha user contibutions tag’s frequency of use. All the tags in a tag cloud serve as hyperlinks to library materials. Users can write their own reviews to complement the Amazon reviews. All user-added reviews, descriptions, and comments have to be approved by a librarian before they are finalized for display in the OPAC. tHe Next GeNerAtioN liBrArY cAtAloG | YANG AND HoFMANN 149 the next-generation catalog. Based on the evidence, Koha’s OPAC is more advanced and innovative than Evergreen’s or Voyager’s. Among the three catalogs, the open-source OPACs compare more favorably to the ideal next-generation catalog then the proprietary OPAC. However, none of them is capable of federated searching. Only Koha offers faceted navigation. WebVoyage does not even provide a spell checker. The ILS OPAC still has a long way to go toward the next- generation catalog. 
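The tallies above can be read directly from table 1. As a small illustration (not part of the original study), the following sketch scores each OPAC by counting a fully present feature as 1 and a partially implemented feature as 0.5, using values transcribed from the table:

# Illustrative only: reproduce the feature tallies reported in the conclusion
# from the yes/partial/no matrix in table 1.
SCORES = {"yes": 1.0, "partial": 0.5, "no": 0.0}

# Columns: Koha, Evergreen, Voyager (transcribed from table 1).
FEATURES = {
    "Single point of entry":          ("partial", "no",  "no"),
    "State-of-the-art web interface": ("yes",     "yes", "yes"),
    "Enriched content":               ("yes",     "yes", "yes"),
    "Faceted navigation":             ("yes",     "no",  "no"),
    "Keyword search":                 ("yes",     "yes", "yes"),
    "Relevancy":                      ("no",      "no",  "no"),
    "Did you mean...?":               ("partial", "yes", "no"),
    "Recommended/related materials":  ("no",      "no",  "no"),
    "User contribution":              ("yes",     "no",  "no"),
    "RSS feeds":                      ("yes",     "no",  "no"),
}

def tally(column):
    """Sum the scores for one OPAC (column 0 = Koha, 1 = Evergreen, 2 = Voyager)."""
    return sum(SCORES[row[column]] for row in FEATURES.values())

for i, name in enumerate(("Koha", "Evergreen", "Voyager")):
    print(name, tally(i))

Run as is, the sketch reports 7.0 for Koha (six full features plus two halves), 4.0 for Evergreen, and 3.0 for WebVoyage, matching the counts given above.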
Though this study samples only three catalogs, hopefully the findings will provide a glimpse of the current state of open-source versus proprietary catalogs. ILS OPACs are not comparable in features and functions to stand-alone OPACs, also referred to as "discovery tools" or "layers." Some discovery tools, such as Ex Libris' Primo, also are federated search engines and are modeled after the next-generation catalog. Recently they have become increasingly popular because they are bolder and more innovative than ILS OPACs. Two of the best stand-alone open-source OPACs are Villanova University's VuFind and Oregon State University's LibraryFind.21 Both boast eight out of ten features of the next-generation catalog.22 Technically it is easier to develop a new stand-alone OPAC with all the next-generation catalog features than to mend old ILS OPACs. As more and more libraries are disappointed with their ILS OPACs, more discovery tools will be implemented. Vendors will stop improving ILS OPACs and concentrate on developing better discovery tools. The fact that ILS OPACs are falling behind current trends may eventually bear no significance for libraries—at least for the ones that can afford the purchase or implementation of a more sophisticated discovery tool or stand-alone OPAC. Certainly small and public libraries that cannot afford a discovery tool or a programmer for an open-source OPAC overlay will suffer, unless market conditions change.

References

1. Tanja Merčun and Maja Žumer, "New Generation of Catalogues for the New Generation of Users: A Comparison of Six Library Catalogues," Program: Electronic Library & Information Systems 42, no. 3 (July 2008): 243–61.
2. Eric Lease Morgan, "A 'Next-Generation' Library Catalog—Executive Summary (Part #1 of 5)," online posting, July 7, 2006, LITA Blog: Library Information Technology Association, http://litablog.org/2006/07/07/a-next-generation-library-catalog-executive-summary-part-1-of-5/ (accessed Nov. 10, 2008).
3. Marshall Breeding, introduction to "Next Generation Library Catalogs," Library Technology Reports 43, no. 4 (July/Aug. 2007): 5–14.
4. Ibid.
5. Marshall Breeding, "Library Technology Guides: Key Resources in the Field of Library Automation," http://www.librarytechnology.org/lwc-search-advanced.pl (accessed Jan. 23, 2010).
6. Marshall Breeding, "Investing in The Future: Automation Marketplace 2009," Library Journal (Apr. 1, 2009), http://www.libraryjournal.com/article/CA6645868.html (accessed Jan. 23, 2010).
7. Marshall Breeding, "Library Technology Guides: Company Directory," http://www.librarytechnology.org/exlibris.pl?SID=20100123734344482&code=vend (accessed Jan. 23, 2010).
8. Merčun and Žumer, "New Generation of Catalogues."
9. Ibid.
10. Linda Riewe, "Integrated Library System (ILS) Survey: Open Source vs. Proprietary-Tables" (master's thesis, San Jose University, 2008): 2–5, http://users.sfo.com/~lmr/ils-survey/tables-all.pdf (accessed Nov. 4, 2008).
11. Ibid., 26–27.
12. Breeding, introduction.
13. Ibid.; Morgan, "A 'Next-Generation' Library Catalog."
14. Breeding, introduction.
15. Ibid.
16. Ibid.
17. Villanova University, "VuFind," http://vufind.org/ (accessed June 10, 2010); Innovative Interfaces, "Encore," http://encoreforlibraries.com/ (accessed June 10, 2010).
18. Auto-Graphics, "AGent Illuminar," http://www4.auto-graphics.com/solutions/agentiluminar/agentiluminar.htm (accessed June 10, 2010).
19. Breeding, introduction; Morgan, "A 'Next-Generation' Library Catalog."
20. Index Data, "Zebra," http://www.indexdata.dk/zebra/ (accessed Jan. 3, 2009).
21. Evergreen DokuWiki, "Search Relevancy Ranking," http://open-ils.org/dokuwiki/doku.php?id=scratchpad:opac_demo&s=core (accessed Dec. 19, 2008).
22. Villanova University, "VuFind"; Oregon State University, "LibraryFind," http://libraryfind.org/ (accessed June 10, 2010).
23. Sharon Q. Yang and Kurt Wagner, "Open Source Stand-Alone OPACs" (Microsoft PowerPoint presentation, 2010 Virtual Academic Library Environment Annual Conference, Piscataway, New Jersey, Jan. 8, 2010).

Table 1. Summary

Feature of the next-generation catalog              Koha      Evergreen  Voyager
Single point of entry for all library information   Partial   No         No
State-of-the-art web interface                      Yes       Yes        Yes
Enriched content                                    Yes       Yes        Yes
Faceted navigation                                  Yes       No         No
Keyword search                                      Yes       Yes        Yes
Relevancy                                           No        No         No
Did you mean . . . ?                                Partial   Yes        No
Recommended/related materials                       No        No         No
User contribution                                   Yes       No         No
RSS feeds                                           Yes       No         No

3138 ----

Lynne Weber and Peg Lawrence

Authentication and Access: Accommodating Public Users in an Academic World

in Cook and Shelton's Managing Public Computing, which confirmed the lack of applicable guidelines on academic websites, had more up-to-date information but was not available to the researchers at the time the project was initiated.2 In the course of research, the authors developed the following questions:

■■ How many ARL libraries require affiliated users to log into public computer workstations within the library?
■■ How many ARL libraries provide the means to authenticate guest users and allow them to log on to the same computers used by affiliates?
■■ How many ARL libraries offer open-access computers for guests to use? Do these libraries provide both open-access computers and the means for guest user authentication?
■■ How do Federal Depository Library Program libraries balance their policy requiring computer authentication with the obligation to provide public access to government information?
■■ Do computers provided for guest use (open access or guest login) provide different software or capabilities than those provided to affiliated users?
■■ How many ARL libraries have written policies for the use of open-access computers? If a policy exists, what is it?
■■ How many ARL libraries have written policies for authenticating guest users? If a policy exists, what is it?

■■ Literature Review

Since the 1950s there has been considerable discussion within library literature about academic libraries serving "external," "secondary," or "outside" users. The subject has been approached from the viewpoint of access to the library facility and collections, reference assistance, interlibrary loan (ILL) service, borrowing privileges, and (more recently) access to computers and Internet privileges, including the use of proprietary databases. Deale emphasized the importance of public relations to the academic library.3 While he touched on creating bonds both on and off campus, he described the positive effect of offering "privilege cards" to community members.4 Josey described the variety of services that Savannah State College offered to the community.5 He concluded his essay with these words: Why cannot these tried methods of lending books to citizens of the community, story hours for children . . .
, a library lecture series or other forum, a great books discussion group and the use of the library staff In the fall of 2004, the Academic Computing Center, a division of the Information Technology Services Department (ITS) at Minnesota State University, Mankato took over responsibility for the computers in the public areas of Memorial Library. For the first time, affiliated Memorial Library users were required to authenticate using a campus username and password, a change that effectively eliminated computer access for anyone not part of the university community. This posed a dilemma for the librarians. Because of its Federal Depository status, the library had a responsibility to pro- vide general access to both print and online government publications for the general public. Furthermore, the library had a long tradition of providing guest access to most library resources, and there was reluctance to aban- don the practice. Therefore the librarians worked with ITS to retain a small group of six computers that did not require authentication and were clearly marked for community use, along with several standup, open-access computers on each floor used primarily for searching the library catalog. The additional need to provide computer access to high school students visiting the library for research and instruction led to more discussions with ITS and resulted in a means of generating temporary usernames and passwords through a Web form. These user accommodations were implemented in the library without creating a written policy governing the use of open-access computers. O ver time, library staff realized that guidelines for guests using the computers were needed because of misuse of the open-access computers. We were charged with the task of drafting these guidelines. In typical librarian fashion, we searched websites, including those of Association of Research Libraries (ARL) members for existing computer access policies in academic libraries. We obtained very little information through this search, so we turned to ARL publications for assistance. Library Public Access Workstation Authentication by Lori Driscoll, was of greater benefit and offered much of the needed information, but it was dated.1 A research result described lynne webber (lnweber@mnsu.edu) is access Services librar- ian and peg lawrence (peg.lawrence@mnsu.edu) is Systems librarian, Minnesota State university, Mankato. AutHeNticAtioN AND Access | weBer AND lAwreNce 129 providing service to the unaffiliated, his survey revealed 100 percent of responding libraries offered free in-house collection use for the general public, and many others offered additional services.16 Brenda Johnson described a one-day program in 1984 sponsored by Rutgers University Libraries Forum titled “A Case Study in Closing the University Library to the Public.” The participating librarians spent the day famil- iarizing themselves with the “facts” of the theoretical case and concluded that public access should be restricted but not completely eliminated. A few months later, consider- ation of closing Rutgers’ library to the public became a real debate. Although there were strong opposing view- points, the recommendation was to retain the open-door policy.17 Jansen discussed the division between those who wanted to provide the finest service to primary users and those who viewed the library’s mission as including all who requested assistance. 
Jansen suggested specific ways to balance the needs of affiliates and the public and referred to the dilemma the University of California, Berkeley, library that had been closed to unaffiliated users.18 Bobp and Richey determined that California undergraduate libraries were emphasizing service to pri- mary users at a time when it was no longer practical to offer the same level of service to primary and secondary users. They presented three courses of action: adherence to the status quo, adoption of a policy restricting access, or implementation of tiered service.19 Throughout the 1990s, the debate over the public’s right to use academic libraries continued, with increasing focus on computer use in public and private academic libraries. New authorization and authentication require- ments increased the control of internal computers, but the question remained of libraries providing access to government information and responding to community members who expected to use the libraries supported by their taxes. Morgan, who described himself as one who had spent his career encouraging equal access to information, con- cluded that it would be necessary to use authentication, authorization, and access control to continue offering information services readily available in the past.20 Martin acknowledged that library use was changing as a result of the Internet and that the public viewed the academic librarian as one who could deal with the explosion of information and offer service to the public.21 Johnson described unaffiliated users as a group who wanted all the privileges of the affiliates; she discussed the obliga- tion of the institution to develop policies managing these guest users.22 Still and Kassabian considered the dual responsi- bilities of the academic library to offer Internet access to public users and to control Internet material received and sent by primary and public users. Further, they weighed as consultants be employed toward the building of good relations between town and gown.6 Later, however, Deale indicated that the generosity common in the 1950s to outsiders was becoming unsus- tainable.7 Deale used Beloit College, with an “open door policy” extending more than 100 years, as an example of a school that had found it necessary to refuse out-of-library circulation to minors except through ILL by the 1960s.8 Also in 1964, Waggoner related the increasing difficulty of accommodating public use of the academic library. He encouraged a balance of responsibility to the public with the institution’s foremost obligation to the students and faculty.9 In October 1965, the ad hoc Committee on Community Use of Academic Libraries was formed by the College Library Section of the Association of College and Research Libraries (ACRL). This committee distributed a 13-ques- tion survey to 1,100 colleges and universities throughout the United States. The high rate of response (71 per- cent) was considered noteworthy, and the findings were explored in “Community Use of Academic Libraries: A Symposium,” published in 1967.10 The concluding article by Josey (the symposium’s moderator) summarized the lenient attitudes of academic libraries toward public users revealed through survey and symposium reports. 
In the same article, Josey followed up with his own arguments in favor of the public’s right to use academic libraries because of the state and federal support provided to those institutions.11 Similarly, in 1976 Tolliver reported the results of a survey of 28 Wisconsin libraries (public academic, private academic, and public), which indicated that respondents made a great effort to serve all patrons seeking service.12 Tolliver continued in a different vein from Josey, however, by reporting the current annual fiscal support for libraries in Wisconsin and commenting upon financial steward- ship. Tolliver concluded by asking, “How effective are our library systems and cooperative affiliations in meet- ing the information needs of the citizens of Wisconsin?”13 Much of the literature in the years following focused on serving unaffiliated users at a time when public and academic libraries suffered the strain of overuse and underfunding. The need for prioritization of primary users was discussed. In 1979, Russell asked, “Who are our legitimate clientele?” and countered the argument for publicly supported libraries serving the entire public by saying the public “cannot freely use the university lawn mowers, motor pool vehicles, computer center, or athletic facilities.”14 Ten years later, Russell, Robison, and Prather prefaced their report on a survey of policies and services for outside users at 12 consortia institutions by saying, “The issue of external users is of mounting concern to an institution whose income is student credit hour gen- erated.”15 Despite Russell’s concerns about the strain of 130 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 be aware of the issues and of the effects that licensing, networking, and collection development decisions have on access.”35 In “Unaffiliated Users’ Access to Academic Libraries: A Survey,” Courtney reported and analyzed data from her own comprehensive survey sent to 814 academic libraries in winter 2001.36 Of the 527 libraries responding to the survey, 72 libraries (13.6 percent) required all users to authenticate to use computers within the library, while 56 (12.4 percent) indicated that they planned to require authentication in the next twelve months.37 Courtney followed this with data from surveyed libraries that had canceled “most” of their indexes and abstracts (179 librar- ies, or 33.9 percent) and libraries that had cancelled “most” periodicals (46 libraries or 8.7 percent).38 She concluded that the extent to which the authentication requirement restricted unaffiliated users was not clear, and she asked, “As greater numbers of resources shift to electronic-only formats, is it desirable that they disappear from the view of the community user or the visiting scholar?”39 Courtney’s “Authentication and Library Public Access Computers: A Call for Discussion” described a follow-up with the academic libraries participating in her 2001 survey who had self-identified as using authentication or planning to employ authentication within the next twelve months. Her conclusion was the existence of ambivalence toward authentication among the libraries, since more than half of the respondents provided some sort of public access. 
She encouraged librarians to carefully consider the library’s commitment to service before entering into blanket license agreements with vendors or agreeing to campus computer restrictions.40 Several editions of the ARL SPEC Kit series showing trends of authentication and authorization for all users of ARL libraries have been an invaluable resource in this investigation. An examination of earlier SPEC Kits indicated that the definitions of “user authentication” and “authorization” have changed over the years. User Authentication, by Plum and Bleiler indicated that 98 per- cent of surveyed libraries authenticated users in some way, but at that time authentication would have been more precisely defined as authorization or permission to access personal records, such as circulation, e-mail, course regis- tration, and file space. As such, neither authentication nor authorization was related to basic computer access.41 By contrast, it is common for current library users authenti- cate to have any access to a public workstation. Driscoll’s Library Public Access Workstation Authentication sought information on how and why users were authenticated on public-access computers, who was driving the change, how it affected the ability of Federal Depository libraries to provide public information, and how it affected library ser- vices in general.42 But at the time of Driscoll’s survey, only 11 percent of surveyed libraries required authentication on all computers and 22 percent required it only on selected terminals. Cook and Shelton’s Managing Public Computing the reconciliation of material restrictions against “prin- ciples of freedom of speech, academic freedom, and the ALA’s condemnation of censorship.”23 Lynch discussed institutional use of authentication and authorization and the growing difficulty of verifying bona fide users of aca- demic library subscription databases and other electronic resources. He cautioned that future technical design choices must reflect basic library values of free speech, personal confidentiality, and trust between academic institution and publisher.24 Barsun specifically examined the webpages of one hundred ARL libraries in search of information pertinent to unaffiliated users. 
She included a historic overview of the changing attitudes of academics toward service to the unaffiliated population and described the difficult bal- ance of college community needs with those of outsiders in 2000 (the survey year).25 Barsun observed a consistent lack of information on library websites regarding library guest use of proprietary databases.26 Carlson discussed academic librarians’ concerns about “Internet-related crimes and hacking” leading to reconsideration of open computer use, and he described the need to compromise patron privacy by requiring authentication.27 In a chapter on the relationship of IT security to academic values, Oblinger said, “One possible interpretation of intellectual freedom is that individuals have the right to open and unfiltered access to the Internet.”28 This statement was followed later with “equal access to information can also be seen as a logical extension of fairness.”29 A short article in Library and Information Update alerted the authors to a UK project investigating improved online access to resources for library visitors not affili- ated with the host institution.30 Salotti described Higher Education Access to E-Resources in Visited Institutions (HAERVI) and its development of a toolkit to assist with the complexities of offering electronic resources to guest users.31 Salotti summarized existing resources for sharing within the United Kingdom and emphasized that “no single solution is likely to suit all universities and col- leges, so we hope that the toolkit will offer a number of options.”32 Launched by the Society of College, National and University Libraries (SCONUL), and Universities and Colleges Information Systems Association (UCISA), HAERVI has created a best-practice guide.33 By far the most useful articles for this investigation have been those by Nancy Courtney. “Barbarians at the Gates: A Half-Century of Unaffiliated Users in Academic Libraries,” a literature review on the topic of visitors in academic libraries, included a summary of trends in attitude and practice toward visiting users since the 1950s.34 The article concluded with a warning: “The shift from printed to elec- tronic formats . . . combined with the integration of library resources with campus computer networks and the Internet poses a distinct threat to the public’s access to information even onsite. It is incumbent upon academic librarians to AutHeNticAtioN AND Access | weBer AND lAwreNce 131 introductory letter with the invitation to participate and a forward containing definitions of terms used within the survey is in appendix A. In total, 61 (52 percent) of the 117 ARL libraries invited to participate in the survey responded. This is comparable with the response rate for similar surveys reported by Plum and Bleiler (52 of 121, or 43 percent), Driscoll (67 of 124, or 54 percent), and Cook and Shelton (69 of 123, or 56 percent).45 1. What is the name of your academic institution? The names of the 61 responding libraries are listed in appendix B. 2. Is your institution public or private? See figure 1. Respondents’ explanations of “other” are listed below. ■❏ State-related ■❏ Trust instrument of the U.S. people; quasi- government ■❏ Private state-aided ■❏ Federal government research library ■❏ Both—private foundation, public support 3. Are affiliated users required to authenticate in order to access computers in the public area of your library? See figure 2. 4. 
If you answered “yes” to the previous question, does your library provide the means for guest users to authenticate? See figure 3. Respondents’ explanations of “other” are listed below. All described open-access comput- ers. ■❏ “We have a few “open” terminals” ■❏ “4 computers don’t require authentication” ■❏ “Some workstations do not require authentica- tion” ■❏ “Open-access PCs for guests (limited number and function)” ■❏ “No—but we maintain several open PCs for guests” ■❏ “Some workstations do not require login” 5. Is your library a Federal Depository Library? See fig- ure 4. This question caused some confusion for the Canadian survey respondents because Canada has its own Depository Services Program corresponding to the U.S. Federal Depository Program. Consequently, 57 of the 61 respondents identified themselves as Federal Depository (including three Canadian librar- ies), although 5 of the 61 are more accurately mem- bers of the Canadian Depository Services Program. Only two responding libraries were neither a mem- ber of the U.S. Federal Depository Program nor of the Canadian Depository Services Program. 6. If you answered “yes” to the previous question, and com- puter authentication is required, what provisions have been made to accommodate use of online government documents by the general public in the library? Please check all that touched on every aspect of managing public computing, including public computer use, policy, and security.43 Even in 2007, only 25 percent of surveyed libraries required authentication on all computers, but 46 percent required authentication on some computers, showing the trend toward an ever increasing number of libraries requiring public workstation authentication. Most of the responding libraries had a computer-use policy, with 48 percent follow- ing an institution-wide policy developed by the university or central IT department.44 ■■ Method We constructed a survey designed to obtain current data about authentication in ARL libraries and to provide insight into how guest access is granted at various aca- demic institutions. It should be noted that the object of the survey was access to computers located in the public areas of the library for use by patrons, not access to staff computers. We constructed a simple, fourteen-question survey using the Zoomerang online tool (http://www .zoomerang.com/). A list of the deans, directors, and chief operating officers from the 123 ARL libraries was compiled from an Internet search. We eliminated the few library administrators whose addresses could not be readily found and sent the survey to 117 individuals with the request that it be forwarded to the appropriate respondent. The recipients were informed that the goal of the project was “determination of computer authentica- tion and current computer access practices within ARL libraries” and that the intention was “to reflect practices at the main or central library” on the respondent’s cam- pus. Recipients were further informed that the names of the participating libraries and the responses would be reported in the findings, but that there would be no link between responses given and the name of the participat- ing library. The survey introduction included the name and contact information of the institutional review board administrator for Minnesota State University, Mankato. Potential respondents were advised that the e-mail served as informed consent for the study. The survey was administered over approximately three weeks. 
We sent reminders three, five, and seven days after the survey was launched to those who had not already responded. ■■ Survey Questions, Responses, and Findings We administered the survey, titled “Authentication and Access: Academic Computers 2.0,” in late April 2008. Following is a copy of the fourteen-question survey with responses, interpretative data, and comments. The 132 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 ■❏ “some computers are open access and require no authentication” ■❏ “some workstations do not require login” 7. If your library has open-access computers, how many do you provide? (Supply number). See figure 6. A total of 61 institutions responded to this question, and 50 reported open-access computers. The number of open-access computers ranged from 2 to 3,000. As expected, the highest numbers were reported by libraries that did not require authentication for affili- ates. The mean number of open-access computers was 161.2, the median was 23, the mode was 30, and the range was 2,998. 8. Please indicate which online resources and services are available to authenticated users. Please check all that apply. See figure 7. ■❏ Online catalog ■❏ Government documents ■❏ Internet browser apply. See figure 5. ■❏ Temporary User ID and Password ■❏ Open Access Computers (Unlimited Access) ■❏ Open Access Computers (Access Limited to Government Documents) ■❏ Other Of the 57 libraries that responded “yes” to question 5, 30 required authentication for affiliates. These institutions offered the general public access to online government documents various ways. Explanations of “other” are listed below. Three of these responses indicate, by survey definition, that open-access computers were provided. ■❏ “catalog-only workstations” ■❏ “4 computers don’t require authentication” ■❏ “generic login and password” ■❏ “librarians login each guest individually” ■❏ “provision made for under-18 guests needing gov doc” ■❏ “staff in Gov Info also login user for quick use” ■❏ “restricted guest access on all public devices” Figure 3. Institutions with the means to authenticate guests Figure 4. Libraries with Federal Depository and/or Canadian Depository Services status Figure 2. Institutions requiring authentication Figure 1. Categories of responding institutions AutHeNticAtioN AND Access | weBer AND lAwreNce 133 11. Does your library have a written policy for use of open access computers in the public area of the library? Question 7 indicates that 50 of the 61 responding libraries did offer the public two or more open-access computers. Out of the 50, 28 responded that they had a written policy governing the use of computers. Conversely, open-access computers were reported at 22 libraries that had no reported written policy. 12. If you answered “yes” to the previous question, please give the link to the policy and/or summarize the policy. Twenty-eight libraries gave a URL, a URL plus a summary explanation, or a summary explanation with no URL. 13. Does your library have a written policy for authenticating guest users? Out of the 32 libraries that required their users to authenticate (see question 3), 23 also had the means to allow their guests to authenticate (see question 4). Fifteen of those libraries said they had a policy. 14. If you answered “yes” to the previous question, please give the link to the policy and/or summarize the policy. Eleven ■❏ Licensed electronic resources ■❏ Personal e-mail access ■❏ Microsoft Office software 9. 
Please indicate which online resources and services are available to authenticated guest users. Please check all that apply. See figure 8. ■❏ Online catalog ■❏ Government documents ■❏ Internet browser ■❏ Licensed electronic resources ■❏ Personal e-mail access ■❏ Microsoft Office software 10. Please indicate which online resources and services are available on open-access computers. Please check all that apply. See figure 9. ■❏ Online catalog ■❏ Government documents ■❏ Internet browser ■❏ Licensed electronic resources ■❏ Personal e-mail access ■❏ Microsoft Office software Figure 5. Provisions for the online use of government docu- ments where authentication is required Figure 6. Number of open-access computers offered Figure 7. Electronic resources for authenticated affiliated users (N = 32) Number of libraries Number of librariesNumber of libraries Number of libraries Figure 8. Resources for authenticating guest users (N = 23) 134 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 ■■ Respondents and authentication Figure 10 compares authentication practices of public, private, and other institutions described in response to question 2. Responses from public institutions outnum- bered those from private institutions, but within each group a similar percentage of libraries required their affiliated users to authenticate. Therefore no statistically significant difference was found between authenticating affiliates in public and private institutions. Of the 61 respondents, 32 (52 percent) required their affiliated users to authenticate (see question 3) and 23 of the 32 also had the means to authenticate guests (see question 4). The remaining 9 offered open-access comput- ers. Fourteen libraries had both the means to authenticate guests and had open-access computers (see questions 4 and 7). When we compare the results of the 2007 study by Cook and Shelton with the results of the current study (completed in 2008), the results are somewhat contradic- tory (see table 1).46 The differences in survey data seem to indicate that authentication requirements are decreasing; however, the literature review—specifically Cook and Shelton and the 2003 Courtney article—clearly indicate that authentica- tion is on the rise.47 This dichotomy may be explained, in part, by the fact that of the more than 60 ARL libraries responding to both surveys, there was an overlap of only 34 libraries. The 30 U.S. Federal Depository or Canadian Depository Services libraries that required their affiliated users to authenticate (see questions 3 and 5) provided guest access ranging from usernames and passwords, to open-access computers, to computers restricted to libraries gave the URL to their policy; 4 summarized their policies. ■■ Research questions answered The study resulted in answers to the questions we posed at the outset: ■■ Thirty-two (52 percent) of the responding ARL libraries required affiliated users to login to public computer workstations in the library. ■■ Twenty-three (72 percent) of the 32 ARL libraries requiring affiliated users to login to public computers provided the means for guest users to login to public computer workstations in the library. ■■ Fifty (82 percent) of 61 responding ARL libraries provided open-access computers for guest users; 14 (28 percent) of those 50 libraries provided both open-access computers and the means for guest authentication. ■■ Without exception, all U.S. 
Federal Depository or Canadian Depository Services Libraries that required their users to authenticate offered guest users some form of access to online information. ■■ Survey results indicated some differences between software provided to various users on differently accessed computers. Office software was less fre- quently provided on open-access computers. ■■ Twenty-eight responding ARL libraries had written policies relating to the use of open-access computers. ■■ Fifteen responding ARL libraries had written policies relating to the authorization of guests. Figure 9. Electronic resources on open access computers (N = 50) Figure 10. Comparison of library type and authentication requirement Number of libraries AutHeNticAtioN AND Access | weBer AND lAwreNce 135 ■■ One library had guidelines for use posted next to the workstations but did not give specifics. ■■ Fourteen of those requiring their users to authen- ticate had both open-access computers and guest authentication to offer to visitors of their libraries. Other policy information was obtained by an exami- nation of the 28 websites listed by respondents: ■■ Ten of the sites specifically stated that the open-access computers were for academic use only. ■■ Five of the sites specified time limits for use of open- access computers, ranging from 30 to 90 minutes. ■■ Four stated that time limits would be enforced when others were waiting to use computers. ■■ One library used a sign-in sheet to monitor time limits. ■■ One library mentioned a reservation system to moni- tor time limits. ■■ Two libraries prohibited online gambling. ■■ Six libraries prohibited viewing sexually explicit materials. ■■ Guest-authentication policies Of the 23 libraries that had the means to authenticate their guests, 15 had a policy for guests obtaining a username and password to authenticate, and 6 outlined their requirements of showing identification and issuing access. The other 9 had open-access computers that guests might use. The following are some of the varied approaches to guest authentication: ■■ Duration of the access (when mentioned) ranged from 30 days to 12 months. ■■ One library had a form of sponsored access where current faculty or staff could grant a temporary user- name and password to a visitor. ■■ One library had an online vouching system that allowed the visitor to issue his or her own username and password online. ■■ One library allowed guests to register themselves by swiping an ID or credit card. ■■ One library had open-access computers for local resources and only required authentication to leave the library domain. ■■ One library had the librarians log the users in as guests. ■■ One library described the privacy protection of col- lected personal information. ■■ No library mentioned charging a fee for allowing computer access. government documents, to librarians logging in for guests (see question 6). Numbers of open-access comput- ers ranged widely from 2 to more than 3,000 (see question 7). Eleven (19 percent) of the responding U.S. Federal Depository or Canadian Depository Services libraries that did not provide open-access computers issued a tempo- rary ID (nine libraries), provided open access limited to government documents (one library), or required librar- ian login for each guest (one library). All libraries with U.S. Federal Depository or Canadian Depository Services status provided a means of public access to information to fulfill their obligation to offer government documents to guests. 
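To illustrate the kind of temporary credential several of these policies describe, the sketch below (hypothetical; it is not drawn from any responding library's system) issues a guest username and password that expire after a stated period, in line with the 30-day to 12-month durations reported above:

# Hypothetical sketch: issue a temporary guest login with an expiration date, along
# the lines of the temporary user IDs and passwords described in the survey responses.
# A production system would store only a hashed password, not the plain text returned here.
import datetime
import secrets

def issue_guest_account(sponsor_id, days_valid=30):
    """Return a guest credential valid for days_valid days (surveyed policies ran 30 days to 12 months)."""
    if not 30 <= days_valid <= 365:
        raise ValueError("duration outside the 30-day to 12-month range reported in the survey")
    username = "guest-" + secrets.token_hex(4)   # e.g., guest-9f2a01bc
    password = secrets.token_urlsafe(12)         # random single-use password
    expires = datetime.date.today() + datetime.timedelta(days=days_valid)
    return {"username": username, "password": password,
            "expires": expires.isoformat(), "sponsor": sponsor_id}

# Example: a staff member sponsors a visiting class for 30 days of access.
print(issue_guest_account(sponsor_id="sponsoring-staff-member", days_valid=30))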
Figure 11 shows a comparison of resources available to authenticated users and authenticated guests and offered on open-access computers. As might be expected, almost all institutions provided access to online catalogs, government documents, and Internet browsers. Fewer allowed access to licensed electronic resources and e-mail. Access to Office software showed the most dramatic drop in availability, especially on open-access computers.

■■ Open-access computer policies

As mentioned earlier, 28 libraries had written policies for their open-access computers (see question 11), and 28 libraries gave a URL, a URL plus a summary explanation, or a summary explanation with no URL (see question 12). In most instances, the library policy included their campus's acceptable-use policy. Seven libraries cited their campus's acceptable-use policy and nothing else. Nearly all libraries applied the same acceptable-use policy to all users on all computers and made no distinction between policies for use of open-access computers or computers requiring authentication.

Following are some of the varied aspects of summarized policies pertaining to open-access computers:
■■ Eight libraries stated that the computers were for academic use and that users might be asked to give up their workstation if others were waiting.

Table 1. Comparison of findings from Cook and Shelton (2007) and the current survey (2008)

Authentication requirements     2007 (N = 69)     2008 (N = 61)
Some required                   28 (46%)          23 (38%)
Required for all                15 (25%)           9 (15%)
Not required                    18 (30%)          29 (48%)

■■ Further study

Although the survey answered many of our questions, other questions arose. While the number of libraries requiring affiliated users to log on to their public computers is increasing, this study does not explain why this is the case. Reasons could include reactions to the September 11 disaster, the USA PATRIOT Act, general security concerns, or the convenience of the personalized desktop and services for each authenticated user. Perhaps a future investigation could focus on reasons for more frequent requirement of authentication. Other subjects that arose in the examination of institutional policies were guest fees for services, age limits for younger users, computer time limits for guests, and collaboration between academic and public libraries.

■■ Policy developed as a result of the survey findings

As a result of what was learned in the survey, we drafted guidelines governing the use of open-access computers by visitors and other non-university users. The guidelines can be found at http://lib.mnsu.edu/about/libvisitors.html#access. These guidelines inform guests that open-access computers are available to support their research, study, and professional activities. The computers also are governed by the campus policy and the state university system acceptable-use policy. Guideline provisions enable staff to ask users to relinquish a computer when others are waiting or if the computer is not being used for academic purposes. While this library has the ability to generate temporary usernames and passwords, and does so for local schools coming to the library for research, no guidelines have yet been put in place for this function.

Figure 11.
Online resources available to authenticated affiliated users, guest users, open-access users AutHeNticAtioN AND Access | weBer AND lAwreNce 137 These practices depend on institutional missions and goals and are limited by reasonable considerations. In the past, accommodation at some level was generally offered to the community, but the complications of affili- ate authentication, guest registration, and vendor-license restrictions may effectively discourage or prevent outside users from accessing principal resources. On the other hand, open-access computers facilitate access to electronic resources. Those librarians who wish to provide the same level of commitment to guest users as in the past as well as protect the rights of all should advocate to campus policy-makers at every level to allow appropriate guest access to computers to fulfill the library’s mission. In this way, the needs and rights of guest users can be balanced with the responsibilities of using campus computers. In addition, librarians should consider ensuring that the licenses of all electronic resources accommodate walk-in users and developing guidelines to prevent incor- poration of electronic materials that restrict such use. This is essential if the library tradition of freedom of access to information is to continue. Finally, in regard to external or guest users, academic librarians are pulled in two directions; they are torn between serving primary users and fulfilling the prin- ciples of intellectual freedom and free, universal access to information along with their obligations as Federal Depository libraries. At the same time, academic librar- ians frequently struggle with the goals of the campus administration responsible for providing secure, reliable networks, sometimes at the expense of the needs of the outside community. The data gathered in this study, indicating that 82 percent of responding libraries con- tinue to provide at least some open-access computers, is encouraging news for guest users. Balancing public access and privacy with institutional security, while a current concern, may be resolved in the way of so many earlier preoccupations of the electronic age. Given the pervasiveness of the problem, however, fair and equitable treatment of all library users may continue to be a central concern for academic libraries for years to come. References 1. Lori Driscoll, Library Public Access Workstation Authentica- tion, SPEC Kit 277 (Washington, D.C.: Association of Research Libraries, 2003). 2. Martin Cook and Mark Shelton, Managing Public Comput- ing, SPEC Kit 302 (Washington, D.C.: Association of Research Libraries, 2007): 16. 3. H. Vail Deale, “Public Relations of Academic Libraries,” Library Trends 7 (Oct. 1958): 269–77. 4. Ibid., 275. 5. E. J. Josey, “The College Library and the Community,” Faculty Research Edition, Savannah State College Bulletin (Dec. 1962): 61–66. ■■ Conclusions While we were able to gather more than 50 years of litera- ture pertaining to unaffiliated users in academic libraries, it soon became apparent that the scope of consideration changed radically through the years. In the early years, there was discussion about the obligation to provide service and access for the community balanced with the challenge to serve two clienteles. Despite lengthy debate, there was little exception to offering the community some level of service within academic libraries. 
Early preoccupation with physical access, material loans, ILL, basic reference, and other services later became a discus- sion of the right to use computers, electronic resources, and other services without imposing undue difficulty to the guest. Current discussions related to guest users reflect obvious changes in public computer administration over the years. Authentication presently is used at a more fundamental level than in earlier years. In many librar- ies, users must be authorized to use the computer in any way whatsoever. As more and more institutions require authentication for their primary users, accommodation must be made if guests are to continue being served. In addition, as Courtney’s 2003 research indicates, an ever increasing number of electronic databases, indexes, and journals replace print resources in library collections. This multiplies the roadblocks for guest users and exacerbates the issue.48 Unless special provisions are made for com- puter access, community users are left without access to a major part of the library’s collections. Because 104 of the 123 ARL libraries (85 percent) are Federal Depository or Canadian Depository Services Libraries, the researchers hypothesized that most librar- ies responding to the survey would offer open-access computers for the use of nonaffiliated patrons. This study has shown that Federal Depository Libraries have remained true to their mission and obligation of provid- ing public access to government-generated documents. Every Federal Depository respondent indicated that some means was in place to continue providing visitor and guest access to the majority of their electronic resources— whether through open-access computers, temporary or guest logins, or even librarians logging on for users. While access to government resources is required for the librar- ies housing government-document collections, libraries can use considerably more discretion when considering what other resources guest patrons may use. Despite the commitment of libraries to the dissemination of govern- ment documents, the increasing use of authentication may ultimately diminish the libraries’ ability and desire to accommodate the information needs of the public. This survey has provided insight into the various ways academic libraries serve guest users. Not all academic libraries provide public access to all library resources. 138 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 Identify Yourself,” Chronicle of Higher Education 50, no. 42 (June 25, 2004): A39, http://search.ebscohost.com/login.aspx?direct =true&db=aph&AN=13670316&site=ehost-live (accessed Mar. 2, 2009). 28. Diana Oblinger, “IT Security and Academic Values,” in Luker and Petersen, Computer & Network Security in Higher Edu- cation, 4, http://net.educause.edu/ir/library/pdf/pub7008e .pdf (accessed July 14, 2008). 29. Ibid., 5. 30. “Access for Non-Affiliated Users,” Library & Information Update 7, no. 4 (2008): 10. 31. Paul Salotti, “Introduction to HAERVI-HE Access to E-Resources in Visited Institutions,” SCONUL Focus no. 39 (Dec. 2006): 22–23, http://www.sconul.ac.uk/publications/ newsletter/39/8.pdf (accessed July 14, 2008). 32. Ibid., 23. 33. Universities and Colleges Information Systems Asso- ciation (UCISA), HAERVI: HE Access to E-Resources in Visited Institutions, (Oxford: UCISA, 2007), http://www.ucisa.ac.uk/ publications/~/media/Files/members/activities/haervi/ haerviguide%20pdf (accessed July 14, 2008). 34. 
Nancy Courtney, “Barbarians at the Gates: A Half-Century of Unaffiliated Users in Academic Libraries,” Journal of Academic Librarianship 27, no. 6 (Nov. 2001): 473–78, http://search.ebsco host.com/login.aspx?direct=true&db=aph&AN=5602739&site= ehost-live (accessed July 14, 2008). 35. Ibid., 478. 36. Nancy Courtney, “Unaffiliated Users’ Access to Academic Libraries: A Survey,” Journal of Academic Librarianship 29, no. 1 (Jan. 2003): 3–7, http://search.ebscohost.com/login.aspx?dire ct=true&db=aph&AN=9406155&site=ehost-live (accessed July 14, 2008). 37. Ibid., 5. 38. Ibid., 6. 39. Ibid., 7. 40. Nancy Courtney, “Authentication and Library Public Access Computers: A Call for Discussion,” College & Research Libraries News 65, no. 5 (May 2004): 269–70, 277, www.ala .org/ala/mgrps/divs/acrl/publications/crlnews/2004/may/ authentication.cfm (accessed July 14, 2008). 41. Terry Plum and Richard Bleiler, User Authentication, SPEC Kit 267 (Washington, D.C.: Association of Research Libraries, 2001): 9. 42. Lori Driscoll, Library Public Access Workstation Authentica- tion, SPEC Kit 277 (Washington, D.C.: Association of Research Libraries, 2003): 11. 43. Cook and Shelton, Managing Public Computing. 44. Ibid., 15. 45. Plum and Bleiler, User Authentication, 9; Driscoll, Library Public Access Workstation Authentication, 11; Cook and Shelton, Managing Public Computing, 11. 46. Cook and Shelton, Managing Public Computing, 15. 47. Ibid.; Courtney, Unaffiliated Users, 5–7. 48. Courtney, Unaffiliated Users, 6–7. 6. Ibid., 66. 7. H. Vail Deale, “Campus vs. Community,” Library Journal 89 (Apr. 15, 1964): 1695–97. 8. Ibid., 1696. 9. John Waggoner, “The Role of the Private University Library,” North Carolina Libraries 22 (Winter 1964): 55–57. 10. E. J. Josey, “Community Use of Academic Libraries: A Symposium,” College & Research Libraries 28, no. 3 (May 1967): 184–85. 11. E. J. Josey, “Implications for College Libraries,” in “Com- munity Use of Academic Libraries,” 198–202. 12. Don L. Tolliver, “Citizens May Use Any Tax-Supported Library?” Wisconsin Library Bulletin (Nov./Dec. 1976): 253. 13. Ibid., 254. 14. Ralph E. Russell, “Services for Whom: A Search for Iden- tity,” Tennessee Librarian: Quarterly Journal of the Tennessee Library Association 31, no. 4 (Fall 1979): 37, 39. 15. Ralph E. Russell, Carolyn L. Robison, and James E. Prather, “External User Access to Academic Libraries,” The Southeastern Librarian 39 (Winter 1989): 135. 16. Ibid., 136. 17. Brenda L. Johnson, “A Case Study in Closing the Univer- sity Library to the Public,” College & Research Library News 45, no. 8 (Sept. 1984): 404–7. 18. Lloyd M. Jansen, “Welcome or Not, Here They Come: Unaffiliated Users of Academic Libraries,” Reference Services Review 21, no. 1 (Spring 1993): 7–14. 19. Mary Ellen Bobp and Debora Richey, “Serving Secondary Users: Can It Continue?” College & Undergraduate Libraries 1, no. 2 (1994): 1–15. 20. Eric Lease Morgan, “Access Control in Libraries,” Com- puters in Libraries 18, no. 3 (Mar. 1, 1998): 38–40, http://search .ebscohost.com/login.aspx?direct=true&db=aph&AN=306709& site=ehost-live (accessed Aug. 1, 2008). 21. Susan K. Martin, “A New Kind of Audience,” Journal of Academic Librarianship 24, no. 6 (Nov. 1998): 469, Library, Infor- mation Science & Technology Abstracts, http://search.ebsco host.com/login.aspx?direct=true&db=aph&AN=1521445&site= ehost-live (accessed Aug. 8, 2008). 22. Peggy Johnson, “Serving Unaffiliated Users in Publicly Funded Academic Libraries,” Technicalities 18, no. 1 (Jan. 1998): 8–11. 23. 
Julie Still and Vibiana Kassabian, “The Mole’s Dilemma: Ethical Aspects of Public Internet Access in Academic Libraries,” Internet Reference Services Quarterly 4, no. 3 (1999): 9. 24. Clifford Lynch, “Authentication and Trust in a Networked World,” Educom Review 34, no. 4 (Jul./Aug. 1999), http://search .ebscohost.com/login.aspx?direct=true&db=aph&AN=2041418 &site=ehost-live (accessed July 16, 2008). 25. Rita Barsun, “Library Web Pages and Policies Toward ‘Outsiders’: Is the Information There?” Public Services Quarterly 1, no. 4 (2003): 11–27. 26. Ibid., 24. 27. Scott Carlson, “To Use That Library Computer, Please AutHeNticAtioN AND Access | weBer AND lAwreNce 139 Appendix A. The Survey Introduction, Invitation to Participate, and Forward Dear ARL Member Library, As part of a professional research project, we are attempting to determine computer authentication and current com- puter access practices within ARL libraries. We have developed a very brief survey to obtain this information which we ask one representative from your institution to complete before April 25, 2008. The survey is intended to reflect practices at the main or central library on your campus. Names of libraries responding to the survey may be listed but no identifying information will be linked to your responses in the analysis or publication of results. If you have any questions about your rights as a research participant, please contact Anne Blackhurst, Minnesota State University, Mankato IRB Administrator. Anne Blackhurst, IRB Administrator Minnesota State University, Mankato College of Graduate Studies & Research 115 Alumni Foundation Mankato, MN 56001 (507)389-2321 anne.blackhurst@mnsu.edu You may preview the survey by scrolling to the text below this message. If, after previewing you believe it should be handled by another member of your library team, please forward this message appropriately. Alternatively, you may print the survey, answer it manually and mail it to: Systems/ Access Services Survey Library Services Minnesota State University, Mankato ML 3097—PO Box 8419 Mankato, MN 56001-8419 (USA) We ask you or your representative to take 5 minutes to answer 14 questions about computer authentication practices in your main library. Participation is voluntary, but follow-up reminders will be sent. This e-mail serves as your informed consent for this study. Your participation in this study includes the completion of an online survey. Your name and iden- tity will not be linked in any way to the research reports. Clicking the link to take the survey shows that you understand you are participating in the project and you give consent to our group to use the information you provide. You have the right to refuse to complete the survey and can discontinue it at any time. To take part in the survey, please click the link at the bottom of this e-mail. Thank you in advance for your contribution to our project. If you have questions, please direct your inquiries to the contacts given below. Thank you for responding to our invitation to participate in the survey. This survey is intended to determine current academic library practices for computer authentication and open access. Your participation is greatly appreciated. Below are the definitions of terms used within this survey: ■■ “Authentication”: a username and password are required to verify the identity and status of the user in order to log on to computer workstations in the library. ■■ “Affiliated user”: a library user who is eligible for campus privileges. 
■■ “Non-affiliated user”: a library user who is not a member of the institutional community (an alumnus may be a non- affiliated user). This may be used interchangeably with “guest user.” ■■ “Guest user”: visitor, walk-in user, nonaffiliated user. ■■ “Open Access Computer”: Computer workstation that does not require authentication by user. 140 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 Appendix B. Responding Institutions 1. University at Albany State University of New York 2. University of Alabama 3. University of Alberta 4. University of Arizona 5. Arizona State University 6. Boston College 7. University of British Columbia 8. University at Buffalo, State University of NY 9. Case Western Reserve University 10. University of California Berkeley 11. University of California, Davis 12. University of California, Irvine 13. University of Chicago 14. University of Colorado at Boulder 15. University of Connecticut 16. Columbia University 17. Dartmouth College 18. University of Delaware 19. University of Florida 20. Florida State University 21. University of Georgia 22. Georgia Tech 23. University of Guelph 24. Howard University 25. University of Illinois at Urbana-Champaign 26. Indiana University Bloomington 27. Iowa State University 28. Johns Hopkins University 29. University of Kansas 30. University of Louisville 31. Louisiana State University 32. McGill University 33. University of Maryland 34. University of Massachusetts Amherst 35. University of Michigan 36. Michigan State University 37. University of Minnesota 38. University of Missouri 39. Massachusetts Institute of Technology 40. National Agricultural Library 41. University of Nebraska-Lincoln 42. New York Public Library 43. Northwestern University 44. Ohio State University 45. Oklahoma State University 46. University of Oregon 47. University of Pennsylvania 48. University of Pittsburgh 49. Purdue University 50. Rice University 51. Smithsonian Institution 52. University of Southern California 53. Southern Illinois University Carbondale 54. Syracuse University 55. Temple University 56. University of Tennessee 57. Texas A&M University 58. Texas Tech University 59. Tulane University 60. University of Toronto 61. Vanderbilt University 3140 ---- tHe Next GeNerAtioN liBrArY cAtAloG | ZHou 151Are Your DiGitAl DocuMeNts weB FrieNDlY? | ZHou 151 Are Your Digital Documents Web Friendly?: Making Scanned Documents Web Accessible The Internet has greatly changed how library users search and use library resources. Many of them prefer resources available in electronic format over tradi- tional print materials. While many docu- ments are now born digital, many more are only accessible in print and need to be digitized. This paper focuses on how the Colorado State University Libraries cre- ates and optimizes text-based and digitized PDF documents for easy access, download- ing, and printing. T o digitize print materials, we normally scan originals, save them in archival digital formats, and then make them Web- accessible. There are two types of print documents, graphic-based and text-based. If we apply the same tech- niques to digitize these two different types of materials, the documents produced will not be Web-friendly. Graphic-based materials include archival resources such as his- torical photographs, drawings, manuscripts, maps, slides, and post- ers. We normally scan them in color at a very high resolution to capture and present a reproduction that is as faithful to the original as possible. 
Then we save the scanned images in TIFF (Tagged Image File Format) for archival purposes and convert the TIFFs to JPEG (Joint Photographic Experts Group) 2000 or JPEG for Web access. However, the same practice is not suitable for modern text-based documents, such as reports, jour- nal articles, meeting minutes, and theses and dissertations. Many old text-based documents (e.g., aged newspapers and books), should be Yongli ZhouTutorial files for fast Web delivery as access files. For text-based files, access files normally are PDFs that are converted from scanned images. “BCR’s CDP Digital Imaging Best Practices Version 2.0” says that the master image should be the highest quality you can afford, it should not be edited or processed for any specific output, and it should be uncom- pressed.1 This statement applies to archival images, such as photographs, manuscripts, and other image-based materials. If we adopt the same approach for modern text documents, the result may be problematic. PDFs that are created from such master files may have the following drawbacks: ■■ Because of their large file size, they require a long download time or cannot be downloaded because of a timeout error. ■■ They may crash a user’s com- puter because they use more memory while viewing. ■■ They sometimes cannot be printed because of insufficient printer memory. ■■ Poor print and on-screen view- ing qualities can be caused by background noise and bleed- through of text. Background noise can be caused by stains, highlighter marks made by users, and yellowed paper from aged documents. ■■ The OCR process sometimes does not work for high-resolu- tion images. ■■ Content creators need to spend more time scanning images at a high resolution and converting them to PDF documents. Web-friendly files should be small, accessible by most users, full-text searchable, and have good treated as graphic-based material. These documents often have faded text, unusual fonts, stains, and col- ored background. If they are scanned using the same practice as modern text documents, the document cre- ated can be unreadable and contain incorrect information. This topic is covered in the section “Full-Text Searchable PDFs and Troubleshooting OCR Errors.” Currently, PDF is the file format used for most digitized text docu- ments. While PDFs that are created from high-resolution color images may be of excellent quality, they can have many drawbacks. For exam- ple, a multipage PDF may have a large file size, which increases down- load time and the memory required while viewing. Sometimes the down- load takes so long it fails because a time-out error occurs. Printers may have insufficient memory to print large documents. In addition, the Optical Character Recognition (OCR) process is not accurate for high- resolution images in either color or grayscale. As we know, users want the ability to easily download, view, print, and search online textual docu- ments. All of the drawbacks created by high-quality scanning defeat one of the most important purposes of digitizing text-based documents: making them accessible to more users. This paper addresses how Colorado State University Libraries (CSUL) manages these problems and others as staff create Web-friendly digitized textual documents. Topics include scanning, long-time archiving, full-text searchable PDFs and troubleshooting OCR problems, and optimizing PDF files for Web delivery. 
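The download and printing drawbacks listed above largely come down to raw file size per page. As a quick illustration only, and not part of the workflow described in this article, the short Python sketch below reports a PDF's total size and average size per page using the pypdf library; the file name is a placeholder.

import os
from pypdf import PdfReader

# Hypothetical access copy; oversized per-page averages suggest the PDF was
# built straight from uncompressed or color master images.
path = "access_copy.pdf"
size_kb = os.path.getsize(path) / 1024
page_count = len(PdfReader(path).pages)

print(f"{path}: {size_kb:.0f} KB, {page_count} pages, "
      f"{size_kb / page_count:.0f} KB per page on average")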
Preservation Master Files and Access Files For digitization projects, we normally refer to images in uncompressed TIFF format as master files and compressed Yongli Zhou is Digital repositories librarian, Colorado State university libraries, Colorado State university, fort Collins, Colorado 152 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010152 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 factors that determine PDF file size. Color images typically generate the largest PDFs and black-and-white images generate the smallest PDFs. Interestingly, an image of smaller file size does not necessarily generate a smaller PDF. Table 1 shows how file format and color mode affect PDF file size. The source file is a page contain- ing black-and-white text and line art drawings. Its physical dimensions are 8.047" by 10.893". All images were scanned at 300 dpi. CSUL uses Adobe Acrobat Professional to create PDFs from scanned images. The current ver- sion we use is Adobe Acrobat 9 Professional, but most of its features listed in this paper are available for other Acrobat versions. When Acrobat converts TIFF images to a PDF, it compresses images. Therefore a final PDF has a smaller file size than the total size of the original images. Acrobat compresses TIFF uncom- pressed, LZW, and Zip the same amount and produces PDFs of the same file size. Because our in-house scanning software does not support TIFF G4, we did not include TIFF G4 test data here. By comparing simi- lar pages, we concluded that TIFF G4 works the same as TIFF uncom- pressed, LZW, and Zip. For example, if we scan a text-based page as black- and-white and save it separately in TIFF uncompressed, LZW, Zip, or G4, then convert each page into a PDF, the final PDF will have the same file size without a noticeable quality difference. TIFF JPEG generates the smallest PDF, but it is a lossy format, so it is not recommended. Both JPEG and JPEG 2000 have smaller file sizes but generate larger PDFs than those converted from TIFF images. recommendations 1. Use TIFF uncompressed or LZW in 24 bits color for pages with color graphs or for historical doc- uments. 2. Use TIFF uncompressed or LZW compress an image up to 50 per- cent. Some vendors hesitate to use this format because it was proprietary; however, the patent expired on June 20, 2003. This format has been widely adopted by much software and is safe to use. CSUL saves all scanned text documents in this format. ■■ TIFF Zip: This is a lossless compression. Like LZW, ZIP compression is most effective for images that contain large areas of single color. 2 ■■ TIFF JPEG: This is a JPEG file stored inside a TIFF tag. It is a lossy compression, so CSUL does not use this file format. Other image formats: ■■ JPEG: This format is a lossy com- pression and can only be used for nonarchival purposes. A JPEG image can be converted to PDF or embedded in a PDF. However, a PDF created from JPEG images has a much larger file size com- pared to a PDF created from TIFF images. ■■ JPEG 2000: This format’s file extension is .jp2. This format offers superior compression per- formance and other advantages. JPEG 2000 normally is used for archival photographs, not for text-based documents. In short, scanned images should be saved as TIFF files, either with compression or without. We recom- mend saving text-only pages and pages containing text and/or line art as TIFF G4 or TIFF LZW. We also recommend saving pages with photo- graphs and illustrations as TIFF LZW. 
We also recommend saving pages with photographs and illustrations as TIFF uncompressed or TIFF LZW. How Image Format and Color Mode Affect PDF File Size Color mode and file format are two on-screen viewing and print quali- ties. In the following sections, we will discuss how to make scanned docu- ments Web-friendly. Scanning There are three main factors that affect the quality and file size of a digitized document: file format, color mode, and resolution of the source images. These factors should be kept in mind when scanning text documents. File Format and compression Most digitized documents are scanned and saved as TIFF files. However, there are many different formats of TIFF. Which one is appro- priate for your project? ■■ TIFF: Uncompressed format. This is a standard format for scanned images. However, an uncom- pressed TIFF file has the largest file size and requires more space to store. ■■ TIFF G3: TIFF with G3 compres- sion is the universal standard for faxs and multipage line-art documents. It is used for black- and-white documents only. ■■ TIFF G4: TIFF with G4 com- pression has been approved as a lossless archival file format for bitonal images. TIFF images saved in this compression have the smallest file size. It is a stan- dard file format used by many commercial scanning vendors. It should only be used for pages with text or line art. Many scan- ning programs do not provide this file format by default. ■■ TIFF Huffmann: A method for compressing bi-level data based on the CCITT Group 3 1D fac- simile compression schema. ■■ TIFF LZW: This format uses a lossless compression that does not discard details from images. It may be used for bitonal, gray- scale, and color images. It may tHe Next GeNerAtioN liBrArY cAtAloG | ZHou 153Are Your DiGitAl DocuMeNts weB FrieNDlY? | ZHou 153 to be scanned at no less than 600 dpi in color. Our experiments show that documents scanned at 300 or 400 dpi are sufficient for creating PDFs of good quality. Resolutions lower than 300 dpi are not recom- mended because they can degrade image quality and produce more OCR errors. Resolutions higher than 400 dpi also are not recommended because they generate large files with little improved on-screen viewing and print quality. We compared PDF files that were converted from images of resolutions at 300, 400, and 600 dpi. Viewed at 100 percent, the differ- ence in image quality both on screen and in print was negligible. If a page has text with very small font, it can be scanned at a higher resolution to improve OCR accuracy and viewing and print quality. Table 2 shows that high-resolu- tion images produce large files and require more time to be converted into PDFs. The time required to combine images is not significantly different compared to scanning time and OCR time, so it was omitted. Our example is a modern text docu- ment with text and a black-and-white chart. Most of our digitization projects do not require scanning at 600 dpi; 300 dpi is the minimum requirement. We use 400 dpi for most documents and choose a proper color mode for each page. For example, we scan our theses and dissertations in black-and- white at 400 dpi for bitonal pages. We scan pages containing photographs or illustrations in 8-bit grayscale or 24-bit color at 400 dpi. Other Factors that Affect PDF File Size In addition to the three main fac- tors we have discussed, unnecessary edges, bleed-through of text and graphs, background noise, and blank pages also increase PDF file sizes. 
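The format and color-mode guidance above (G4 or LZW for bitonal text pages, LZW or uncompressed for grayscale and color pages) can be compared outside of Acrobat as well. The following Python sketch is offered only as an illustration, using the Pillow imaging library and hypothetical file names: it saves one scan under several of the compressions discussed and prints the resulting file sizes.

import os
from PIL import Image

scan = Image.open("page_scan.tif")  # hypothetical 400 dpi master scan

# Derivatives to compare; convert("1") uses Pillow's default dithering,
# which is acceptable for a rough size comparison.
variants = {
    "text_g4.tif": (scan.convert("1"), "group4"),        # bitonal, CCITT G4: text or line art
    "text_lzw.tif": (scan.convert("1"), "tiff_lzw"),      # bitonal, LZW
    "gray_lzw.tif": (scan.convert("L"), "tiff_lzw"),      # 8-bit grayscale, LZW: photo pages
    "color_lzw.tif": (scan.convert("RGB"), "tiff_lzw"),   # 24-bit color, LZW: covers, illustrations
}

for name, (image, compression) in variants.items():
    image.save(name, compression=compression, dpi=(400, 400))
    print(f"{name}: {os.path.getsize(name) / 1024:.0f} KB")

Comparing the sizes of derivatives like these against the figures reported in this article is a reasonable sanity check before committing to a scanning profile for a collection.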
Figure 1 shows how a clean scan can largely reduce a PDF file size and cover. The updated file has a file size of 42.8 MB. The example can be accessed at http://hdl.handle .net/10217/3667. Sometimes we scan a page containing text and photo- graphs or illustrations twice, in color or grayscale and in black-and-white. When we create a PDF, we com- bine two images of the same page to reproduce the original appearance and to reduce file size. How to opti- mize PDFs using multiple scans will be discussed in a later section. How Image Resolution Affects PDF File Size Before we start scanning, we check with our project manager regarding project standards. For some funded projects, documents are required in grayscale 8 bits for pages with black-and-white photographs or grayscale illustrations. 3. Use TIFF uncompressed, LZW, or G4 in black-and-white for pages containing text or line art. To achieve the best result, each page should be scanned accordingly. For example, we had a document with a color cover, 790 pages containing text and line art, and 7 blank pages. We scanned the original document in color at 300 dpi. The PDF created from these images was 384 MB, so large that it exceeded the maximum file size that our repository software allows for uploading. To optimize the document, we deleted all blank pages, converted the 790 pages with text and line art from color to black- and-white, and retained the color Table 1. File format and color mode versus PDF file size File Format Scan Specifications TIFF Size (KB) PDF Size (KB) TIFF Color 24 bits 23,141 900 TIFF LZW Color 24 bits 5,773 900 TIFF ZIP Color 24 bits 4,892 900 TIFF JPEG Color 24 bits 4,854 873 JPEG 2000 Color 24 bits 5,361 5,366 JPEG Color 24 bits 4,849 5,066 TIFF Grayscale 8 bits 7,729 825 TIFF LZW Grayscale 8 bits 2,250 825 TIFF ZIP Grayscale 8 bits 1,832 825 TIFF JPEG Grayscale 8 bits 2,902 804 JPEG 2000 Grayscale 8 bits 2,266 2,270 JPEG Grayscale 8 bits 2,886 3,158 TIFF Black-and-white 994 116 TIFF LZW Black-and-white 242 116 TIFF ZIP Black-and-white 196 116 note: Black-and-white scans cannot be saved in JPEg, JPEg 2000, or TIff JPEg formats. 154 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010154 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 Many PDF files cannot be saved as PDF/A files. If an error occurs when saving a PDF to PDF/A, you may use Adobe Acrobat Preflight (Advanced > Preflight) to identify problems. See figure 2. Errors can be created by non- embedded fonts, embedded images with unsupported file compression, bookmarks, embedded video and audio, etc. By default, the Reduce File Size procedure in Acrobat Professional compresses color images using JPEG 2000 compression. After running the Reduce File Size pro- cedure, a PDF may not be saved as a PDF/A because of a “JPEG 2000 compression used” error. According to the PDF/A Competence Center, this problem will be eliminated in the second part of the PDF/A standard— PDF/A-2 is planned for 2008/2009. There are many other features in new PDFs; for example, transparency and layers will be allowed in PDF/A- 2.5 However, at the time this paper was written PDF/A-2 had not been announced.6 portable, which means the file cre- ated on one computer can be viewed with an Acrobat viewer on other computers, handheld devices, and on other platforms.3 A PDF/A document is basically a traditional PDF document that fulfills precisely defined specifications. 
The PDF/A standard aims to enable the creation of PDF documents whose visual appearance will remain the same over the course of time. These files should be software-independent and unrestricted by the systems used to create, store, and reproduce them.4 The goal of PDF/A is for long-term archiving. A PDF/A document has the same file extension as a regular PDF file and must be at least compat- ible with Acrobat Reader 4. There are many ways to cre- ate a PDF/A document. You can convert existing images and PDF files to PDF/A files, export a doc- ument to PDF/A format, scan to PDF/A, to name a few. There are many software programs you can use to create PDF/A, such as Adobe Acrobat Professional 8 and later ver- sions, Compart AG, PDFlib, and PDF Tools AG. simultaneously improve its viewing and print quality. Recommendations 1. Unnecessary edges: Crop out. 2. Bleed-through text or graphs: Place a piece of white or black card stock on the back of a page. If a page is single sided, use white card stock. If a page is double sided, use black card stock and increase contrast ratio when scanning. Often color or grayscale images have bleed- through problems. Scanning a page containing text or line art as black-and-white will eliminate bleed-through text and graphs. 3. Background noise: Scanning a page containing text or line art as black-and-white can elimi- nate background noise. Many aged documents have yellowed papers. If we scan them as color or grayscale, the result will be images with yellow or gray back- ground, which may increase PDF file sizes greatly. We also recom- mend increasing the contrast for better OCR results when scanning documents with background colors. 4. Blank pages: Do not include if they are not required. Blank pages scanned in grayscale or color can quickly increase file size. PDF and Long- Term Archiving PDF/A PDF vs. PDF/A PDF, short for Portable Document Format, was developed by Adobe as a unique format to be viewed through Adobe Acrobat view- ers. As the name implies, it is Table 2. Color Mode and Image Resolution vs. PDF File Size Color mode Resolution (DPI) Scanning time (sec.) OCR time (sec.) TIFF LZW (KB) PDF size (KB) color 600 100 N/A* 16,498 2,391 color 400 25 35 7,603 1,491 color 300 18 16 5,763 952 grayscale 600 36 33 6,097 2,220 grayscale 400 18 18 2,888 1370 grayscale 300 14 12 2,240 875 B/W 600 12 18 559 325 B/W 400 10 10 333 235 B/W 300 8 9 232 140 *n/a due to an oCr error tHe Next GeNerAtioN liBrArY cAtAloG | ZHou 155Are Your DiGitAl DocuMeNts weB FrieNDlY? | ZHou 155 able. This option keeps the origi- nal image and places an invisible text layer over it. Recommended for cases requiring maximum fidelity to the original image.8 This is the only option used by CSUL. 2. Searchable Image: Ensures that text is searchable and selectable. This option keeps the original image, de-skews it as needed, and places an invisible text layer over it. The selection for downs- ample images in this same dia- log box determines whether the image is downsampled and to what extent.9 The downsam- pling combines several pixels in an image to make a single larger pixel; thus some informa- tion is deleted from the image. However, downsampling does not affect the quality of text or line art. When a proper setting is used, the size of a PDF can be significantly reduced with little or no loss of detail and precision. 3. 
ClearScan: Synthesizes a new Type 3 font that closely approxi- mates the original, and preserves the page background using a low-resolution copy.10 The final PDF is the same as a born-dig- ital PDF. Because Acrobat can- not guarantee the accuracy of manipulate the PDF document for accessibility. Once OCR is properly applied to the scanned files, how- ever, the image becomes searchable text with selectable graphics, and one may apply other accessibility features to the document.7 Acrobat Professional provides three OCR options: 1. Searchable Image (Exact): Ensures that text is searchable and select- Full-Text Searchable PDFs and Trouble- shooting OCR Errors A PDF created from a scanned piece of paper is inherently inaccessible because the content of the docu- ment is an image, not searchable text. Assistive technology cannot read or extract the words, users cannot select or edit the text, and one cannot Figure 1. PDFs Converted from different images: (a) the original PDF converted from a grayscale image and with unnecessary edges; (b) updated PDF converted from a black- and-white image and with edges cropped out; (c) screen viewed at 100 percent of the PDF in grayscale; and (d) screen viewed at 100 percent of the PDF in black-and-white. Dimensions: 9.127” X 11.455” Color Mode: grayscale Resolution: 600 dpi TIFF LZW: 12.7 MB PDF: 1,051 KB Dimensions: 8” X 10.4” Color Mode: black-and-white Resolution: 400 dpi TIFF LZW: 153 KB PDF: 61 KB Figure 2. Example of Adobe Acrobat 9 Preflight 156 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010156 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 but at least users can read all text, while the black-and-white scan con- tains unreadable words. Troubleshoot OCR Error 3: Cannot OCR Image Based Text The search of a digitized PDF is actually performed on its invis- ible text layer. The automated OCR process inevitably produces some incorrectly recognized words. For example, Acrobat cannot recognize the Colorado State University Logo correctly (see figure 6). Unfortunately, Acrobat does not provide a function to edit a PDF file’s invisible text layer. To manu- ally edit or add OCR’d text, Adobe Acrobat Capture 3.0 (see figure 7) must be purchased. However, our tests show that Capture 3.0 has many drawbacks. This software is compli- cated and produces it’s own errors. Sometimes it consolidates words; other times it breaks them up. In addition, it is time-consuming to add or modify invisible text layers using Acrobat Capture 3.0. At CSUL, we manually add searchable text for title and abstract pages only if they cannot be OCR’d by Acrobat correctly. The example in Troubleshoot OCR Error 2: Could Not Perform Recognition (OCR) Sometimes Acrobat gener- ates an “Outside of the Allowed Specifications” error when process- ing OCR. This error is normally caused by color images scanned at 600 dpi or more. In the example in figure 4, the page only contains text but was scanned in color at 600 dpi. When we scanned this page as black- and-white at 400 dpi, we did not encounter this problem. We could also use a lower-resolution color scan to avoid this error. Our experiments also show that images scanned in black-and-white work best for the OCR process. In this article we mainly discuss running the OCR process on modern textual documents. Black-and-white scans do not work well for historical textual documents or aged newspa- pers. These documents may have faded text and background noise. 
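Acrobat is the tool used throughout this article for adding the invisible text layer. As a point of comparison only, the sketch below shows the same kind of image-plus-hidden-text page produced with the open-source Tesseract engine through the pytesseract wrapper; it assumes the tesseract binary is installed and uses placeholder file names.

from PIL import Image
import pytesseract

scan = Image.open("title_page.tif")  # hypothetical page scan, bitonal or grayscale

# Tesseract's PDF output keeps the page image and overlays invisible, searchable
# text, broadly comparable to Acrobat's "Searchable Image (Exact)" option.
pdf_bytes = pytesseract.image_to_pdf_or_hocr(scan, extension="pdf")

with open("title_page_searchable.pdf", "wb") as f:
    f.write(pdf_bytes)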
When they are scanned as black- and-white, broken letters may occur, and some text might become unread- able. For this reason they should be scanned in color or grayscale. In fig- ure 5, images scanned in color might not produce accurate OCR results, OCRed text at 100 percent, this option is not acceptable for us. For a tutorial on to how to make a full-text searchable PDF, please see appendix A. Troubleshoot OCR Error 1: Acrobat Crashes Occasionally Acrobat crashes during the OCR process. The error message does not indicate what causes the crash and where the problem occurs. Fortunately, the page number of the error can be found on the top short- cuts menu. In figure 3, we can see the error occurs on page 7. We discovered that errors are often caused by figures or diagrams. For a problem like this, the solution is to skip the error-causing page when running the OCR process. Our initial research was performed on Acrobat 8 Professional. Our recent study shows that this problem has been significantly improved in Acrobat 9 Professional. Figure 3. Adobe Acrobat 8 Professional crash window Figure 4. “Could not perform recognition (OCR)” error Figure 5. An aged newspaper scanned in color and black-and-white Aged Newspaper Scanned in Color Aged Newspaper Scanned in Black-and-White tHe Next GeNerAtioN liBrArY cAtAloG | ZHou 157Are Your DiGitAl DocuMeNts weB FrieNDlY? | ZHou 157 a very light yellow background. The undesirable marks and background contribute to its large file size and create ink waste when printed. Method 2: Running Acrobat’s Built-In Optimization Processes Acrobat provides three built-in pro- cesses to reduce file size. By default, Acrobat use JPEG compression for color and grayscale images and CCITT Group 4 compression for bitonal images. optimize scanned pDF Open a scanned PDF and select Documents > Optimize Scanned PDF. A number of settings, such as image quality and background removal, can be specified in the Optimize Scanned PDF dialog box. Our experiments show this process can noticably degrade images and sometimes even increase file size. Therefore we do not use this option. reduce File size Open a scanned PDF and select Documents > Reduce File Size. The Reduce File Size command resa- mples and recompresses images, removes embedded Base-14 fonts, and subset-embeds fonts that were left embedded. It also compresses document structure and cleans up elements such as invalid bookmarks. If the file size is already as small as possible, this command has no effect.11 After process, some files cannot be saved as PDF/A, as we discussed in a previous section. We also noticed that different versions of Acrobat can create files of different file sizes even if the same settings were used. pDF optimizer Open a scanned PDF and select Advanced > PDF Optimizer. Many settings can be specified in the PDF Optimizer dialog box. For example, we can downsample images from sections, we can greatly reduce a PDF’s size by using an appro- priate color mode and resolution. Figure 9 shows two different ver- sions of a digitized document. The source document has a color cover and 111 bitonal pages. The origi- nal PDF, shown in figure 9 on the left, was created by another univer- sity department. It was not scanned according to standards and pro- cedures adopted by CSUL. It was scanned in color at 300 dpi and has a file size of 66,265 KB. 
We exported the original PDF as TIFF images, batch-converted color TIFF images to black-and-white TIFF images, and then created a new PDF using black- and-white TIFF images. The updated PDF has a file size of 8,842 KB. The image on the right is much cleaner and has better print quality. The file on the left has unwanted marks and figure 8 is a book title page for which we used Acrobat Capture 3.0 to man- ually add searchable text. The entire book may be accessed at http://hdl .handle.net/10217/1553. Optimizing PDFs for Web Delivery A digitized PDF file with 400 color pages may be as large as 200 to 400 MB. Most of the time, optimizing processes may reduce files this large without a noticeable difference in quality. In some cases, quality may be improved. We will discuss three optimization methods we use. Method 1: Using an Appropriate Color Mode and Resolution As we have discussed in previous ~dO UniversitY Original Logo Text OCRed by Acrobat Figure 6. Incorrectly recognized text sample Figure 7. Adobe Acrobat capture interface Figure 8. Image-based text sample 158 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010158 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 grayscale. A PDF may contain pages that were scanned with different color modes and resolutions. A PDF may also have pages of mixed reso- lutions. One page may contain both bitonal images and color or grayscale images, but they must be of the same resolution. The following strategies were adopted by CSUL: 1. Combine bitmap, grayscale, and color images. We use gray- scale images for pages that con- tain grayscale graphs, such as black-and-white photos, color images for pages that contain color images, and bitmap images for text-only or text and line art pages. 2. If a page contains high-definition color or grayscale images, scan that page in a higher resolution and scan other pages at 400 dpi. 3. If a page contains a very small font and the OCR process does not work well, scan it at a higher resolution and the rest of docu- ment at 400 dpi. 4. If a page has both text, color, or grayscale graphs, we scan it twice. Then we modify images using Adobe Photoshop and combine two images in Acrobat. In figure 10, the grayscale image has a gray background and a true reproduction of the original photo- graph. The black-and-white scan has a white background and clean text, but details of the photograph are lost. The PDF converted from the grayscale image is 491 KB and has nine OCR errors. The PDF converted from the black-and-white image is 61KB and has no OCR errors. The PDF converted from a combination of the grayscale and black-and-white images is 283 KB and has no OCR errors. The following are the steps used to create a PDF in figure 10 using Acrobat: 1. Scan a page twice—grayscale Optimizer can be found at http:// www.acrobatusers.com/tutorials/ understanding-acrobats-optimizer. Method 3: Combining Different Scans Many documents have color covers and color or grayscale illustrations, but the majority of pages are text- only. It is not necessary to scan all pages of such documents in color or a higher resolution to a lower reso- lution and choose a different file compression. Different collections have different original sources, therefore different settings should be applied. We normally do sev- eral tests for each collection and choose the one that works best for it. We also make our PDFs compat- ible with Acrobat 6 to allow users with older versions of software to view our documents. 
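The kind of rebuild described in this section, exporting the page images, converting text-only pages to black-and-white, and writing a new, smaller PDF, can also be scripted outside Acrobat. The sketch below is only an illustration under assumed file names, using the Pillow library rather than the Acrobat workflow the article documents; the cover file name and the threshold value are arbitrary assumptions.

from pathlib import Path
from PIL import Image

pages = []
for tif in sorted(Path("exported_pages").glob("*.tif")):   # hypothetical per-page TIFF exports
    image = Image.open(tif)
    if tif.name == "0000_cover.tif":                        # keep the cover in 24-bit color
        pages.append(image.convert("RGB"))
    else:
        # Text and line-art pages: a fixed threshold to 1-bit avoids the speckle
        # that Pillow's default dithering would add; 160 is an arbitrary cutoff.
        gray = image.convert("L")
        pages.append(gray.point(lambda v: 255 if v > 160 else 0, mode="1"))

# Pillow writes every page into a single PDF; blank pages would be dropped
# from the list before this step.
pages[0].save("rebuilt.pdf", save_all=True, append_images=pages[1:], resolution=400.0)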
A detailed tutorial of how to use the PDF Figure 9. Reduce file size example Figure 10. Reduce file size example: combine images tHe Next GeNerAtioN liBrArY cAtAloG | ZHou 159Are Your DiGitAl DocuMeNts weB FrieNDlY? | ZHou 159 help.html?content=WSfd1234e1c4b69f30 ea53e41001031ab64-7757.html (accessed Mar. 3, 2010). 3. Ted Padova Adobe Acrobat 7 PDF Bible, 1st ed. (Indianapolis: Wiley, 2005). 4. Olaf Drümmer, Alexandra Oettler, and Dietrich von Seggern, PDF/A in a Nutshell—Long Term Archiving with PDF, (Berlin: Association for Digital Document Standards, 2007). 5. PDF/A Competence Center, “PDF/A: An ISO Standard—Future Development of PDF/A,” http://www. pdfa.org/doku.php?id=pdfa:en (accessed July 20, 2010). 6. PDF/A Competence Center, “PDF/A—A new Standard for Long- Term Archiving,” http://www.pdfa.org/ doku.php?id=pdfa:en:pdfa_whitepaper (accessed July 20, 2010). 7. Adobe, “Creating Accessible PDF Documents with Adobe Acrobat 7.0: A Guide for Publishing PDF Documents for Use by People with Disabilities,” 2005, http://www.adobe.com/enterprise/ a c c e s s i b i l i t y / p d f s / a c ro 7 _ p g _ u e . p d f (accessed Mar. 8, 2010). 8. Adobe, “Recognize Text in Scanned Documents,” 2010, http:// help.adobe.com/en_US/Acrobat/9.0/ S t a n d a rd / W S 2 A 3 D D 1 FA - C FA 5 - 4 c f 6 -B993-159299574AB8.w.html (accessed Mar. 8, 2010). 9. Ibid. 10. Ibid. 11. Adobe, “Reduce File Size by Saving,” 2010, http://help.adobe.com/en_US/ Acrobat/9.0/Standard/WS65C0A053 -BC7C-49a2-88F1-B1BCD2524B68.w.html (accessed Mar. 3, 2010). the other 76 pages as grayscale and black-and-white. Then we used the procedure described above to com- bine text pages and photographs. The final PDF has clear text and cor- rectly reproduced photographs. The example can be found at http://hdl .handle.net/10217/1553. Conclusion Our case study, as reported in this article, demonstrates the importance of investing the time and effort to apply the appropriate standards and techniques for scanning and optimiz- ing digitized documents. If proper techniques are used, the final result will be Web-friendly resources that are easy to download, view, search, and print. Users will be left with a posi- tive impression of the library and feel encouraged to use its materials and services again in the future. References 1. BCR’s CDP Digital Imaging Best Practices Working Group, “BCR’s CDP Digital Imaging Best Practices Version 2.0,” June 2008, http://www.bcr.org/ dps/cdp/best/digital-imaging-bp.pdf (accessed Mar. 3, 2010). 2. Adobe, “About File Formats and Compression,” 2010, http://livedocs .adobe.com/en_US/Photoshop/10.0/ and black-and-white. 2. Crop out text on the grayscale scan using Photoshop. 3. Delete the illustration on the black-and-white image using Photoshop. 4. Create a PDF using the black- and-white image. 5. Run the OCR process and save the file. 6. Insert the color graph. Select Tools > Advanced Editing > TouchUp Object Tool. Right- click on the page and select Place Image. Locate the color graph in the Open dialog, then click Open and move the color graph to its correct location. 7. Save the file and run the Reduce File Size or PDF Optimizer pro- cedure. 8. Save the file again. This method produces the small- est file size with the best quality, but it is very time-consuming. At CSUL we used this method for some important documents, such as one of our institutional repository’s show- case items, Agricultural Frontier to Electronic Frontier. 
The book has 220 pages, including a color cover, 76 pages with text and photographs, and 143 text-only pages. We used a color image for the cover page and 143 black-and-white images for the 143 text-only pages. We scanned Appendix A. Step-by-Step Creating a Full-Text Searchable PDF In this tutorial, we will show you how to create a full-text searchable PDF using Adobe Acrobat 9 Professional. Creating a PDF from a Scanner Adobe Acrobat Professional can create a PDF directly from a scanner. Acrobat 9 provides five options: Black and White Document, Grayscale Document, Color Document, Color Image, and Custom Scan. The custom scan option allows you to scan, run the OCR procedure, add metadata, combine multiple pages into one PDF, and also make it PDF/A compliant. To create a PDF from a scanner, go to File > Create PDF > From Scanner > Custom Scan. See figure 1. At CSUL, we do not directly create PDFs from scanners because our tests show that it can produce fuzzy text and it is not time efficient. Both scanning and running the OCR process can be very time consuming. If an error occurs during these processes, we would have to start over again. We normally scan images on scanning stations by student employees 160 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010160 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2010 or outsource them to vendors. Then library staff will perform quality control and create PDFs on seperate machines. In this way, we can work on multiple documents at the same time and ensure that we provide high-quality PDFs. Creating a PDF from Scanned Images 1. From the task bar select Combine > Merge Files into a single PDF > From Multiple Files. See figure 2. 2. In the Combine Files dialog, make sure the Single PDF radio button is selected. From the Add Files dropdown menu select Add Files. See figure 3. 3. In the Add Files dialog, locate images and select multiple images by holding shift key, and then click Add Files button. 4. By default, Acrobat sorts files by file names. Use Move Up and Move Down buttons to change image orders and use the Remove button to delete images. Choose a target file size. The smallest icon will produce a file with a smaller file size but a lower image quality PDF, and the largest icon will produce a high image quality PDF but with a very large file size. We normally use the default file size setting, which is the middle icon. 5. Save the file. At this point, the PDF is not full-text searchable. Making a Full-Text Searchable PDF A PDF document created from a scanned piece of paper is inherently inaccessible because the content of the document is an image, not searchable text. Assistive technology cannot read or extract the words, users cannot select or edit the text, and one cannot manipulate the PDF document for accessibility. Once optical character recognition (OCR) is properly applied to the scanned files, however, the image becomes searchable text with selectable graphics, and one may apply other acces- sibility features to the document. Adobe Acrobat Professional provides three OCR options, Searchable Image (Exact), Searchable Image, and Clean Scan. Because Searchable Image (Exact) is the only option that keeps the original look, we only use this option. To run an OCR procedure using Acrobat 9 Professional: 1. Open a digitized PDF. 2. Select Document > OCR text recognition > Recognize text using OCR. 3. In the Recognize Text dialog, specify pages to be OCRed. 4. 
In the Recognize Text dialog, click the Edit button in the Settings sec- tion to choose OCR language and PDF Output Style. We recommend the Searchable Image (Exact) option. Click OK. The setting will be remembered by the program and will be used until a new setting is chosen. Sometimes a PDF’s file size increases greatly after an OCR process. If this happens, use the PDF optimizer to reduce its file size. Figure 2. Merge files into a single PDF Figure 3. Combine Files dialog Figure 1. Acrobat 9 Professional’s Create PDF from Scanner Dialog 3142 ---- editoriAl | truitt 55 A recent Library Journal (LJ) story referred to “the pal- pable hunger public librarians have for change . . . and, perhaps, a silver bullet to ensure their future” in the context of a presentation at the Public Library Association’s 2010 Annual Conference by staff members of the Rangeview (Colo.) Library District. Now, lest there be any doubt on this point, allow me to state clearly from the outset that none of the following ramblings are in any way intended as a specific critique of the measures under- taken by Rangeview. Far be it from me to second-guess the Rangeview staff’s judgment as to how best to serve the community there.1 Rather, what got my attention was LJ’s reference to a “palpable hunger”for magic ammunition, from whose presumed existence we in libraries seem to draw com- fort. In the last quarter century, it seems as though we’ve heard about and tried enough silver bullets to keep our collective six-shooters endlessly blazing away. Here are just a few examples that I can recall off the top of my head, and in no particular order: ■■ Library cafes and coffee shops. ■■ Libraries arranged along the lines of chain book- stores. ■■ General-use computers in libraries (including infor- mation/knowledge commons and what-have-you) ■■ Computer gaming in libraries. ■■ Lending laptops, digital cameras, mp3 players and iPods, e-book readers, and now iPads. ■■ Mobile technology (e.g., sites and services aimed at and optimized for iPhones, Blackberries, etc.) ■■ E-books and e-serials. ■■ Chat and instant-message reference. ■■ Libraries and social networking (e.g., Facebook, Twitter, Second Life, etc.). ■■ “Breaking down silos,” and “freeing”/exposing our bibliographic data to the Web, and reuse by others outside of the library milieu. ■■ Ditching our old and “outmoded” systems, whether the object of our scorn is AACR2, LCSH, LCC, Dewey, MARC, the ILS, etc. ■■ Library websites generally. Remember how every- one—including us—simply had to have a website in the 1990s? And ever since then, it’s been an endless treadmill race to find the perfect, user-centric library Web presence? If Sisyphus were to be incarnated today, I have little doubt that he would appear as a library Web manager and his boulder would be a library website. ■■ Oh, and as long as we’re at it, “user-centricity” gen- erally. The implication, of course, is that before the term came into vogue, libraries and librarians were not focused on users. ■■ “Next-gen” catalogs. I’m sure I’m forgetting a whole lot more. Anyway, you get the picture. Each of these has, at one time or another, been posi- tioned by some advocate as the necessary change—the “silver bullet”—that would save libraries from “irrel- evance” (or worse!), if we would but adopt it now, or better yet, yesterday. 
Well, to judge from the generally dismal state of libraries as depicted by some opinion-makers in our profession—or perhaps simply from our collective lack of self-esteem—we either have been misled about the potency of our ammunition, or else we've been very poor markspersons. Notwithstanding the fact that we seem to have been indiscriminately blasting away with shotguns rather than six-shooters, our shooting has neither reversed the trends of shrinking budgets and declining morale nor staunched the ceaseless dire warnings of some about "irrelevance" resulting from ebbing library use. To stretch the analogy a bit further still, one might even argue that all this shooting has done damage of its own, peppering our most valuable services with countless pellet-sized holes.

At the same time, we have in recent years shown ourselves to be remarkably susceptible to the marketing-focused hyperbole of those in and out of librarianship about technological change. Each new technology is labeled a "game-changer"; change in general is either—to use the now slightly-dated, oh-so-nineties term—a "paradigm shift" or, more recently, "transformational." When did we surrender our skepticism and awareness of a longer view? What's wrong with this picture?2

I'd like to suggest another way of viewing this. A couple of years ago, Alan Weisman published The World Without Us, a book that should be required reading for all who are interested in sustainability, our own hubris, and humankind's place in the world. The book begins with our total, overnight disappearance, and asks (1) What would the earth be like without us? and (2) What evidence of our works would remain, and for how long? The bottom line answers for Weisman are (1) In the long run, probably much better off, and (2) Not much and not for very long, really.

So, applying Weisman's first question to our own, much more modest domain, what might the world be like if tomorrow librarians all disappeared or went on to work doing something else—became consultants, perhaps?—and our physical and virtual collections were padlocked? Would everything be okay, because as some believe, it's all out there on the Web anyway, and Google will make it findable? Absent a few starry-eyed bibliophiles and newly out-of-work librarians—those who didn't make the grade as consultants—would anyone mourn our disappearance? Would anyone notice? If a tree falls in the woods . . . In short, would it matter? And if so, why and how much?

The answer to the preceding two questions, I think, can help to point the way to an approach for understanding and evaluating services and change in libraries that is both more realistic and less draining than our obsessive quest for the "silver bullet." What exactly is our "value-add"? What do we provide that is unique and valuable? We can't hope to compete with Barnes and Noble, Starbucks, or the Googleplex; seeking to do so simply diverts resources and energy from providing services and resources that are uniquely ours.

Instead, new and changed services and approaches should be evaluated in terms of our value-add: If they contribute positively and are within our abilities to do them, great. If they do not contribute positively, then trying to do them is wasteful, a distraction, and ultimately disillusioning to those who place their hopes in such panaceas. Some of the "bullets" I listed above may well qualify as contributing to our value-add, and that's fine. My point isn't to judge whether they are "bad" or "good." My argument is about process and how we decide what we should do and not do. Understanding what we contribute that is uniquely ours should be the reference standard by which proposed changes are evaluated, not some pie-in-the-sky expectation that pursuit of this or that vogue will magically solve our funding woes, contribute to higher (real or virtual) gate counts, make us more "relevant" to a particular user group, or even raise our flagging self-esteem. In other words, our value-add must stand on its own, regardless of whether it actually solves temporal problems. It is the "why" in "why are we here?"

If, at the end of the day, we cannot articulate that which makes us uniquely valuable—or if society as a whole finds that contribution not worth the cost—then I think we need to be prepared to turn off the lights, lock the doors, and go elsewhere, because I hope that what we're doing is about more than just our own job security. And if the far-fetched should actually happen, and we all disappear? I predict that at some future point, someone will reinvent libraries and librarians, just as others have reinvented cataloguing in the guise of metadata.

Marc Truitt Editorial: No More Silver Bullets, Please
Marc Truitt (marc.truitt@ualberta.ca) is Associate University Librarian, Bibliographic and Information Technology Services, University of Alberta Libraries, Edmonton, Alberta, Canada, and Editor of ITAL.
56 INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2010

Notes and references

1. Norman Oder, "PLA 2010 Conference: The Anythink Revolution is Ripe," Library Journal, Mar. 26, 2010, http://www.libraryjournal.com/article/CA6724258.html (accessed Mar. 30, 2010). There, I said it! A fairly innocuous disclaimer added to one of my columns last year seemed to garner more attention (http://freerangelibrarian.com/2009/06/13/marc-truitts-surprising-ital-editorial/) than did the content of the column itself. Will the present disclaimer be the subject of similar speculation?

2. One of my favorite antidotes to such bloated, short-term language is embodied in Michael Gorman's "Human Values in a Technological Age," ITAL 20, no. 1 (Mar. 2000): 4–11, http://www.ala.org/ala/mgrps/divs/lita/ital/2001gorman.cfm (accessed Apr. 12, 2010)—highly recommended. The following is but one of many calming and eminently sensible observations Gorman makes:

The key to understanding the past is the knowledge that people then did not live in the past—they lived in the present, just a different present from ours. The present we are living in will be the past sooner than we wish. What we perceive as its uniqueness will come to be seen as just a part of the past as viewed from the point of a future present that will, in turn, see itself as unique. People in history did not wear quaintly old-fashioned clothes—they wore modern clothes. They did not see themselves as comparing unfavorably with the people of the future, they compared themselves and their lives favorably with the people of their past. In the context of our area of interest, it is particularly interesting to note that people in history did not see themselves as technologically primitive. On the contrary, they saw themselves as they were—at the leading edge of technology in a time of unprecedented change.

3143 ---- Editorial Board Thoughts: ITAL 2.0 | Boze 57 litablog.org/) I see that there are occasional posts, but there are rarely comments and little in the way of real discussion.
It seems to be oriented toward announcements, so perhaps it’s not a good comparison with ITALica. Some ALA groups are using WordPress for their blogs, a few with user comments, but mostly without much apparent traffic (for example, the LL&M Online blog, http://www .lama.ala.orgLLandM). In general, blogs don’t seem to be a satisfactory platform for discussion. Wikis aren’t par- ticularly useful in this regard, either, so I think that rules out the LITA Wiki (http://wikis.ala.org/lita/index.php/ Main_Page). I’ve looked at ALA Connect (http://connect. ala.org/), which has a variety of Web 2.0 features, so it might be a good home for ITALica. We could also use a mailing list, either one that already exists, such as LITA-L, or a new one. The one advantage e-mail has is that it is delivered to the reader, so one doesn’t have to remember to visit a website. We already have RSS feeds for the ITALica blog, so maybe that works well enough as a notification for those who subscribe to them. I’ve also wondered whether a discussion forum (aka message board) would be useful. I frequent a few soft- ware-related forums, and I find them conducive to discussion. They have a degree of flexibility lacking in other platforms. It’s easy for any participant to start up a new topic rather than limiting discussion only to topics posted by the owner, as is usually the case with blogs. Frankly I’d like to encourage discussion on topics beyond only the articles published in ITAL. For example, we used to have columns devoted to book and software reviews. Even though they were discontinued, those could still be interesting topics for discussion between ITAL readers. In writing this, my hope is to get feedback from you, the reader, about what ITAL and ITALica could be doing for you. How can we use ALA Connect in ways that would be useful? Could we use other platforms to do things beyond simply discussing articles that appear in the print edition? What social Web technologies do you use, and how could we apply them to ITAL? After you read this, I hope you’ll join us at ITALica for a discussion. Let us know what you think. Editor’s note: Andy serves on the ITAL Editorial Board and as the ITAL website manager. He earns our gratitude every quarter with his timely and professional work to post the new issue online. T he title of this recurring column is “Editorial Board Thoughts,” so as I sit here in the middle of February, what am I thinking about? As I trudge to work each day through the snow and ice, I think about what a nuisance it is to have a broken foot (I broke the fifth metatarsal of my left foot at the Midwinter Meeting in Boston—not recommended) but most recently I’ve been thinking about ITAL. The March issue is due to be mailed in a couple of weeks, and I got the digITAL files a week or so ago. In a few days I’ll have to start separating the PDF into individ- ual articles, and then I’ll start up my Web editor to turn the RTF files for each article into nicely formatted HTML. All of this gets fed into ALA’s content management sys- tem, where you can view it online by pointing your Web browser to http://www.lita.org/ala/mgrps/divs/lita/ ITAL/ITALinformation.cfm. In case you didn’t realize it, the full text of each issue of ITAL is there, going back to early 2004. Selected full-text articles are available from earlier issues going back to 2001. The site is in need of a face lift, but we expect to work on that in the near future. 
Starting with the September 2008 issue of ITAL we launched ITALica, the ITAL blog at http://ITAL-ica .blogspot.com/, as a pilot. ITALica was conceived as a forum for readers, authors, and editors of ITAL to discuss each issue. For a year and a half we’ve been open for reader feedback, and our authors have been posting to the blog and responding to reader comments. What’s your opinion of ITALica? Is it useful? What could we be doing to enhance its usefulness? In reality we haven’t had a great deal of communica- tion via the blog. We are looking at moving ITALica from Blogger to a platform more integrated with existing ALA or LITA services. Is a blog format the best way to encour- age discussion? When I look at the LITA Blog (http:// Andy Boze (Boze.1@nd.edu) is Head, desktop Computing and network Services, University of notre dame Hesburgh Libraries, notre dame, indiana. Andy BozeEditorial Board Thoughts: ITAL 2.0 3141 ---- 54 iNFormAtioN tecHNoloGY ANd liBrAries | JuNe 2010 tinuing education opportunities for library informa- tion technologists and all library staff who have an interest in technology. 2. Innovation: To serve the library community, LITA expert members will identify and demonstrate the value of new and existing technologies within ALA and beyond. 3. Advocacy and policy: LITA will advocate for and participate in the adoption of legislation, policies, technologies, and standards that promote equitable access to information and technology. 4. The organization: LITA will have a solid structure to support its members in accomplishing its mission, vision, and strategic plan. 5. Collaboration and outreach: LITA will reach out and col- laborate with other library organizations to increase the awareness of the importance of technology in libraries, improve services to existing members, and reach out to new members. The LITA Executive Committee is currently finalizing the strategies LITA will pursue to achieve success in each of the goal areas. It is my hope that the strategies for each goal are approved by the LITA board of directors before the 2010 ALA Annual Conference in Washington, D.C. That way the finalized version of the LITA Strategic Plan can be introduced to the Committee and Interest Group Chairs and the membership as a whole at that conference. This will allow us to start the next fiscal year with a clear road for the future. While I am excited about what is next, I have also been dreading the end of my presidency. I have truly enjoyed my experience as LITA president, and in some way wish it was not about to end. I have learned so much and have met so many wonderful people. Thank you for giving me this opportunity to serve you and for your support. I have truly appreciated it. A s I write this last column, the song “My Way” by Frank Sinatra keeps going through my head. While this is definitely not my final curtain, it is the final curtain of my presidency. Like Sinatra I have a few regrets, “but then again, too few to mention.” There was so much more I wanted to accomplish this year; however, as usual, my plans were more ambitious than the time I had available. Being LITA’s president was a big part of my life, but it was not the only part. Those other parts—like family, friends, work, and school—demanded my atten- tion as well. I have thought about what to say in this final column. Do I list my accomplishments of the last year? Nah, you can read all about that in the LITA Annual Report, which I will post in June. Tackle some controversial topic? 
While I can think of a few, I have not yet thought of any solutions, and I do not want to rant against something without proposing some type of solution or plan of attack. I thought instead I would talk about where I have devoted a large part of my LITA time over the last year. As I look back at the last year, I am also thinking ahead to the future of LITA. We are currently writing LITA's Strategic Plan. We have a lot of great ideas to work with. LITA members are always willing to share their thoughts both formally and informally. I have been charged with the task of taking all of those great ideas, gathered at conferences, board meetings, hallway conversations, surveys, e-mail, etc., to create a roadmap for the future. After reviewing all of the ideas gathered over the last three years, I was able to narrow that list down to six major goal areas. With the assistance of the LITA board of directors and the LITA Executive Committee, we whittled the list down to five major goal areas of the LITA Strategic Plan: 1. Training and continuing education: LITA will be nationally recognized as the leading source for con-

Michelle Frisque (mfrisque@northwestern.edu) is LITA President 2009–10 and Head, Information Systems, Northwestern University, Chicago.
Michelle Frisque President's Message: The End and New Beginnings

3144 ---- 58 INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2010

know its power, and facets can showcase metadata in new interfaces. According to McGuinness, facets perform several functions in an interface:

■■ vocabulary control
■■ site navigation and support
■■ overview provision and expectation setting
■■ browsing support
■■ searching support
■■ disambiguation support5

These functions offer several potential advantages to the user: The functions use category systems that are coherent and complete, they are predictable, they show previews of where to go next, they show how to return to previous states, they suggest logical alternatives, and they help the user avoid empty result sets as searches are narrowed.6 Disadvantages include the fact that categories of interest must be known in advance, important trends may not be shown, category structures may need to be built by hand, and automated assignment is only partly successful.7 Library catalog records, of course, already supply "categories of interest" and a category structure. Information science research has shown benefits to users from faceted search interfaces. But do these benefits hold true for systems as complex as library catalogs? This paper presents an extensive review of both information science and library literature related to faceted browsing.

■■ Method

To find articles in the library and information science literature related to faceted browsing, the author searched the Association for Computing Machinery (ACM) Digital Library, Scopus, and Library and Information Science and Technology Abstracts (LISTA) databases. In Scopus and the ACM Digital Library, the most successful searches included the following:

■■ (facet* or cluster*) and (usability or user stud*)
■■ facet* and usability

In LISTA, the most successful searches included combining product names such as "aquabrowser" with "usability." The search "catalog and usability" was also used. The author also searched Google and the Next Generation Catalogs for Libraries (NGC4LIB) electronic discussion list in an attempt to find unpublished studies.
Search terms initially included the concept of “clus- tering”; however, this was quickly shown to be a clearly defined, separate topic. According to Hearst, “Clustering refers to the grouping of items according to some measure Faceted browsing is a common feature of new library catalog interfaces. But to what extent does it improve user performance in searching within today’s library catalog systems? This article reviews the literature for user studies involving faceted browsing and user studies of “next-generation” library catalogs that incorporate faceted browsing. Both the results and the methods of these studies are analyzed by asking, What do we cur- rently know about faceted browsing? How can we design better studies of faceted browsing in library catalogs? The article proposes methodological considerations for prac- ticing librarians and provides examples of goals, tasks, and measurements for user studies of faceted browsing in library catalogs. M any libraries are now investigating possible new interfaces to their library catalogs. Sometimes called “next-generation library catalogs” or “dis- covery tools,” these new interfaces are often separate from existing integrated library systems. They seek to provide an improved experience for library patrons by offering a more modern look and feel, new features, and the potential to retrieve results from other major library systems such as article databases. One interesting feature these interfaces offer is called “faceted browsing.” Hearst defines facets as a “a set of meaningful labels organized in such a way as to reflect the concepts relevant to a domain.”1 LaBarre defines fac- ets as representing “the categories, properties, attributes, characteristics, relations, functions or concepts that are central to the set of documents or entities being organized and which are of particular interest to the user group.”2 Faceted browsing offers the user relevant subcategories by which they can see an overview of results, then nar- row their list. In library catalog interfaces, facets usually include authors, subjects, and formats, but may include any field that can be logically created from the MARC record (see figure 1 for an example). Using facets to structure information is not new to librarians and information scientists. As early as 1955, the Classification Research Group stated a desire to see faceted classification as the basis for all information retrieval.3 In 1960, Ranganathan introduced facet analysis to our profession.4 Librarians like metadata because they Jody condit Fagan (faganjc@jmu.edu) is Content interfaces Coordinator, James Madison University Library, Harrisonburg, Virginia. Jody Condit Fagan Usability Studies of Faceted Browsing: A Literature Review usABilitY studies oF FAceted BrowsiNG: A literAture review | FAGAN 59 doing so and performed a user study to inform their decision. results: empirical studies of faceted browsing The following summaries present selected empirical research studies that had significant findings related to faceted browsing or inter- esting methods for such studies. It is not an exhaustive list. Pratt, Hearst, and Fagan questioned whether faceted results were better than clustering or relevancy-ranked results.11 They studied fif- teen breast-cancer patients and families. Every subject used three tools: a faceted interface, a tool that clustered the search results, and a tool that ranked the search results according to relevance criteria. 
The subjects were given three simple queries related to breast cancer (e.g., “What are the ways to prevent breast cancer?”), asked to list answers to these before beginning, and to answer the same queries after using all the tools. In this study, sub- jects completed two timed tasks. First, subjects found as many answers as possible to the question in four minutes. Second, the researchers measured the time subjects took to find answers to two specific questions (e.g., “Can diet be used in the prevention of breast cancer?”) that related to the original, general query. For the first task, when the subjects used the faceted interface, they found more answers than they did with the other two tools. The mean number of answers found using the faceted interface was 7.80, for the cluster tool it was 4.53, and for the ranking tool it was 5.60. This difference was significant (p<0.05).12 For the second task, the researchers found no significant difference between the tools when comparing time on task. The researchers gave the subjects a user-satisfaction questionnaire at the end of the study. On thirteen of the fourteen quantitative questions, satisfaction scores for the faceted interface were much higher than they were for either the ranking tool or the cluster tool. This difference was statistically significant (p < 0.05). All fifteen users also affirmed that the faceted interface made sense, was help- ful, was useful, and had clear labels, and said they would use the faceted interface again for another search. Yee et al. studied the use of faceted metadata for image searching, and browsing using an interface they developed called Flamenco.13 They collected data from thirty-two participants who were regular users of the Internet, searching for information either every day or a few times a week. Their subjects performed four tasks (two structured and two unstructured) on each of two interfaces. An example of an unstructured task from their study was “search for images of interest.” An example of a structured task was to gather materials for an art history of similarity . . . typically computed using associations and commonalities among features where features are typically words and phrases.”8 Using library catalog key- words to generate word clouds would be an example of clustering, as opposed to using subject headings to group items. Clustering has some advantages according to Hearst. It is fully automated, it is easily applied to any text collection, it can reveal unexpected or new trends, and it can clarify or sharpen vague queries. Disadvantages to clustering include possible imperfections in the cluster- ing algorithm, similar items not always being grouped into one cluster, a lack of predictability, conflating many dimensions, difficulty labeling groups, and counterintui- tive subhierarchies.9 In user studies comparing clustering with facets, Pratt, Hearst, and Fagan showed that users find clustering difficult to interpret and prefer a predict- able organization of category hierarchies.10 ■■ Results The author grouped the literature into two categories: user studies of faceted browsing and user studies of library catalog interfaces that include faceted browsing as a feature. Generally speaking, the information science literature consisted of empirical studies of interfaces cre- ated by the researchers. 
In some cases, the researchers’ intent was to create and refine an interface intended for actual use; in others, the researchers created the interface only for the purposes of studying a specific aspect of user behavior. In the library literature, the studies found were generally qualitative usability studies of specific library catalog interface products. Libraries had either implemented a new product, or they were thinking about Figure 1. Faceted results from JMU’s VuFind implementation 60 iNFormAtioN tecHNoloGY ANd liBrAries | JuNe 2010 Uddin and Janacek asked nineteen users (staff and students at the Asian Institute of Technology) to use a website search engine with both a traditional results list and a faceted results list.22 Tasks were as follows: (1) look for scholarship information for a masters program, (2) look for staff recruitment information, and (3) look for research and associated faculty member information within your interested area.23 They found that users were faster when using the faceted system, significantly so for two of the three tasks. Success in finding relevant results was higher with the faceted system. In the post–study questionnaire, participants rated the faceted system more highly, including significantly higher ratings for flexibil- ity, interest, understanding of information content, and more search results relevancy. Participants rated the most useful features to be the capability to switch from one facet to another, preview the result set, combine facets, and navigate via breadcrumbs. Capra et al. compared three interfaces in use by the Bureau of Labor Statistics website, using a between-sub- jects study with twenty-eight people and a within-subjects study with twelve people.24 Each set of participants per- formed three kinds of searches: simple lookup, complex lookup, and exploratory. The researchers used an interest- ing strategy to help control the variables in their study: Because the BLS website is a highly specialized corpus devoted to economic data in the United States orga- nized across very specific time periods (e.g., monthly releases of price or employment data), we decided to include the US as a geographic facet and a month or year as a temporal facet to provide context for all search tasks in our study. Thus, the simple lookup tasks were constructed around a single economic facet but also included the spatial and temporal facets to provide context for the searchers. The complex lookup tasks involve additional facets including genre (e.g. press release) and/or region.25 Capra et al. found that users preferred the familiarity afforded by the traditional website interface (hyperlinks + keyword search) but listed the facets on the two experi- mental interfaces as their best features. The researchers concluded, “If there is a predominant model of the infor- mation space, a well designed hierarchical organization might be preferred.”26 Zhang and Marchionini analyzed results from fifteen undergraduate and graduate students in a usability study of an interface that used facets to categorize results (Relation Browser ++).27 There were three types of tasks: ■■ Type 1: Simple look-up task (three tasks such as “check if the movie titled The Matrix is in the library movie collection”). ■■ Type 2: Data exploration and analysis tasks (six tasks essay on a topic given by the researchers and to complete four related subtasks. The researchers designed the struc- tured task so they knew exactly how many relevant results were in the system. 
They also gave a satisfaction survey. More participants were able to retrieve all relevant results with the faceted interface than with the baseline interface. During the structured tasks, participants received empty results with the baseline interface more than three times as often as with the faceted interface.14 The researchers found that participants constructed queries from multiple facets in the unstructured tasks 19 percent of the time and in the structured tasks 45 percent of the time.15 When given a post–test survey, participants identified the fac- eted interface as easier to use, more flexible, interesting, enjoyable, simple, and easy to browse. They also rated it as slightly more “overwhelming.” When asked to choose between the two, twenty-nine participants chose the fac- eted interface, compared with two who chose the baseline (N = 31). Thirty-one of the thirty-two participants said the faceted interface helped them learn more, and twenty- eight of them said it would be more useful for their usual tasks.16 The researchers concluded that even though their faceted interface was much slower than the other, it was strongly preferred by most study participants: “These results indicate that a category-based approach is a suc- cessful way to provide access to image collections.”17 In a related usability study on the Flamenco interface, English et al. compared two image browsing interfaces in a nineteen-participant study.18 After an initial search, the “Matrix View” interface showed a left column with facets, with the images in the result set placed in the main area of the screen. From this intermediary screen, the user could select multiple terms from facets in any order and have the items grouped under any facet. The “SingleTree” interface listed subcategories of the currently selected term at the top, with query previews underneath. The user could then only drill down to subcategories of the current category, and could not select terms from more than one facet. The researchers found that a majority of participants preferred the “power” and “flexibility” of Matrix to the simplicity of SingleTree. They found it easier to refine and expand searches, shift between searches, and troubleshoot research problems. They did prefer SingleTree for locating a specific image, but Matrix was preferred for browsing and exploring. Participants started over only 0.2 percent of the time for the Matrix compared to 4.5 percent for SingleTree.19 Yet the faceted interface, Matrix, was not “better” at everything. For specific image searching, participants found the correct image only 22.0 percent of the time in Matrix compared to 66.0 percent in SingleTree.20 Also, in Matrix, some participants drilled down in the wrong hierarchy with wrong assumptions. One interesting finding was that in both interfaces, more participants chose to begin by browsing (12.7 percent) than by searching (5.0 percent).21 usABilitY studies oF FAceted BrowsiNG: A literAture review | FAGAN 61 of the first two studies: The first study comprised one faculty member, five graduate students, and two under- graduate students; the second comprised two faculty members, four graduate students, and two undergradu- ate students. The third study did not report results related to faceted browsing and is not discussed here. The first study had seven scenarios; the second study had nine. 
The scenarios were complex: for example, one scenario began, “You want to borrow Shakespeare’s play, The Tempest, from the library,” but contained the following subtasks as well: 1. Find The Tempest. 2. Find multiple editions of this item. 3. Find a recent version. 4. See if at least one of the editions is available in the library. 5. What is the call number of the book? 6. You’d like to print the details of this edition of the book so you can refer to it later. Participants found the interface friendly, easy to use, and easy to learn. All the participants reported that fac- eted browsing was useful as a means of narrowing down the result lists, and they considered this tool one of the differentiating features between Primo and their library OPAC or other interfaces. Facets were clear, intuitive, and useful to all participants, including opening the “more” section.31 One specific result from the tests was that “online resources” and “available” limiters were moved from a separate location to the right with all other facets.32 In a study of Aquabrowser by Olson, twelve subjects— all graduate students in the humanities—participated in a comparative test in which they looked for additional sources for their dissertation.33 Aquabrowser was created by MediaLab but is distributed by Serials Solutions in North America. This study also had three pilot subjects. No relevance judgments were made by the researchers. Nine of the twelve subjects found relevant materials by using Aquabrowser that they had not found before.34 Olson’s subjects understood facets as a refinement tool (narrowing) and had a clear idea of which facets were useful and not useful for them. They gave overwhelm- ingly positive comments. Only two felt the faceted interface was not an improvement. Some participants wanted to limit to multiple languages or dates, and a few were confused about the location of facets in multiple places, for example, “music” under both format and topic. A team at Yale University, led by Bauer, recently conducted two tests on pilot VuFind installations: a subject-based presentation of e-books for the Cushing/ Whitney Medical Library and a pilot test of VuFind using undergraduate students with a sample of 400,000 records from the library system.35 VuFind is open-source software developed at Villanova University (http://vufind.org). that require users to understand and make sense of the information collection: “In which decade did Steven Spielberg direct the most movies?”). ■■ Type 3: (one free exploration task: “find five favorite videos without any time constraints”). The tasks assigned for the two interfaces were dif- ferent but comparable. For type 2 tasks, Zhang and Marchionini found that performance differences between the two interfaces were all statistically significant at the .05 level.28 No participants got wrong answers for any but one of the tasks using the faceted interface. With regard to satisfaction, on the exploratory tasks the researchers found statistically significant differences favoring the faceted interface on all three of the satisfaction ques- tions. Participants found the faceted interface not as aesthetically appealing nor as intuitive to use as the basic interface. Two participants were confused by the constant changing and updating of the faceted interface. The above studies are examples of empirical inves- tigations of experimental interfaces. 
Hearst recently concluded that facets are a “proven technique for sup- porting exploration and discovery” and summarized areas for further research in this area, such as applying facets to large “subject-oriented category systems,” facets on mobile interfaces, adding smart features like “auto- complete” to facets, allowing keyword search terms to affect order of facets, and visualizations of facets.29 In the following section, user studies of next-generation library catalog interfaces will be presented. results: library literature Understandably, most studies by practicing librarians focus on products their libraries are considering for eventual use. These studies all use real library catalog records, usually the entire catalog’s database. In most cases, these studies were not focused on investigating faceted browsing per se, but on the usability of the overall interface. In general, these studies used fewer participants than the information science studies above, followed less rigorous methods, and were not subjected to statistical tests. Nevertheless, they provide many insights into the user experience with the extremely complex datasets underneath next-generation library catalog interfaces that feature faceted browsing. In this review article, only results specifically relating to fac- eted browsing will be presented. Sadeh described a series of usability studies per- formed at the University of Minnesota (UM), a Primo development partner.30 Primo is the next-generation library catalog product sold by Ex Libris. The author also received additional information from the Usability Services lab at UM via e-mail. Three studies were con- ducted in August 2006, January 2007, and October 2007. Eight users from various disciplines participated in each 62 iNFormAtioN tecHNoloGY ANd liBrAries | JuNe 2010 participants. The researchers measured task success, dura- tion, and difficulty, but did not measure user satisfaction. Their study consisted of four known-item tasks and six topic-searching tasks. The topic-searching tasks were geared toward the use of facets, for example, “Can you show me how would you find the most recently published book about nuclear energy policy in the United States?”45 All five participants using Endeca understood the idea of facets, and three used them. Students tried to limit their searches at the outset rather than search and then refine results. An interesting finding was that use of the facets did not directly follow the order in which facets were listed. The most heavily used facet was Library of Congress Classification (LCC), followed closely by topic, and then library, format, author, and genre.46 Results showed a sig- nificantly shorter average task duration for Endeca catalog users for most tasks.47 The researchers noted that none of the students understood that the LCC facet represented call-number ranges, but all of the students understood that these facets “could be used to learn about a topic from dif- ferent aspects—science, medicine, education.”48 The authors could find no published studies relating to the use of facets in some next-generation library cata- logs, including Encore and WorldCat Local. 
Although the University of Washington did publish results of a WorldCat Local usability study in a recent issue of Library Technology Reports, results from the second round of testing, which included an investigation of facets, were not yet ready.49 ■■ Discussion summary of empirical evidence related to faceted browsing Empirical studies in the information science literature support many positive findings related to faceted brows- ing and build a solid case for including facets in search interfaces: ■■ Facets are useful for creating navigation structures.50 ■■ Faceted categorization greatly facilitates efficient retrieval in database searching.51 ■■ Facets help avoid dead ends.52 ■■ Users are faster when using a faceted system.53 ■■ Success in finding relevant results is higher with a faceted system.54 ■■ Users find more results with a faceted system.55 ■■ Users also seem to like facets, although they do not always immediately have a positive reaction. ■■ Users prefer search results organized into predict- able, multidimensional hierarchies.56 ■■ Participants’ satisfaction is higher with a faceted system.57 The team drew test questions from user search logs in their current library system. Some questions targeted specific problems, such as incomplete spellings and incomplete title information. Bauer notes that some problems uncovered in the study may relate to the pecu- liarities of the Yale implementation. The medical library study contained eight partici- pants—a mix of medical and nursing students. Facets, reported Bauer, “worked well in several instances, although some participants did not think they were noticeable on the right side of the page.”36 The prompt for the faceted task in this study came after the user had done a search: “What if you wanted to look at a particular sub- set, say ‘xxx’ (determine by looking at the facets).”37 Half of the participants used facets, half used “search within” to narrow the topic by adding keywords. Sixty-two per- cent of the participants were successful at this task. The undergraduate study asked five participants faced with a results list, “What would you do now if you only wanted to see material written by John Adams?”38 On this task, only one of the five was successful, even though the author’s name was on the screen. Bauer noted that in general, “the use of the topic facet to narrow the search was not understood by most participants. . . . Even when participants tried to use topic facets the length of the list and extraneous topics rendered them less than useful.”39 The five undergraduates were also asked, “Could you find books in this set of results that are about health and illness in the United States population, or control of com- municable diseases during the era of the depression?”40 Again, only one of the five was successful. Bauer notes that “the overly broad search results made this difficult for participants. Again, topic facets were difficult to navi- gate and not particularly useful to this search.”41 Bauer’s team noted that when the search was configured to return more hits, “topic facets become a confusingly large set of unrelated items. 
These imprecise search results, combined with poor topic facet sets, seemed to result in confusion for test participants.”42 Participants were not aware that topics represented subsets, although learning occurred because the “narrow” header was helpful to some par- ticipants.43 Other results found by Bauer’s team were that participants were intrigued by facets, navigation tools are needed so that patrons may reorder large sets of topic fac- ets, format and era facets were useful to participants, and call-number facets were not used by anyone. Antelman, Pace, and Lynema studied North Carolina State University’s (NCSU) next-generation library catalog, which is driven by software from Endeca.44 Their study used ten undergraduate students in a between-subjects design where five used the Endeca catalog and five used the library’s traditional catalog. The researchers noted that their participants may have been experienced with the library’s old catalog, as log data shows most NCSU users enter one or two terms, which was not true of study usABilitY studies oF FAceted BrowsiNG: A literAture review | FAGAN 63 one product’s faceted system for a library catalog does not substitute for another, the size and scope of local collections may greatly affect results, and cataloging practices and metadata will affect results. Still, it is important for practic- ing librarians to determine if new features such as facets truly improve the user’s experience. methodological best practices After reading numerous empirical research studies (some of which critique their own methods) and library case studies, some suggestions for designing better studies of facets in library catalogs emerged. designing the study ■■ Consider reusing protocols from previous studies. This provides not only a tested method but also a possible point of comparison. ■■ Define clear goals for each study and focus on spe- cific research questions. It’s tempting to just throw the user into the interface and see what happens, but this makes it difficult, if not impossible, to analyze the results in a useful way. For example, one of Zhang and Marchionini’s hypotheses specifically describes what rich interaction would look like: “Typing in key- words and clicking visual bars to filter results would be used frequently and interchangeably by the users to finish complex search tasks, especially when large numbers of results are returned.”64 ■■ Develop the study for one type of user. Olson’s focus on graduate students in the dissertation process allowed the researchers to control for variables such as interest of and knowledge about the subject. ■■ Pilot test the study with a student worker or col- league to iron out potential wrinkles. ■■ Let users explore the system for a short time and pos- sibly complete one highly structured task to help the user become used to the test environment, interface, and facilitator.65 Unless you are truly interested in the very first experience users have with a system, the first use of a system is an artificial case. designing tasks ■■ Make sure user performance on each task is measur- able. Will you measure the time spent on a task? If “success” is important, define what that would look like. For example, English et al. defined success for one of their tasks as when “the participant indicated (within the allotted time) that he/she had reached an appropriate set of images/specific image in the collection.”66 ■■ Establish benchmarks for comparison. 
One can test for significant differences between interfaces, one can test for differences between research subjects and an expert user, and one can simply measure against ■■ Users are more confident with a faceted system.58 ■■ Users may prefer the familiarity afforded by tra- ditional website interface (hyperlinks + keyword search).59 ■■ Initial reactions to the faceted interface may be cau- tious, seeing it as different or unfamiliar.60 Users interact with specific characteristics of faceted interfaces, and they go beyond just one click with facets when it is permitted. English et al. found that 7 percent of their participants expanded facets by removing a term, and that facets were used more than “keyword search within”: 27.6 percent versus 9 percent.61 Yee et al. found that participants construct queries from multiple facets 19 percent of the time in unstructured tasks; in structured tasks they do so 45 percent of the time.62 The above studies did not use library catalogs; in most cases they used an experimental interface with record sets that were much smaller and less complicated than in a complete library collection. Domains included websites, information from one website, image collections, video collections, and a journal article collection. summary of practical user studies related to faceted browsing This review also included studies from practicing librar- ians at live library implementations. These studies generally had smaller numbers of users, were more likely to focus on the entire interface rather than a few features, and chose more widely divergent methods. Studies were usually linked to a specific product, and results varied widely between systems and studies. For this reason it is difficult to assemble a bulleted summary as with the previous section. The variety of results from these studies indicate that when faceted browsing is applied to a real- life situation, implementation details can greatly affect user performance and user preference. Some, like LaBarre, are skeptical about whether fac- ets are appropriate for library information. Descriptions of library materials, says LaBarre, include analyses of intellectual content that go beyond the descriptive terms assigned to commercial items such as a laptop: Now is the time to question the assumptions that are embedded in these commercial systems that were primarily designed to provide access to concrete items through descriptions in order to enhance profit.63 It is clear that an evaluation of commercial interfaces or experimental interfaces does not substitute for an OPAC evaluation. Yet it is a challenge for libraries to find expertise and resources to conduct user studies. The systems they want to test are large and complex. Collaborating with other libraries has its own challenges: An evaluation of 64 iNFormAtioN tecHNoloGY ANd liBrAries | JuNe 2010 groups of participants, each of which tests a dif- ferent system. ■❏ A within-subjects design has one group of par- ticipants test both systems. It is hoped that if libraries use the suggestions above when designing future experiments, results across studies will be more comparable and useful. designing user studies of faceted browsing After examining both empirical research studies and case studies by practicing librarians, a key difference seems to be the specificity of research questions and design- ing tasks and measurements to test specific hypotheses. 
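For example, once task times have been collected from the two groups in a between-subjects comparison of a faceted and a traditional interface, the significance test reported in several of the studies above takes only a few lines of analysis code. The sketch below is not drawn from any of the reviewed studies; the timing values are hypothetical, and it assumes the SciPy library is available.

```python
# Hypothetical between-subjects comparison: one group completed a task with the
# faceted catalog interface, a second group with the traditional interface.
# Times are seconds per participant (made-up illustrative numbers).
from statistics import mean
from scipy import stats

faceted_times = [48, 62, 55, 71, 50, 66, 58, 64, 53, 60]
traditional_times = [80, 95, 72, 88, 101, 77, 90, 84, 79, 93]

# Welch's t-test: does mean task time differ significantly between the groups?
t_stat, p_value = stats.ttest_ind(faceted_times, traditional_times, equal_var=False)

print(f"faceted mean:     {mean(faceted_times):.1f} s")
print(f"traditional mean: {mean(traditional_times):.1f} s")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A within-subjects design would instead pair each participant's times on the two interfaces (for example, with stats.ttest_rel) and counterbalance the order in which the interfaces are presented to control for learning effects.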
While describing a full user-study protocol for investi- gating faceted browsing in a library catalog is beyond the scope of this article, reviewing the literature and the study methods it describes provided insights into how hypotheses, tasks, and measurements could be written to provide more reliable and comparable evidence related to faceted browsing in library catalog systems. For example, one research question could surround the format facet: “Compared with our current interface, does our new faceted interface improve the user’s ability to find different formats of materials?” Hypotheses could include the following: 1. Users will be more accurate when identifying the formats of items from their result set when using the faceted interface than when using the traditional interface. 2. Users will be able to identify formats of items more quickly with the faceted interface than with the tradi- tional interface. Looking at these hypotheses, here is a prompt and some example tasks the participants would be asked to perform: “We will be asking you to find a variety of for- mats of materials. When we say formats of materials, we mean books, journal articles, videos, etc.” ■■ Task 1: Please use interface A to search on “interper- sonal communication.” Look at your results set. Please list as many different formats of material as you can. ■■ Task 2: How many items of each format are there? ■■ Task 3: Please use interface B to search on “family communication.” What formats of materials do you see in your results set? ■■ Task 4: How many items of each format are there?” We would choose the topics “interpersonal com- munication” and “family communication” because our local catalog has many material types for these topics and because these topics would be understood by most of our students. We would choose different topics to expectations or against previous iterations of the same study. For example, “75 percent of users completed the task within five minutes.” Zhang and Marchionini measured error rates, another possible benchmark.67 ■■ Consider looking at your existing OPAC logs for zero- results searches or other issues that might inspire interesting questions. ■■ Target tasks to avoid distracters. For example, if your catalog has a glut of government documents, consider running the test with a limit set to exclude them unless you are specifically interested in their impact. For example, Capra et al. decided to include the United States as a geographic facet and a month or year as a temporal facet to provide context for all search tasks in their study.68 ■■ For some tasks, give the subjects simple queries (e.g., “What are the ways to prevent breast cancer?”) as opposed to asking the subjects to come up with their own topic. This can help control for the potential challenges of formulating one’s own research ques- tion on the spot. As librarians know, formulating a good research question is its own challenge. ■■ If you are using any timed tasks, consider how the nature of your tasks could affect the result. For example, Pratt, Hearst, and Fagan noted that the time that it took subjects to read and understand abstracts most heavily influenced the time for them to find an answer.69 English et al. found that the system’s pro- cessing time influenced their results.70 ■■ Consider the implications of your local implementa- tion carefully when designing your study. 
At Yale, the team chose to point their VuFind instance at just 400,000 of their records, drew questions from prob- lems users were having (as shown in log files), and targeted questions to these problems.71 who to study? ■■ Try to study a larger set of users. It is better to create a short test with many users than a long test with a few users. Nielsen suggests that twenty users is suf- ficient.72 Consider collaborating with another library if necessary. ■■ If you test a small number, such as the typical four to eight users for a usability test, be sure you emphasize that your results are not generalizable. ■■ Use subjects who are already interested in the subject domain: for example, Pratt, Hearst, and Fagan used breast cancer patients,73 and Olson used graduate students currently writing their dissertations.74 ■■ Consider focusing on advanced or scholarly users. La Barre suggests that undergraduates may be over- studied.75 ■■ For comparative studies, consider having both between-subjects and within-subjects designs.76 ■❏ A between-subjects design involves creating two usABilitY studies oF FAceted BrowsiNG: A literAture review | FAGAN 65 these experimental studies. Previous case-study inves- tigations of library catalog interfaces with facets have proven inconclusive. By choosing more specific research questions, tasks, and measurements for user studies, libraries may be able to design more objective studies and compare results more effectively. References 1. Marti A. Hearst, “Clustering versus Faceted Categories for Information Exploration,” Communications of the ACM 49, no. 4 (2006): 60. 2. Kathryn La Barre, “Faceted Navigation and Browsing Fea- tures in New OPACS: Robust Support for Scholarly Information Seeking?” Knowledge Organization 34, no. 2 (2007): 82. 3. Vanda Broughton, “The Need for Faceted Classification as the Basis of All Methods of Information Retrieval,” Aslib Proceed- ings 58, no. 1/2 (2006): 49–71. 4. S. R. Ranganathan, Colon Classification Basic Classification, 6th ed. (New York: Asia, 1960). 5. Deborah L. McGuinness, “Ontologies Come of Age,” in Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential, ed. Dieter Fensel et al. (Cambridge, Mass.: MIT Pr., 2003): 179–84. 6. Hearst, “Clustering versus Faceted Categories,” 60. 7. Ibid., 61. 8. Ibid., 59. 9. Ibid.. 60. 10. Wanda Pratt, Marti A. Hearst, and Lawrence M. Fagan, “A Knowledge-Based Approach to Organizing Retrieved Docu- ments,” Proceedings of the Sixteenth National Conference on Artifi- cial Intelligence, July 18–22, 1999, Orlando, Florida (Menlo Park, Calif.: AAAI Pr., 1999): 80–85. 11. Ibid. 12. Ibid., 5. 13. Ka-Ping Yee et al., “Faceted Metadata for Image Search and Browsing,” 2003, http://flamenco.berkeley.edu/papers/ flamenco-chi03.pdf (accessed Oct. 6, 2008). 14. Ibid., 6. 15. Ibid., 7. 16. Ibid. 17. Ibid., 8. 18. Jennifer English et al., “Flexible Search and Navigation,” 2002, http://flamenco.berkeley.edu/papers/flamenco02.pdf (accessed Apr. 22, 2010). 19. Ibid., 7. 20. Ibid., 6. 21. Ibid., 7. 22. Mohammed Nasir Uddin and Paul Janecek, “Performance and Usability Testing of Multidimensional Taxonomy in Web Site Search and Navigation,” Performance Measurement and Met- rics 8, no. 1 (2007): 18–33. 23. Ibid., 25. 24. Robert Capra et al., “Effects of Structure and Interaction Style on Distinct Search Tasks,” Proceedings of the 7th ACM-IEEE-CS Joint Conference on Digital Libraries (New York: ACM, 2007): 442–51. 25. Ibid., 446. 26. Ibid., 450. help minimize learning effects. 
To further address this, we would plan to have half our users start first with the traditional interface and half to start first with the faceted interface. This way we can test for differences resulting from learning. The above tasks would allow us to measure several pieces of evidence to support or reject our hypotheses. For tasks 1 and 3, we would measure the number of formats correctly identified by users compared with the number found by an expert searcher. For tasks 2 and 4, we would compare the number of items correctly identified with the total items found in each category by an expert searcher. We could also time the user to determine which interface helped them work more quickly. In addition to measuring the number of formats identified and the number of items identified in each format, we would be able to measure the time it takes users to identify the number of formats and the number of items in each format. To measure user satisfaction, we would ask participants to complete the System Usability Scale (SUS) after each interface and, at the very end of the study, complete a questionnaire com- paring the two interfaces. Even just selecting the format facet, we would have plenty to investigate. Other hypotheses and tasks could be developed for other facet types, such as time period or publication date, or facets related to the responsible par- ties, such as author or director: Hypothesis: Users can find more materials written in a certain time period using the faceted interface. Task: Find ten items of any type (books, journals, mov- ies) written in the 1950s that you think would have information about television advertising. Hypothesis: Users can find movies directed by a spe- cific person more quickly using the faceted interface. Task: In the next two minutes, find as many movies as you can that were directed by Orson Welles. For the first task above, an expert searcher could complete the same task, and their time could be used as a point of comparison. For the second, the total number of movies in the library catalog that were directed by Welles is an objective quantity. In both cases, one could compare the user’s performance on the two interfaces. ■■ Conclusion Reviewing user studies about faceted browsing revealed empirical evidence that faceted browsing improves user performance. Yet this evidence does not necessarily point directly to user success in faceted library catalogs, which have much more complex databases than those used in 66 iNFormAtioN tecHNoloGY ANd liBrAries | JuNe 2010 53. Uddin and Janecek, “Performance and Usability Testing”; Zhang and Marchionini, Evaluation and Evolution; Hao Chen and Susan Dumais, Bringing Order to the Web: Automatically Categoriz- ing Search Results (New York: ACM, 2000): 145–52. 54. Uddin and Janecek, “Performance and Usability Testing.” 55. Ibid.; Pratt, Hearst, and Fagan, “A Knowledge-Based Approach”; Hsinchun Chen et al., “Internet Browsing and Searching: User Evaluations of Category Map and Concept Space Techniques,” Journal of the American Society for Information Science 49, no. 7 (1998): 582–603. 56. Vanda Broughton, “The Need for Faceted Classification as the Basis of All Methods of Information Retrieval,” Aslib Proceedings 58, no. 1/2 (2006): 49–71; Pratt, Hearst, and Fagan, “A Knowledge-Based Approach,” 80–85.; Chen et al., “Internet Browsing and Searching,” 582–603; Yee et al., “Faceted Metadata for Image Search and Browsing”; English et al., “Flexible Search and Navigation using Faceted Metadata.” 57. 
Uddin and Janecek, “Performance and Usability Testing”; Zhang and Marchionini, Evaluation and Evolution; Hideo Joho and Joemon M. Jose, Slicing and Dicing the Information Space Using Local Contexts (New York: ACM, 2006): 66–74.; Yee et al., “Faceted Metadata for Image Search and Browsing.” 58. Yee et al., “Faceted Metadata for Image Search and Brows- ing”; Chen and Dumais, Bringing Order to the Web. 59. Capra et al., “Effects of Structure and Interaction Style.” 60. Yee et al., “Faceted Metadata for Image Search and Brows- ing”; Capra et al., “Effects of Structure and Interaction Style”; Zhang and Marchionini, Evaluation and Evolution. 61. English et al., “Flexible Search and Navigation,” 7. 62. Yee et al., “Faceted Metadata for Image Search and Brows- ing,” 7. 63. La Barre, “Faceted Navigation and Browsing,” 85. 64. Zhang and Marchionini, Evaluation and Evolution, 183. 65. English et al., “Flexible Search and Navigation.” 66. Ibid., 6. 67. Zhang and Marchionini, Evaluation and Evolution. 68. Capra et al., “Effects of Structure and Interaction Style.” 69. Pratt, Hearst, and Fagan, “A Knowledge-Based Approach.” 70. English et al., “Flexible Search and Navigation.” 71. Bauer, “Yale University Library VuFind Test—Under- graduates.” 72. Jakob Nielsen, “Quantitative Studies: How Many Users to Test?” online posting, Alertbox, June 26, 2006 http://www.useit .com/alertbox/quantitative_testing.html (accessed Apr. 7, 2010). 73. Pratt, Hearst, and Fagan, “A Knowledge-Based Approach.” 74. Tod A. Olson used graduate students currently writing their dissertations. Olson, “Utility of a Faceted Catalog for Schol- arly Research,” Library Hi Tech 25, no. 4 (2007): 550–61. 75. La Barre, “Faceted Navigation and Browsing.” 76. Capra et al., “Effects of Structure and Interaction Style.” 27. Junliang Zhang and Gary Marchionini, Evaluation and Evolution of a Browse and Search Interface: Relation Browser++ (Atlanta, Ga.: Digital Government Society of North America, 2005): 179–88. 28. Ibid., 183. 29. Marti A. Hearst, “UIs for Faceted Navigation: Recent Advances and Remaining Open Problems,” 2008, http://people. ischool.berkeley.edu/~hearst/papers/hcir08.pdf (accessed Apr. 27, 2010). 30. Tamar Sadeh, “User Experience in the Library: A Case Study,” New Library World 109, no. 1/2 (Jan. 2008): 7–24. 31. Ibid., 22. 32. Jerilyn Veldof, e-mail from University of Minnesota Usability Services Lab, 2008. 33. Tod A. Olson, “Utility of a Faceted Catalog for Scholarly Research,” Library Hi Tech 25, no. 4 (2007): 550–61. 34. Ibid., 555. 35. Kathleen Bauer, “Yale University Library VuFind Test— Undergraduates,” May 20, 2008, http://www.library.yale.edu/ usability/studies/summary_undergraduate.doc (accessed Apr. 27, 2010); Kathleen Bauer and Alice Peterson-Hart, “Usability Test of VuFind as a Subject-Based Display of Ebooks,” Aug. 21, 2008, http://www.library.yale.edu/usability/studies/summary _medical.doc (accessed Apr. 27, 2010). 36. Bauer and Peterson-Hart, “Usability Test of VuFind as a Subject-Based Display of Ebooks,” 1. 37. Ibid., 2. 38. Ibid., 3. 39. Ibid. 40. Ibid., 4. 41. Ibid. 42. Ibid., 5. 43. Ibid., 8. 44. Kristin Antelman, Andrew K. Pace, and Emily Lynema, “Toward a Twenty-First Century Library Catalog,” Information Technology & Libraries 25, no. 3 (2006): 128–39. 45. Ibid., 139. 46. Ibid., 133. 47. Ibid., 135. 48. Ibid., 136. 49. Jennifer L. Ward, Steve Shadle, and Pam Mofield, “User Experience, Feedback, and Testing,” Library Technology Reports 44, no. 6 (Aug. 2008): 22. 50. 
English et al., “Flexible Search and Navigation.”
51. Peter Ingwersen and Irene Wormell, “Ranganathan in the Perspective of Advanced Information Retrieval,” Libri 42 (1992): 184–201; Winfried Godert, “Facet Classification in Online Retrieval,” International Classification 18, no. 2 (1991): 98–109; W. Godert, “Klassificationssysteme und Online-Katalog [Classification Systems and the Online Catalogue],” Zeitschrift für Bibliothekswesen und Bibliographie 34, no. 3 (1987): 185–95.
52. Yee et al., “Faceted Metadata for Image Search and Browsing”; English et al., “Flexible Search and Navigation.”

3145 ----
Reducing Psychological Resistance to Digital Repositories
Brian Quinn
Brian Quinn (brian.quinn@ttu.edu) is Social Sciences Librarian, Texas Tech University Libraries, Lubbock.

The potential value of digital repositories is dependent on the cooperation of scholars to deposit their work. Although many researchers have been resistant to submitting their work, the literature on digital repositories contains very little research on the psychology of resistance. This article looks at the psychological literature on resistance and explores what its implications might be for reducing the resistance of scholars to submitting their work to digital repositories. Psychologists have devised many potentially useful strategies for reducing resistance that might be used to address the problem; this article examines these strategies and how they might be applied.

Observing the development and growth of digital repositories in recent years has been a bit like riding an emotional roller coaster. Even the definition of what constitutes a repository may not be the subject of complete agreement, but for the purposes of this study, a repository is defined as an online database of digital or digitized scholarly works constructed for the purpose of preserving and disseminating scholarly research. The initial enthusiasm expressed by librarians and advocates of open access toward the potential of repositories to make significant amounts of scholarly research available to anyone with Internet access gradually gave way to a more somber appraisal of the prospects of getting faculty and researchers to deposit their work. In August 2007, Bailey posted an entry to his Digital Koans blog titled “Institutional Repositories: DOA?” in which he noted that building digital repository collections would be a long, arduous, and costly process.1 The success of repositories, in his view, will be a function not so much of technical considerations as of attitudinal ones. Faculty remain unconvinced that repositories are important, and there is a critical need for outreach programs that point to repositories as an important step in solving the crisis in scholarly communication.

Salo elaborated on Bailey’s post with “Yes, IRs Are Broken. Let’s Talk About It,” on her own blog, Caveat Lector. Salo points out that institutional repositories have not fulfilled their early promise of attracting a large number of faculty who are willing to submit their work. She criticizes repositories for monopolizing the time of library faculty and staff, and she states her belief that repositories will not work without deposit mandates, but that mandates are impractical.2

Subsequent events in the world of scholarly communication might suggest that mandates may be less impractical than Salo originally thought. Since her post, the National Institutes of Health mandate, the Harvard and MIT mandates, and other mandates such as the one instituted at Stanford’s School of Education, have come to pass, and the Registry of Open Access Repository Material Archiving Policies (ROARMAP) lists more than 120 mandates around the world that now exist.3 While it is too early to tell whether these developments will be successful in getting faculty to deposit their work in digital repositories, they at least establish a precedent that other institutions may follow. How many institutions follow and how effective the mandates will be once enacted remains to be seen. Will all colleges and universities, or even a majority, adopt mandates that require faculty to deposit their work in repositories? What of those that do not? Even if most institutions are successful in instituting mandates, will they be sufficient to obtain faculty cooperation? For those institutions that do not adopt mandates, how are they going to persuade faculty to participate in self-archiving, or even in some variation—such as having surrogates (librarians, staff, or graduate assistants) archive the work of faculty? Are mandates the only way to ensure faculty cooperation and compliance, or are mandates even necessarily the best way? To begin to adequately address the problem of user resistance to digital repositories, it might help to first gain some insight into the psychology of resistance.

The existing literature on user behavior with regard to digital repositories devotes scant attention to the psychology of resistance. In an article entitled “Institutional Repositories: Partnering with Faculty to Enhance Scholarly Communication,” Johnson discusses the inertia of the traditional publishing paradigm. He notes that this inertia is most evident in academic faculty. This would suggest that the problem of eliciting user cooperation is primarily motivational and that the problem is more one of indifference than active resistance.4 Heterick, in his article “Faculty Attitudes toward Electronic Resources,” suggests that one reason faculty may be resistant to digital repositories is that they do not fully trust them. In response to a survey he conducted, 48 percent of faculty felt that libraries should maintain paper archives.5 The implication is that digital repositories and archives may never completely replace hard copies in the minds of scholars. In “Understanding Faculty to Improve Content Recruitment for Institutional Repositories,” Foster and Gibbons point out that faculty complain of having too much work already. They resent any additional work that contributing to a digital repository might entail. Thus the authors echo Johnson in suggesting that faculty resistance

whether or not this was actually the case.11 This study also suggests that a combination of both cognitive and affective processes feed faculty resistance to digital repositories. It can be seen from the preceding review of the literature that several factors have been identified as being possible sources of user resistance to digital repositories. Yet the authors offer little in the way of strategies for addressing this resistance other than to suggest workaround solutions such as having nonscholars (e.g., librarians, graduate students, or clerical staff) serve as proxy for faculty and deposit their work for them, or to suggest that institutions mandate that faculty deposit their work.
Similarly, although numerous arguments have been made in favor of digital repositories and open access, they do not directly address the resistance issue.12 In contrast, psychologists have studied user resistance extensively and accumulated a body of research that may suggest ways to reduce resistance rather than try to circumvent it. It may be helpful to examine some of these studies to see what insights they might offer to help address the problem of user resistance. It should be pointed out that resistance as a topic has been addressed in the business and organizational literature, but has generally been approached from the standpoint of management and organizational change.13 This study has chosen to focus primarily on the psychol- ogy of resistance because many repositories are situated in a university setting. Unlike employees of a corporation, faculty members typically have a greater degree of auton- omy and latitude in deciding whether to accommodate new work processes and procedures into their existing routines, and the locus of change will therefore be more at an individual level. ■■ The psychology of user resistance Psychologists define resistance as a preexisting state or attitude in which the user is motivated to counter any attempts at persuasion. This motivation may occur on a cognitive, affective, or behavioral level. Psychologists thus distinguish between a state of not being persuaded and one in which there is actual motivation to not com- ply. The source of the motivation is usually an affective state, such as anxiety or ambivalence, which itself may result from cognitive problems, such as misunderstand- ing, ignorance, or confusion.14 It is interesting to note that psychologists have long viewed inertia as one form of resistance, suggesting paradoxically that a person can be motivated to inaction.15 Resistance may also manifest itself in more subtle forms that shade into indifference, suspicion of new work processes or technologies, and contentment with the status quo. may be attributed at least in part to motivation.6 In another article published a few months later, Foster and Gibbons suggest that the main reason faculty have been slow to deposit their work in digital repositories is a cog- nitive one: Faculty have not understood how they would benefit by doing so. The authors also mention that users may feel anxiety when executing the sequence of techni- cal steps needed to deposit their work, and that they may also worry about possible copyright infringement.7 The psychology of resistance may thus manifest itself in both cognitive and affective ways. Harley and her colleagues talk about faculty not perceiving any reward for depositing their work in their article “The Influence of Academic Values on Scholarly Publication and Communication Practices.” This percep- tion results in reduced drive to participate. Anxiety is another factor contributing to resistance: Faculty fear that their work may be vulnerable to plagiarism in an open- access environment.8 In “Towards User Responsive Institutional Repositories: a Case Study,” Devakos suggests that one source of user resistance is cognitive in origin. Scholars do not submit their work frequently enough to be able to navigate the interface from memory, so they must reinitiate the learning process each time they submit their work. 
The same is true for entering metadata for their work.9 Their sense of control may also be threatened by any limitations that may be imposed on substituting later iterations of their work for earlier versions. Davis and Connolly point to several sources of con- fusion, uncertainty, and anxiety among faculty in their article “Institutional Repositories: Evaluating the Reasons for Non-use of Cornell University’s Installation of DSpace.” Cognitive problems arise from having to learn new technology to deposit work and not knowing copy- right details well enough to know whether publishers would permit the deposit of research prior to publica- tion. Faculty wonder whether this might jeopardize their chances of acceptance by important journals whose edi- tors might view deposit as a form of prior publication that would disqualify them from consideration. There is also fear that the complex structure of a large repository may actually make a scholar’s work more difficult to find; fac- ulty may not understand that repositories are not isolated institutional entities but are usually searchable by major search engines like Google.10 Kim also identifies anxiety about plagiarism and confusion about copyright as being sources of faculty resistance in the article “Motivating and Impeding Factors Affecting Faculty Contribution to Institutional Repositories.” Kim found that plagiarism anxiety made some faculty only willing to deposit already-published work and that prepublication material was considered too risky. Faculty with no self-archiving experience also felt that many publishers do not allow self-archiving, reduciNG PsYcHoloGicAl resistANce to diGitAl rePositories | QuiNN 69 more open to information that challenges their beliefs and attitudes and are more open to suggestion.18 Thus before beginning a discussion of why users should deposit their research in repositories, it might help to first affirm the users’ self-concept. This could be done, for example, by reminding them of how unbiased they are in their work or how important it is in their work to be open to new ideas and new approaches, or how successful they have been in their work as scholars. The affirmation should be subtle and not directly related to the repository situation, but it should remind them that they are open- minded individuals who are not bound by tradition and that part of their success is attributable to their flexibility and adaptability. Once the users have been affirmed, librar- ians can then lead into a discussion of the importance of submitting scholarly research to repositories. Self-generated affirmations may be even more effec- tive. For example, another way to affirm the self would be to ask users to recall instances in which they successfully took a new approach or otherwise broke new ground or were innovative in some way. This could serve as a segue into a discussion of the repository as one more oppor- tunity to be innovative. Once the self-concept has been boosted, the threatening quality of the message will be perceived as less disturbing and will be more likely to receive consideration. A related strategy that psychologists employ to reduce resistance involves casting the user in the role of “expert.” This is especially easy to do with scholars because they are experts in their fields. 
Casting the user in the role of expert can deactivate resistance by putting that person in the persuasive role, which creates a form of role reversal.19 Rather than the librarian being seen as the persuader, the scholar is placed in that role. By saying to the scholar, “You are the expert in the area of communicating your research to an audience, so you would know better why the digital repository is an alternative that deserves con- sideration once you understand how it works and how it may benefit you,” you are empowering the user. Casting the user as an expert imparts a sense of control to the user. It helps to disable resistance by placing the user in a posi- tion of being predisposed to agree to the role he or she is being cast in, which also makes the user more prone to agree with the idea of using a digital repository. Priming and imaging One important discovery that psychologists have made that has some bearing on user resistance is that even subtle manipulations can have a significant effect on one’s judgments and actions. In an interesting experiment, psy- chologists told a group of students that they were to read an online newspaper, ostensibly to evaluate its design and assess how easy it was to read. Half of them read an editorial discussing a public opinion survey of youth ■■ Negative and positive strategies for reducing resistance Just as the definition of resistance can be paradoxical, so too may be some of the strategies that psychologists use to address it. Perhaps the most basic example is to coun- ter resistance by acknowledging it. When scholars are presented with a message that overtly states that digital repositories are beneficial and desirable, it may simultane- ously generate a covert reaction in the form of resistance. Rather than simply anticipating this and attempting to ignore it, digital repository advocates might be more persuasive if they acknowledge to scholars that there will likely be resistance, mention some possible reasons (e.g., plagiarism or copyright concerns), and immediately intro- duce some counterrationales to address those reasons.16 Psychologists have found that being up front and forthcoming can reduce resistance, particularly with regard to the downside of digital repositories. They have learned that it can be advantageous to preemptively reveal negative information about something so that it can be downplayed or discounted. Thus talking about the weak- nesses or shortcomings of digital repositories as early as possible in an interaction may have the effect of making these problems seem less important and weakening user resistance. Not only does revealing negative information impart a sense of honesty and credibility to the user, but psychologists have found that people feel closer to people who reveal personal information.17 A librarian could thus describe some of his or her own frustrations in using repositories as an effective way of establishing rapport with resistant users. The unexpected approach of bring- ing up the less desirable aspects of repositories—whether this refers to the technological steps that must be learned to submit one’s work or the fact that depositing one’s work in a repository is not a guarantee that it will be highly cited—can be disarming to the resistant user. This is particularly true of more resistant users who may have been expecting a strong hard-sell approach on the part of librarians. 
When suddenly faced with a more candid appeal the user may be thrown off balance psychologi- cally, leaving him or her more vulnerable to information that is the opposite of what was anticipated and to pos- sibly viewing that information in a more positive light. If one way to disarm a user is to begin by discuss- ing the negatives, a seemingly opposite approach that psychologists take is to reinforce the user’s sense of self. Psychologists believe that one source of resistance stems from when a user’s self-concept—which the user tries to protect from any source of undesired change—has been threatened in one way or another. A stable self-concept is necessary for the user to maintain a sense of order and predictability. Reinforcing the self-concept of the user should therefore make the user less likely to resist depos- iting work in a digital repository. Self-affirmed users are 70 iNFormAtioN tecHNoloGY ANd liBrAries | JuNe 2010 or even possibly collaborating on research. Their imagina- tions could be further stimulated by asking them to think of what it would be like to have their work still actively preserved and available to their successors a century from now. Using the imagining strategy could potentially be significantly more effective in attenuating resistance than presenting arguments based on dry facts. identification and liking Conscious processes like imagining are not the only psy- chological means of reducing the resistance of users to digital repositories. Unconscious processes can also be helpful. One example of such a process is what psycholo- gists refer to as the “liking heuristic.” This refers to the tendency of users to employ a rule-of-thumb method to decide whether to comply with requests from persons. This tendency results from users constantly being inun- dated with requests. Consequently, they need to simplify and streamline the decision-making process that they use to decide whether to cooperate with a request. The liking heuristic holds that users are more likely to help some- one they might otherwise not help if they unconsciously identify with the person. At an unconscious level, the user may think that a person acts like them and dresses like them, and therefore the user identifies with that person and likes them enough to comply with their request. In one experiment that psychologists conducted to see if people are more likely to comply with requests from people that they identify with, female undergraduates were informed that they would be participating in a study of first impressions. The subjects were instructed that they and a person in another room would each learn a little about one another without meeting each other. Each sub- ject was then given a list of fifty adjectives and was asked to select the twenty that were most characteristic of them- selves. The experimenter then told the participants that they would get to see each other’s lists. The experimenter took the subject’s list and then returned a short time later with what supposedly was the other participant’s list, but was actually a list that the experimenter had filled out to indicate that either the subject had much in common with the other participant’s personality (seventeen of twenty matches), some shared attributes (ten of twenty matches), or relatively few characteristics in common (three of twenty matches). The subject was then asked to exam- ine the list and fill out a survey that probed their initial impressions of the other participant, including how much they liked them. 
At the end of the experiment, the two subjects were brought together and given credit for par- ticipating. The experimenter soon left the room and the confederate participant asked the other participant if she would read and critically evaluate an eight-page paper for an English class. The results of the experiment indi- cated that the more the participant thought she shared in consumer patterns that highlighted functional needs, and the other half read a similar editorial focusing on hedo- nistic needs. The students next viewed an ad for a new brand of shampoo that featured either a strong or a weak argument for the product. The results of the experiment indicated that students who read the functional editorial and were then subsequently exposed to the strong argu- ment for the shampoo (a functional product) had a much more favorable impression of the brand than students who had received the mismatched prime.20 While it may seem that the editorial and the shampoo were unrelated, psychologists found that the subjects engaged in a process of elaborating the editorial, which then predisposed them to favor the shampoo. The presence of elaboration, which is a precursor to the development of attitudes, suggests that librarians could reduce users’ resistance to digital repositories by first involving them in some form of priming activity immediately prior to any attempt to persuade them. For example, asking faculty to read a brief case study of a scholar who has benefited from involvement in open-access activity might serve as an effective prime. Another example might be to listen briefly to a speaker summarizing the individual, disciplinary, and societal benefits of sharing one’s research with colleagues. Interventions like these should help mitigate any predispo- sition toward resistance on the part of users. Imagining is a strategy related to priming that psy- chologists have found to be effective in reducing resistance. Taking their cue from insurance salesmen—who are trained to get clients to actively imagine what it would be like to lose their home or be in an accident—a group of psycholo- gists conducted an experiment in which they divided a sample of homeowners who were considering the purchase of cable TV into two groups. One group was presented with the benefits of cable in a straightforward, informative way that described various features. The other group was asked to imagine themselves enjoying the benefits and all the possible channels and shows that they might experi- ence and how entertaining it might be. The psychologists then administered a questionnaire. The results indicated that those participants who were asked to imagine the benefits of cable were much more likely to want cable TV and to subscribe to it than were those who were only given information about cable TV.21 In other words, imagining resulted in more positive attitudes and beliefs. This study suggests that librarians attempting to reduce resistance among users of digital repositories may need to do more than merely inform or describe to them the advan- tages of depositing their work. They may need to ask users to imagine in vivid detail what it would be like to receive periodic reports indicating that their work had been down- loaded dozens or even hundreds of times. 
Librarians could ask them to imagine receiving e-mail or calls from col- leagues indicating that they had accessed their work in the repository and were interested in learning more about it, reduciNG PsYcHoloGicAl resistANce to diGitAl rePositories | QuiNN 71 students typically overestimate the amount of drinking that their peers engage in at parties. These inaccurate nor- mative beliefs act as a negative influence, causing them to imbibe more because they believe that is what their peers are doing. By informing students that almost three- quarters of their peers have less than three drinks at social gatherings, psychologists have had some success in reduc- ing excessive drinking behavior by students.23 The power of normative messages is illustrated by a recent experiment conducted by a group of psycholo- gists who created a series of five cards to encourage hotel guests to reuse their towels during their stay. The psychologists hypothesized that by appealing to social norms, they could increase compliance rates. To test their hypothesis, the researchers used a different conceptual appeal for each of the five cards. One card appealed to environmental concerns (“Help Save the Environment”), another to environmental cooperation (“Partner with Us to Save the Environment”), a third card appealed to the advantage to the hotel (“Help the Hotel Save Energy”), a fourth card targeted future generations (“Help Save Resources for Future Generations”), and a final card appealed to guests by making reference to a descrip- tive norm of the situation (“Join Your Fellow Citizens in Helping to Save the Environment”). The results of the study indicated that the card that mentioned the benefit to the hotel was least effective in getting guests to reuse their towels, and the card that was most effective was the one that mentioned that descriptive norm.24 This research suggests that if users who are resistant to submitting their work to digital repositories were informed that a larger percentage of their peers were depositing work than they realized, resistance may be reduced. This might prove to be particularly true if they learned that prominent or influential scholars were engaged in popu- lating repositories with their work. This would create a social-norms effect that would help legitimize repositories to other faculty and help them to perceive the submission process as normal and desirable. The idea that accom- plished researchers are submitting materials and reaping the benefits might prove very attractive to less experienced and less well-regarded faculty. Psychologists have a considerable body of evidence in the area of social modeling that suggests that people will imitate the behavior of others in social situations because that behavior provides an implicit guideline of what to do in a similar situation. A related finding is that the more influential people are, the more likely it is for others to emulate their actions. This is even more probable for high- status individuals who are skilled and attractive and who are capable of communicating what needs to be done to potential followers.25 Social modeling addresses both the cognitive dimension of how resistant users should behave and also the affective dimension by offering models that serve as a source of motivation to resistant users to change common with the confederate, the more she liked her. 
The more she liked the confederate and experienced a percep- tion of consensus, the more likely she was to comply with her request to critique the paper.22 Thus, when trying to overcome the resistance of users to depositing their work in a digital repository, it might make sense to consider who it is that is making the request. Universities sometimes host scholarly communi- cation symposia that are not only aimed at getting faculty interested in open-access issues, but to urge them to sub- mit their work to the institution’s repositories. Frequently, speakers at these symposia consist of academic administra- tors, members of scholarly communication or open-access advocacy organizations, or individuals in the library field. The research conducted by psychologists, however, sug- gests that appeals to scholars and researchers would be more effective if they were made by other scholars and those who are actively engaged in research. Faculty are much more likely to identify with and cooperate with requests from their own tribe, as it were, and efforts need to be concentrated on getting faculty who are involved in and understand the value of repositories to articulate this to their colleagues. Researchers who can personally testify to the benefits of depositing their work are most likely to be effective at convincing other researchers of the value of doing likewise and will be more effective at reducing resis- tance. Librarians need to recognize who their potentially most effective spokespersons and advocates are, which the psychological research seems to suggest is faculty talking to other faculty. Perceived consensus and social modeling The processes of faculty identification with peers and perceived consensus mentioned above can be further enhanced by informing researchers that other scholars are submitting their work, rather than merely telling research- ers why they should submit their work. Information about the practices of others may help change beliefs because of the need to identify with other in-group members. This is particularly true of faculty, who are prone to making con- tinuous comparisons with their peers at other institutions and who are highly competitive by nature. Once they are informed of the career advantages of depositing their work (in terms of professional visibility, collaboration opportuni- ties, etc.), and they are informed that other researchers have these advantages, this then becomes an impetus for them to submit their work to keep up with their peers and stay competitive. A perception of consensus is thus fostered—a feeling that if one’s peers are already depositing their work, this is a practice that one can more easily agree to. Psychologists have leveraged the power of identifi- cation by using social-norms research to inform people about the reality of what constitutes normative behavior as opposed to people’s perceptions of it. For example, college 72 iNFormAtioN tecHNoloGY ANd liBrAries | JuNe 2010 highly resistant users that may be unwilling to submit their work to a repository. Rather than trying to prepare a strong argument based on reason and logic, psychologists believe that using a narrative approach may be more effective. This means conveying the facts about open access and digital repositories in the form of a story. Stories are less rhetori- cal and tend not to be viewed by listeners as attempts at persuasion. 
The intent of the communicator and the coun- terresistant message are not as overt, and the intent of the message might not be obvious until it has already had a chance to influence the listener. A well-crafted narrative may be able to get under the radar of the listener before the listener has a chance to react defensively and revert to a mode of resistance. In a narrative, beliefs are rarely stated overtly but are implied, and implied beliefs are more diffi- cult to refute than overtly stated beliefs. Listening to a story and wondering how it will turn out tends to use up much of the cognitive attentional capacity that might otherwise be devoted to counterarguing, which is another reason why using a narrative approach may be particularly effec- tive with users who are strongly resistant. The longer and more subtle nature of narratives may also make them less a target of resistance than more direct arguments.28 Using a narrative approach, the case for submitting work to a repository might be presented not as a collection of dry facts or statistics, but rather as a story. The pro- tagonists are the researchers, and their struggle is to obtain recognition for their work and to advance scholarship by providing maximum access to the greatest audience of scholars and to obtain as much access as possible to the work of their peers so that they can build on it. The pro- tagonists are thwarted in their attempts to achieve their ends by avaricious publishers who obtain the work of researchers for free and then sell it back to them in the form of journal and database subscriptions and books for exor- bitant prices. These prices far exceed the rate of inflation or the budgets of universities to pay for them. The publishers engage in a series of mergers and acquisitions that swal- low up small publishing firms and result in the scholarly publishing enterprise being controlled by a few giant firms that offer unreasonable terms to users and make unreason- able demands when negotiating with them. Presented in this dramatic way, the significance of scholar participation in digital repositories becomes magnified to an extent that it becomes more difficult to resist what may almost seem like an epic struggle between good and evil. And while this may be a greatly oversimplified example, it nonetheless provides a sense of the potential power of using a narrative approach as a technique to reduce resistance. Introducing a time element into the attempt to per- suade users to deposit their work in digital repositories can play an important role in reducing resistance. Given that faculty are highly competitive, introducing the idea not only that other faculty are submitting their work but that they are already benefiting as a result makes the their behavior in the desired direction. redefinition, consistency, and depersonalization Another strategy that psychologists use to reduce resis- tance among users is to change the definition of the situation. Resistant users see the process of submitting their research to the repository as an imposition at best. In their view, the last thing that they need is another obliga- tion or responsibility to burden their already busy lives. Psychologists have learned that reframing a situation can reduce resistance by encouraging the user to look at the same phenomenon in a different way. 
In the current situ- ation, resistant users should be informed that depositing their work in a digital repository is not a burden but a way to raise their professional profile as researchers, to expose their work to a wider audience, and to heighten their visibility among not only their peers but a much larger potential audience that would be able to encounter their work on the Web. Seen in this way, the additional work of submission is less of a distraction and more of a career investment. Moreover, this approach leverages a related psycho- logical concept that can be useful in helping to dissolve resistance. Psychologists understand that inconsistency has a negative effect on self-esteem, so persuading users to believe that submitting their work to a digital repository is consistent with their past behavior can be motivating.26 The point needs to be emphasized with researchers that the act of submitting their work to a digital repository is not something strange and radical, but is consistent with prior actions intended to publicize and promote their work. A digital repository can be seen as analogous to a preprint, book, journal, or other tangible and familiar vehicles that faculty have used countless times to send their work out into the world. While the medium might have changed, the intention and the goal are the same. Reframing the act of depositing as “old wine in new bottles” may help to undermine resistance. In approaching highly resistant individuals, psycholo- gists have discovered that it is essential to depersonalize any appeal to change their behavior. Instead of saying, “You should reduce your caloric intake,” it is better to say, “It is important for people to reduce their caloric intake.” This helps to deflect and reduce the directive, judgmental, and prescriptive quality of the request, thus making it less likely to provoke resistance.27 Suggestion can be much less threatening than prescription among users who may be suspicious and mistrusting. Reverting to a third-per- son level of appeal may allow the message to get through without it being immediately rejected by the user. Narrative, timing, and anticipation Psychologists recommend another strategy to help defuse reduciNG PsYcHoloGicAl resistANce to diGitAl rePositories | QuiNN 73 technological platforms, and so on. This could be fol- lowed by a reminder to users that it is their choice—it is entirely up to them. This reminder that users have the freedom of choice may help to further counter any resis- tance generated as a result of instructions or inducements to anticipate regret. Indeed, psychologists have found that reinstating a choice that was previously threatened can result in greater compliance than if the threat had never been introduced.32 Offering users the freedom to choose between alterna- tives tends to make them more likely to comply. This is because having a choice enables users to both accept and resist the request rather than simply focus all their resis- tance on a single alternative. When presented with options, the user is able to satisfy the urge to resist by rejecting one option but is simultaneously motivated to accept another option; the user is aware that there are benefits to comply- ing and wants to take advantage of them but also wants to save face and not give in. 
By being offered several alterna- tives that nonetheless all commit to a similar outcome, the user is able to resist and accept at the same time.33 For example, one alternative option to self-archiving might be to present the faculty member with the option of an author- pays publishing model. The choice of alternatives allows the faculty member to be selective and discerning so that a sense of satisfaction is derived from the ability to resist by rejecting one alternative. At the same time, the librar- ian is able to gain compliance because one of the other alternatives that commits the faculty member to depositing research is accepted. options, comparisons, increments, and guarantees In addition to offering options, another way to erode user resistance to digital repositories is to use a comparative strategy. One technique is to first make a large request, such as “we would like you to submit all the articles that you have published in the last decade to the repository,” and then follow this with a more modest request, such as “we would appreciate it if you would please deposit all the articles you have published in the last year.” The origi- nal request becomes an “anchor” or point of reference in the mind of the user against which the subsequent request is then evaluated. Setting a high anchor lessens user resis- tance by changing the user’s point of comparison of the second request from nothing (not depositing any work in the repository) to a higher value (submitting a decade of work). In this way, a high reference anchor is established for the second request, which makes it seem more reason- able in the newly created context of the higher value.34 The user is thus more likely to comply with the second request when it is framed in this way. Using this comparative approach may also work because it creates a feeling of reciprocity in the user. When proposition much more salient. It not only suggests that submitting work is a process that results in a desirable outcome, but that the earlier one’s work is submitted, the more recognition will accrue and the more rapidly one’s career will advance.29 Faculty may feel compelled to submit their work in an effort to remain competitive with their colleagues. One resource that may be par- ticularly helpful for working with skeptical faculty who want substantiation about the effect of self-archiving on scholarly impact is a bibliography created by the Open Citation Project titled, “The Effect of Open Access and Downloads (Hits) on Citation Impact: A Bibliography of Studies.”30 It provides substantial documentation of the effect that open access has on scholarly visibility. An additional stimulus might be introduced in conjunction with the time element in the form of a download report. Showing faculty how downloads accumulate over time is analogous to arguments that investment counselors use showing how interest on investments accrues and compounds over time. This investment analogy creates a condition in which hesitating to submit their work results in faculty potentially losing recognition and compromis- ing their career advancement. An interesting related finding by psychologists sug- gests that an effective way to reduce user resistance is to have users think about the future consequences of complying or not complying. In particular, if users are asked to anticipate the amount of future regret they might experience for making a poor choice, this can significantly reduce the amount of resistance to complying with a request. 
Normally, users tend not to ruminate about the possibility of future disappointment in making a decision. If users are made to anticipate future regret, however, they will act in the present to try to minimize it. Studies conducted by psychologists show that when users are asked to anticipate the amount of future regret that they might experience for choosing to comply with a request and having it turn out adversely versus choosing to not comply and having it turn out adversely, they consis- tently indicate that they would feel more regret if they did not comply and experienced negative consequences as a result.31 In an effort to minimize this anticipated regret, they will then be more prone to comply. Based on this research, one strategy to reduce user resistance to digital repositories would be to get users to think about the future, specifically about future regret resulting from not cooperating with the request to sub- mit their work. If they feel that they might experience more regret in not cooperating than in cooperating, they might then be more inclined to cooperate. Getting users to think about the future could be done by asking users to imagine various scenarios involving the negative out- comes of not complying, such as lost opportunities for recognition, a lack of citation by peers, lost invitations to collaborate, an inability to migrate one’s work to future 74 iNFormAtioN tecHNoloGY ANd liBrAries | JuNe 2010 submit their work. Mandates rely on authority rather than persuasion to accomplish this and, as such, may represent a less-than-optimal solution to reducing user resistance. Mandates represent a failure to arrive at a meeting of the minds of advocates of open access, such as librarians, and the rest of the intellectual community. Understanding the psychology of resistance is an important prerequisite to any effort to reduce it. Psychologists have assembled a significant body of research on resistance and how to address it. Some of the strategies that the research suggests may be effective, such as discussing resistance itself with users and talk- ing about the negative effects of repositories, may seem counterintuitive and have probably not been widely used by librarians. Yet when other more conventional tech- niques have been tried with little or no success, it may make sense to experiment with some of these approaches. Particularly in the academy, where reason is supposed to prevail over authority, incorporating resistance psychol- ogy into a program aimed at soliciting faculty research seems an appropriate step before resorting to mandates. Most strategies that librarians have used in trying to persuade faculty to submit their work have been con- ventional. They are primarily of a cognitive nature and are variations on informing and educating faculty about how repositories work and why they are important. Researchers have an important affective dimension that needs to be addressed by these appeals, and the psy- chological research on resistance suggests that a strictly rational approach may not be sufficient. By incorporating some of the seemingly paradoxical and counterintuitive techniques discussed earlier, librarians may be able to penetrate the resistance of researchers and reach them at a deeper, less rational level. Ideally, a mixture of rational and less-conventional approaches might be combined to maximize effectiveness. Such a program may not elimi- nate resistance but could go a long way toward reducing it. 
Future studies that test the effectiveness of such pro- grams will hopefully be conducted to provide us with a better sense of how they work in real-world settings. References 1. Charles W. Bailey Jr., “Institutional Repositories: DOA?,” online posting, Digital Koans, Aug. 22, 2007, http://digital -scholarship.org/digitalkoans/2007/08/21/institutional -repositories-doa/ (accessed Apr. 21, 2010). 2. Dorothea Salo, “Yes, IRs Are Broken. Let’s Talk About It,” online posting, Caveat Lector, Sept. 5, 2007, http://cavlec. yarinareth.net/2007/09/05/yes-irs-are-broken-lets-talk-about -it/ (accessed Apr. 21, 2010). 3. EPrints Services, ROARMAP (Registry of Open Access Repository Material Archiving Policies) http://www.eprints .org/openaccess/policysignup/ (accessed July 28, 2009). 4. Richard K. Johnson, “Institutional Repositories: Partnering the requester scales down the request from the large one to a smaller one, it creates a sense of obligation on the part of the user to also make a concession by agreeing to the more modest request. The cultural expectation of reciprocity places the user in a situation in which they will comply with the lesser request to avoid feelings of guilt.35 For the most resistant users, breaking the request down into the smallest possible increment may prove helpful. By making the request seem more manageable, the user is encouraged to comply. Psychologists conducted an experi- ment to test whether minimizing a request would result in greater cooperation. They went door-to-door, soliciting contributions to the American Cancer Society, and received donations from 29 percent of households. They then made additional solicitations, this time asking, “Would you contribute? Even a penny will help!” Using this approach, donations increased to 50 percent. Even though the solici- tors only asked for a penny, the amounts of the donations were equal to that of the original request. By asking for “even a penny,” the solicitors made the request appear to be more modest and less of a target of resistance.36 Librarians might approach faculty by saying “if you could even submit one paper we would be grateful,” with the idea that once faculty make an initial submission they will be more inclined to submit more papers in the future. One final strategy that psychological research sug- gests may be effective in reducing resistance to digital repositories is to make sure that users understand that the decision to deposit their work is not irrevocable. With any new product, users have fears about what might hap- pen if they try it and they are not satisfied with it. Not knowing the consequences of making a decision that they may later regret fuels reluctance to become involved with it. Faculty need to be reassured that they can opt out of participating at any time and that the repository sponsors will guarantee this. This guarantee needs to be repeated and emphasized as much as possible in the solicitation process so that faculty are frequently reminded that they are entering into a decision that they can reverse if they so decide. Having this reassurance should make research- ers much less resistant to submitting their work, and the few faculty who may decide that they want to opt out are worth the reduction in resistance.37 The digital repository is a new phenomenon that faculty are unfamiliar with, and it is therefore important to create an atmosphere of trust. The guarantee will help win that trust. 
■■ Conclusion
The scholarly literature on digital repositories has given little attention to the psychology of resistance. Yet the ultimate success of digital repositories depends on overcoming the resistance of scholars and researchers to submit their work.

20. Curtis P. Haugtvedt et al., “Consumer Psychology and Attitude Change,” in Knowles and Linn, Resistance and Persuasion, 283–96.
21. Larry W. Gregory, Robert B. Cialdini, and Kathleen M. Carpenter, “Self-Relevant Scenarios as Mediators of Likelihood Estimates and Compliance: Does Imagining Make It So?” Journal of Personality & Social Psychology 43, no. 1 (1982): 89–99.
22. Jerry M. Burger, “Fleeting Attraction and Compliance with Requests,” in The Science of Social Influence: Advances and Future Progress, ed. Anthony R. Pratkanis (New York: Psychology Pr., 2007): 155–66.
23. John D. Clapp and Anita Lyn McDonald, “The Relationship of Perceptions of Alcohol Promotion and Peer Drinking Norms to Alcohol Problems Reported by College Students,” Journal of College Student Development 41, no. 1 (2000): 19–26.
24. Noah J. Goldstein and Robert B. Cialdini, “Using Social Norms as a Lever of Social Influence,” in The Science of Social Influence: Advances and Future Progress, ed. Anthony R. Pratkanis (New York: Psychology Pr., 2007): 167–90.
25. Dale H. Schunk, “Social-Self Interaction and Achievement Behavior,” Educational Psychologist 34, no. 4 (1999): 219–27.
26. Rosanna E. Guadagno et al., “When Saying Yes Leads to Saying No: Preference for Consistency and the Reverse Foot-in-the-Door Effect,” Personality & Social Psychology Bulletin 27, no. 7 (2001): 859–67.
27. Mary Jiang Bresnahan et al., “Personal and Cultural Differences in Responding to Criticism in Three Countries,” Asian Journal of Social Psychology 5, no. 2 (2002): 93–105.
28. Melanie C. Green and Timothy C. Brock, “In the Mind’s Eye: Transportation-Imagery Model of Narrative Persuasion,” in Narrative Impact: Social and Cultural Foundations, ed. Melanie C. Green, Jeffrey J. Strange, and Timothy C. Brock (Mahwah, N.J.: Lawrence Erlbaum, 2004): 315–41.
29. Oswald Huber, “Time Pressure in Risky Decision Making: Effect on Risk Defusing,” Psychology Science 49, no. 4 (2007): 415–26.
30. The Open Citation Project, “The Effect of Open Access and Downloads (‘Hits’) on Citation Impact: A Bibliography of Studies,” July 17, 2009, http://opcit.eprints.org/oacitation-biblio.html (accessed July 29, 2009).
31. Matthew T. Crawford et al., “Reactance, Compliance, and Anticipated Regret,” Journal of Experimental Social Psychology 38, no. 1 (2002): 56–63.
32. Nicolas Gueguen and Alexandre Pascual, “Evocation of Freedom and Compliance: The ‘But You Are Free of . . .’ Technique,” Current Research in Social Psychology 5, no. 18 (2000): 264–70.
33. James P. Dillard, “The Current Status of Research on Sequential Request Compliance Techniques,” Personality & Social Psychology Bulletin 17, no. 3 (1991): 283–88.
34. Thomas Mussweiler, “The Malleability of Anchoring Effects,” Experimental Psychology 49, no. 1 (2002): 67–72.
35. Robert B. Cialdini and Noah J. Goldstein, “Social Influence: Compliance and Conformity,” Annual Review of Psychology 55 (2004): 591–621.
36. James M. Wyant and Stephen L. Smith, “Getting More by Asking for Less: The Effects of Request Size on Donations of Charity,” Journal of Applied Social Psychology 17, no. 4 (1987): 392–400.
37. Lydia J.
Price, “The Joint Effects of Brands and Warranties in Signaling New Product Quality,” Journal of Economic Psychol- ogy 23, no. 2 (2002): 165–90. with Faculty to Enhance Scholarly Communication,” D-Lib Mag- azine 8, no. 11 (2002), http://www.dlib.org/dlib/november02/ johnson/11johnson.html (accessed Apr. 2, 2008). 5. Bruce Heterick, “Faculty Attitudes Toward Electronic Resources,” Educause Review 37, no. 4 (2002): 10–11. 6. Nancy Fried Foster and Susan Gibbons, “Understanding Faculty to Improve Content Recruitment for Institutional Repos- itories,” D-Lib Magazine 11, no. 1 (2005), http://www.dlib.org/ dlib/january05/foster/01foster.html (accessed July 29, 2009). 7. Suzanne Bell, Nancy Fried Foster, and Susan Gibbons, “Reference Librarians and the Success of Institutional Reposito- ries,” Reference Services Review 33, no. 3 (2005): 283–90. 8. Diane Harley et al., “The Influence of Academic Values on Scholarly Publication and Communication Practices,” Center for Studies in Higher Education, Research & Occasional Paper Series: CSHE.13.06, Sept. 1, 2006, http://repositories.cdlib.org/ cshe/CSHE-13-06/ (accessed Apr. 17, 2008). 9. Rea Devakos, “Towards User Responsive Institutional Repositories: A Case Study,” Library High Tech 24, no. 2 (2006): 173–82. 10. Philip M. Davis and Matthew J. L. Connolly, “Institutional Repositories: Evaluating the Reasons for Non-Use of Cornell University’s Installation of DSpace,” D-Lib Magazine 13, no. 3/4 (2007), http://www.dlib.org/dlib/march07/davis/03davis .html (accessed July 29, 2009). 11. Jihyun Kim, “Motivating and Impeding Factors Affecting Faculty Contribution to Institutional Repositories,” Journal of Digital Information 8, no. 2 (2007), http://journals.tdl.org/jodi/ article/view/193/177 (accessed July 29, 2009). 12. Peter Suber, “Open Access Overview” online posting, Open Access News: News from the Open Access Environment, June 21, 2004, http://www.earlham.edu/~peters/fos/overview .htm (accessed 29 July 2009). 13. See, for example, Jeffrey D. Ford and Laurie W. Ford, “Decoding Resistance to Change,” Harvard Business Review 87, no. 4 (2009): 99–103.; John P. Kotter and Leonard A. Schlesinger, “Choosing Strategies for Change,” Harvard Business Review 86, no. 7/8 (2008): 130–39; and Paul R. Lawrence, “How to Deal with Resistance to Change,” Harvard Business Review 47, no. 1 (1969): 4–176. 14. Julia Zuwerink Jacks and Maureen E. O’Brien, “Decreas- ing Resistance by Affirming the Self,” in Resistance and Per- suasion, ed. Eric S. Knowles and Jay A. Linn (Mahwah, N.J.: Lawrence Erlbaum, 2004): 235–57. 15. Benjamin Margolis, “Notes on Narcissistic Resistance,” Modern Psychoanalysis 9, no. 2 (1984): 149–56. 16. Ralph Grabhorn et al., “The Therapeutic Relationship as Reflected in Linguistic Interaction: Work on Resistance,” Psycho- therapy Research 15, no. 4 (2005): 470–82. 17. Arthur Aron et al., “The Experimental Generation of Interpersonal Closeness: A Procedure and Some Preliminary Findings,” Personality & Social Psychology Bulletin 23, no. 4 (1997): 363–77. 18. Geoffrey L. Cohen, Joshua Aronson, and Claude M. Steele, “When Beliefs Yield to Evidence: Reducing Biased Evaluation by Affirming the Self,” Personality & Social Psychology Bulletin 26, no. 9 (2000): 1151–64. 19. Anthony R. Pratkanis, “Altercasting as an Influence Tac- tic,” in Attitudes, Behavior and Social Context: The Role of Norms and Group Membership, ed. Deborah J. Terry and Michael A.Hogg (Mahwah, N.J.: Lawrence Erlbaum, 2000): 201–26. 
President's Message: Join Us at the Forum!

Michelle Frisque

Michelle Frisque (mfrisque@northwestern.edu) is LITA President 2009–10 and Head, Information Systems, Northwestern University, Chicago.

The first LITA National Forum I attended was in Milwaukee, Wisconsin. It seems like it was only a couple of years ago, but in fact nine National Forums have since passed. I was a new librarian, and I went on a lark when a colleague invited me to attend and let me crash in her room for free. I am so glad I took her up on the offer because it was one of the best conferences I have ever attended. It was the first conference that I felt was made up of people like me, people who shared my interests in technology within the library. The programming was a good mix of practical know-how and mind-blowing possibilities. My understanding of what was possible was greatly expanded, and I came home excited and ready to try out the new things I had learned.

Almost eight years passed before I attended my next Forum in Cincinnati, Ohio. After half a day I wondered why I had waited so long. The program was diverse, covering a wide range of topics. I remember being depressed and outraged at the current state of Internet access in the United States as reported by the Office for Information Technology Policy. I felt that surge of recognition when I discovered that other universities were having a difficult time documenting and tracking the various systems they run and maintain. I was inspired by David Lankes's talk, "Obligations of Leadership." If you missed it you can still hear it online. It is linked from the LITA Blog (http://www.litablog.org).

While the next Forum may seem like a long way off to you, it is in the forefront of my mind. The National Forum 2010 Planning Committee is busy working to make sure this Forum lives up to the reputation of Forums past. This year's Forum takes place in Atlanta, Georgia, September 30–October 3. The theme is "The Cloud and the Crowd." Program proposals are due February 19, so I cannot give you specifics about the concurrent sessions, but we do hope to have presentations about projects, plans, or discoveries in areas of library-related technology involving emerging cloud technologies; software-as-service, as well as social technologies of various kinds; using virtualized or cloud resources for storage or computing in libraries; library-specific open-source software (OSS) and other OSS "in" libraries; technology on a budget; using crowdsourcing and user groups for supporting technology projects; and training via the crowd.

Each accepted program is scheduled to maximize the impact for each attendee. Programming ranges from five-minute lightning talks to full-day preconferences. In addition, on the basis of attendee comments from previous Forums, we have also decided to offer thirty- and seventy-five-minute concurrent sessions. These concurrent sessions will be a mix of traditional single- or multi-speaker formats, panel discussions, case studies, and demonstrations of projects. Finally, poster sessions will also be available.

While programs such as the keynote speakers, lightning talks, and concurrent sessions are an important part of the Forum experience, so is the opportunity to network with other attendees.
I know I have learned just as much talking with a group of people in the hall between sessions, during lunch, or at the networking dinners as I have sitting in the programs. Not only is it a great opportunity to catch up with old friends, you will also have the opportunity to make new ones. For instance, at the 2009 National Forum in Salt Lake City, Utah, approximately half of the people who attended were first-time attendees. The National Forum is an intimate event whose attendance ranges between 250 and 400 people, thus making it easy to forge personal connections. Attendees come from a variety of settings, including academic, public, and special libraries; library-related organizations; and vendors. If you want to meet the attendees in a more formal setting you can attend a networking dinner organized on-site by LITA members. This year the dinners were organized by the LITA president, LITA past president, LITA president-elect, and a LITA director-at-large.

If you have not attended a National Forum or it has been a while, I hope I have piqued your interest in coming to the next National Forum in Atlanta. Registration will open in May! The most up-to-date information about the 2010 Forum is available at the LITA website (http://www.lita.org). I know that even after my LITA presidency is a distant memory, I will still make time to attend the LITA National Forum. I hope to see you there!

Web Services and Widgets for Library Information Systems

Godmar Back and Annette Bailey

As more libraries integrate information from web services to enhance their online public displays, techniques that facilitate this integration are needed. This paper presents a technique for such integration that is based on HTML widgets. We discuss three example systems (Google Book Classes, Tictoclookup, and MAJAX) that implement this technique. These systems can be easily adapted without requiring programming experience or expensive hosting.

To improve the usefulness and quality of their online public access catalogs (OPACs), more and more librarians include information from additional sources into their public displays.1 Examples of such sources include Web services that provide additional bibliographic information, social bookmarking and tagging information, book reviews, alternative sources for bibliographic items, table-of-contents previews, and excerpts. As new Web services emerge, librarians quickly integrate them to enhance the quality of their OPAC displays. Conversely, librarians are interested in opening the bibliographic, holdings, and circulation information contained in their OPACs for inclusion into other Web offerings they or others maintain. For example, by turning their OPAC into a Web service, subject librarians can include up-to-the-minute circulation information in subject or resource guides. Similarly, university instructors can use an OPAC's metadata records to display citation information ready for import into citation management software on their course pages. The ability to easily create such "mash-up" pages is crucial for increasing the visibility and reach of the digital resources libraries provide.

Although the technology to use Web services to create mash-ups is well known, several practical requirements must be met to facilitate its widespread use. First, any environment providing for such integration should be easy to use, even for librarians with limited programming background. This ease of use must extend to environments that include proprietary systems, such as vendor-provided OPACs. Second, integration must be seamless and customizable, allowing for local display preferences and flexible styling. Third, the setup, hosting, and maintenance of any necessary infrastructure must be low-cost and should maximize the use of already available or freely accessible resources. Fourth, performance must be acceptable, both in terms of latency and scalability.2

Godmar Back (gback@cs.vt.edu) is Assistant Professor, Department of Computer Science, and Annette Bailey (afbailey@vt.edu) is Assistant Professor, University Libraries, Virginia Tech University, Blacksburg.
In this paper we discuss the design space of methods for integrating information from Web services into websites. We focus primarily on client-side mash-ups, in which code running in the user's browser contacts Web services directly without the assistance of an intermediary server or proxy. To create such mash-ups, we advocate the use of "widgets," which are easy-to-use, customizable HTML elements whose use does not require programming knowledge. Although the techniques we discuss apply to any Web-based information system, we specifically consider how an OPAC can become both the target of Web services integration and also a Web service that provides information to be integrated elsewhere. We describe three widget libraries we have developed, which provide access to four Web services. These libraries have been deployed by us and others. Our contributions are twofold: We give practitioners an insight into the trade-offs surrounding the appropriate choice of mash-up model, and we present the specific designs and use examples of three concrete widget libraries librarians can directly use or adapt. All software described in this paper is available under the LGPL Open Source License.

■■ Background

Web-based information systems use a client-server architecture in which the server sends HTML markup to the user's browser, which then renders this HTML and displays it to the user. Along with HTML markup, a server may send JavaScript code that executes in the user's browser. This JavaScript code can in turn contact the original server or additional servers and include information obtained from them into the rendered content while it is being displayed. This basic architecture allows for myriad possible design choices and combinations for mash-ups. Each design choice has implications for ease of use, customizability, programming requirements, hosting requirements, scalability, latency, and availability.
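To make the pattern just described concrete, the following is a minimal, hypothetical sketch of client-side inclusion: JavaScript delivered with the page requests additional data and inserts it into the rendered content. The service URL, element id, and response fields are illustrative assumptions and are not part of any system discussed in this paper.

// Illustration only: a same-domain service and element id invented for this sketch.
function addReviewCount(targetId) {
  var xhr = new XMLHttpRequest();
  xhr.open("GET", "/services/review-count?isbn=0596000278", true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      var data = JSON.parse(xhr.responseText);
      // Insert the retrieved value into the page while it is being displayed.
      document.getElementById(targetId).textContent = data.count + " reviews";
    }
  };
  xhr.send();
}
addReviewCount("review-count");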
Server-side mash-ups

In a server-side mash-up design, shown in figure 1, the mash-up server contacts the base server and each source when it receives a request from a client. It combines the information received from the base server and the sources and sends the combined HTML to the client. Server-side mash-up systems that combine base and mash-up servers are also referred to as data mash-up systems. Such data mash-up systems typically provide a Web-based configuration front-end that allows users to select data sources, specify the manner in which they are combined, and to create a layout for the entire mash-up. Examples of such systems include Dapper and Yahoo! Pipes.3 These systems require very little programming knowledge, but they limit mash-up creators to the functionality supported by a particular system and do not allow the user to leverage the layout and functionality of an existing base server, such as an existing OPAC.

Figure 1. Server-side mash-up construction

Integrating server-side mash-up systems with proprietary OPACs as the base server is difficult because the mash-up server must parse the OPAC's output before integrating any additional information. Moreover, users must now visit—or be redirected to—the URL of the mash-up server. Although some emerging extensible OPAC designs provide the ability to include information from external sources directly and easily, most currently deployed systems do not.4 In addition, those mash-up servers that do usually require server-side programming to retrieve and integrate the information coming from the mash-up sources into the page. The availability of software libraries and the use of special purpose markup languages may mitigate this requirement in the future.

From a performance scalability point of view, the mash-up server is a bottleneck in server-side mash-ups and therefore must be made large enough to handle the expected load of end-user requests. On the other hand, the caching of data retrieved from mash-up sources is simple to implement in this arrangement because only the mash-up server contacts these sources. Such caching reduces the frequency with which requests have to be sent to sources if their data is cacheable, that is, if real-time information is not required.

The latency in this design is the sum of the time required for the client to send a request to the mash-up server and receive a reply, plus the processing time required by the server, plus the time incurred by sending a request and receiving a reply from the last responding mash-up source. This model assumes that the mash-up server contacts all sources in parallel, or as soon as the server knows that information from a source should be included in a page.

The availability of the system depends on the availability of all mash-up sources. If a mash-up source does not respond, the end user must wait until such failure is apparent to the mash-up server via a timeout. Finally, because the mash-up server acts as a client to the base and source servers, no additional security considerations apply with respect to which sources may be contacted. There also are no restrictions on the data interchange format used by source servers as long as the mash-up server is able to parse the data returned.

Client-side mash-ups

In a client-side setup, shown in figure 2, the base server sends only a partial website to the client, along with JavaScript code that instructs the client which other sources of information to contact. When executed in the browser, this JavaScript code retrieves the information from the mash-up sources directly and completes the mash-up.

Figure 2. Client-side mash-up construction

The primary appeal of client-side mashing is that no mash-up server is required, and thus the URL that users visit does not change. Consequently, the mash-up server is no longer a bottleneck. Equally important, no maintenance is required for this server, which is particularly relevant when libraries use turnkey solutions that restrict administrative access to the machine housing their OPAC. On the other hand, without a mash-up server, results from mash-up sources can no longer be centrally cached. Thus the mash-up sources themselves must be sufficiently scalable to handle the expected number of requests. As a load-reducing strategy, mash-up sources can label their results with appropriate expiration times to influence the caching of results in the clients' browsers.

Availability is increased because the mash-up degrades gracefully if some of the mash-up sources fail, since the information from the remaining sources can still be displayed to the user. Assuming that requests are sent by the client in parallel or as soon as possible, and assuming that each mash-up source responds with similar latency to requests sent by the user's browser as to requests sent by a mash-up server, the latency for a client-side mash-up is similar to the server-side mash-up. However, unlike in the server-side approach, the page designer has the option to display partial results to the user while some requests are still in progress, or even to delay sending some requests until the user explicitly requests the data by clicking on a link or other element on the page.

Because client-side mash-ups rely on JavaScript code to contact Web services directly, they are subject to a number of restrictions that stem from the security model governing the execution of JavaScript code in current browsers. This security model is designed to protect the user from malicious websites that could exploit client-side code and abuse the user's credentials to retrieve HTML or XML data from other websites to which a user has access. Such malicious code could then relay this potentially sensitive data back to the malicious site. To prevent such attacks, the security model allows the retrieval of HTML text or XML data only from sites within the same domain as the origin site, a policy commonly known as same-origin policy. In figure 2, sources A and B come from the same domain as the page the user visits.

The restrictions of the same-origin policy can be avoided by using the JavaScript Object Notation (JSON) interchange format.5 Because client-side code may retrieve and execute JavaScript code served from any domain, Web services that are not co-located with the origin site can make their results available using JSON. Doing so facilitates their inclusion into any page, independent of the domain from which it is served (see source C in figure 2). Many existing Web services already provide an option to return data in JSON format, perhaps along with other formats such as XML. For Web services that do not, a proxy server may be required to translate the data coming from the service into JSON. If the implementation of a proxy server is not feasible, the Web service is usable only on pages within the same domain as the website using it.
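As an illustration of this JSON-with-callback approach (commonly called JSONP), the sketch below loads the remote service's response as a script rather than as XML or HTML, so the same-origin policy does not apply. It uses the Google Book Search request format shown in table 1; the element id "jacket" and the callback name are assumptions made for the example.

// Callback invoked by the script the remote service returns (see table 1).
function process(result) {
  var info = result["ISBN:0596000278"];
  if (info && info.thumbnail_url) {
    var img = document.createElement("img");
    img.src = info.thumbnail_url;
    document.getElementById("jacket").appendChild(img);
  }
}

// Request the data by injecting a <script> element; unlike XMLHttpRequest,
// this is permitted across domains under the same-origin policy.
var script = document.createElement("script");
script.src = "http://books.google.com/books?bibkeys=ISBN:0596000278" +
             "&jscmd=viewapi&callback=process";
document.getElementsByTagName("head")[0].appendChild(script);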
Client-side mash-ups lend themselves naturally to enhancing the functionality of existing, proprietary OPAC systems, particularly when a vendor provides only limited extensibility. Because they do not require server-side programming, the absence of a suitable vendor-provided server-side programming interface does not prevent their creation. Oftentimes, vendor-provided templates or variables can be suitably adapted to send the necessary HTML markup and JavaScript code to the client.

The amount of JavaScript code a librarian needs to write (or copy from a provided example) determines both the likelihood of adoption and the maintainability of a given mash-up creation. The less JavaScript code there is to write, the larger the group of librarians who feel comfortable trying and adopting a given implementation. The approach of using HTML widgets hides the use of JavaScript almost entirely from the mash-up creator. HTML widgets represent specially composed markup, which will be replaced with information coming from a mash-up source when the page is rendered. Because the necessary code is contained in a JavaScript library, adapters do not need to understand programming to use the information coming from the Web service. Finally, HTML widgets are also preferable for JavaScript-savvy users because they create a layer of abstraction over the complexity and browser dependencies inherent in JavaScript programming.
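The following is a simplified sketch, not the actual code of any of the libraries described below, of what such a widget library does when the page loads: it scans the document for annotated placeholder elements, reads the identifier from the <title> attribute, and fills each element once the Web service responds. The lookupMetadata function stands in for whatever mechanism (such as the script injection shown earlier) retrieves the JSON data.

// Simplified widget processing: find placeholders, look up data, fill them in.
function processThumbnailWidgets(lookupMetadata) {
  var spans = document.getElementsByTagName("span");
  for (var i = 0; i < spans.length; i++) {
    var span = spans[i];
    // Only spans carrying the gbs-thumbnail class (described in the next
    // section) are touched in this sketch.
    if (!/\bgbs-thumbnail\b/.test(span.className)) continue;
    var id = span.title;                 // e.g., "ISBN:0596000278"
    lookupMetadata(id, (function (target) {
      return function (info) {           // called when the service responds
        if (info && info.thumbnail_url) {
          var img = document.createElement("img");
          img.src = info.thumbnail_url;
          target.appendChild(img);
        }
      };
    })(span));
  }
}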
■■ The Google Book Classes Widget Library

To illustrate our approach, we present a first example that allows the integration of data obtained from Google Book Search into any website, including OPAC pages. Google Book Search provides access to Google's database of book metadata and contents. Because of the company's book scanning activities as well as through agreements with publishers, Google hosts scanned images of many book jackets as well as partial or even full previews for some books. Many libraries are interested in either using the book jackets when displaying OPAC records or alerting their users if Google can provide a partial or full view of an item a user selected in their catalog, or both.6 This service can help users decide whether to borrow the book from the library.

The Google Book Search Dynamic Link API

The Google Book Search Dynamic Link API is a JSON-based Web service through which Google provides certain metadata for items it has indexed. It can be queried using bibliographic identifiers such as ISBN, OCLC number, or Library of Congress Control Number (LCCN). It returns a small set of data that includes the URL of a book jacket thumbnail image, the URL of a page with bibliographic information, the URL of a preview page (if available), as well as information about the extent of any preview and whether the preview viewer can be embedded directly into other pages. Table 1 shows the JSON result returned for an example ISBN.

Table 1. Sample request and response for the Google Book Search Dynamic Link API

Request:
http://books.google.com/books?bibkeys=ISBN:0596000278&jscmd=viewapi&callback=process

JSON Response:
process({
  "ISBN:0596000278": {
    "bib_key": "ISBN:0596000278",
    "info_url": "http://books.google.com/books?id=ezqe1hh91q4C\x26source=gbs_ViewAPI",
    "preview_url": "http://books.google.com/books?id=ezqe1hh91q4C\x26printsec=frontcover\x26source=gbs_ViewAPI",
    "thumbnail_url": "http://bks4.books.google.com/books?id=ezqe1hh91q4C\x26printsec=frontcover\x26img=1\x26zoom=5\x26sig=ACfU3U2d1UsnXw9BAQd94U2nc3quwhJn2A",
    "preview": "partial",
    "embeddable": true
  }
});

Widgetization

To facilitate the easy integration of this service into websites without JavaScript programming, we developed a widget library. From the adapter's perspective, the use of these widgets is extremely simple. The adapter places HTML <span> or <div> tags into the page where they want data from Google Book Search to display. These tags contain an HTML <title> attribute that acts as an identifier to describe the bibliographic item for which information should be retrieved. It may contain its ISBN, OCLC number, or LCCN. In addition, the tags also contain one or more HTML <class> attributes to describe which processing should be done with the information retrieved from Google to integrate it into the page. These classes can be combined with a list of traditional CSS classes in the <class> attribute to apply further style and formatting control.
Examples

As an example, consider the following HTML an adapter may use in a page:

<span title="ISBN:0596000278" class="gbs-thumbnail gbs-link-to-preview"></span>

When processed by the Google Book Classes widget library, the class "gbs-thumbnail" instructs the widget to embed a thumbnail image of the book jacket for ISBN 0596000278, and "gbs-link-to-preview" provides instructions to wrap the <span> tag in a hyperlink pointing to Google's preview page. The result is as if the server had contacted Google's Web service and constructed the HTML shown in example 1 in table 2, but the mash-up creator does not need to be concerned with the mechanics of contacting Google's service and making the necessary manipulations to the document.

Example 2 in table 2 demonstrates a second possible use of the widget. In this example, the creator's intent is to display an image that links to Google's information page if and only if Google provides at least a partial preview for the book in question. This goal is accomplished by placing the image inside the span and using style="display:none" to make the span initially invisible. The span is made visible only if a preview is available at Google, displaying the hyperlinked image. The full list of features supported by the Google Book Classes widget library can be found in table 3.

Table 2. Example of client-side processing by the Google Book Classes widget library

Example 1: HTML Written by Adapter
<span title="ISBN:0596000278" class="gbs-thumbnail gbs-link-to-preview"></span>

Example 1: Resultant HTML after Client-Side Processing
<a href="http://books.google.com/books?id=ezqe1hh91q4C&printsec=frontcover&source=gbs_ViewAPI">
  <span title="" class="gbs-thumbnail gbs-link-to-preview">
    <img src="http://bks3.books.google.com/books?id=ezqe1hh91q4C&printsec=frontcover&img=1&zoom=5&sig=ACfU3U2d1UsnXw9BAQd94U2nc3quwhJn2A" />
  </span>
</a>

Example 2: HTML Written by Adapter
<span style="display: none" title="ISBN:0596000278" class="gbs-link-to-info gbs-if-partial-or-full">
  <img src="http://www.google.com/intl/en/googlebooks/images/gbs_preview_button1.gif" />
</span>

Example 2: Resultant HTML after Client-Side Processing
<a href="http://books.google.com/books?id=ezqe1hh91q4C&source=gbs_ViewAPI">
  <span title="" class="gbs-link-to-info gbs-if-partial-or-full">
    <img src="http://www.google.com/intl/en/googlebooks/images/gbs_preview_button1.gif" />
  </span>
</a>

Table 3. Supported Google Book classes

gbs-thumbnail: Include an <img...> embedding the thumbnail image
gbs-link-to-preview: Wrap span/div in link to preview at Google Book Search (GBS)
gbs-link-to-info: Wrap span/div in link to info page at GBS
gbs-link-to-thumbnail: Wrap span/div in link to thumbnail at GBS
gbs-embed-viewer: Directly embed a viewer for book's content into the page, if possible
gbs-if-noview: Keep this span/div only if GBS reports that book's viewability is "noview"
gbs-if-partial-or-full: Keep this span/div only if GBS reports that book's viewability is at least "partial"
gbs-if-partial: Keep this span/div only if GBS reports that book's viewability is "partial"
gbs-if-full: Keep this span/div only if GBS reports that book's viewability is "full"
gbs-remove-on-failure: Remove this span/div if GBS doesn't return book information for this item

Integration with legacy OPACs

The approach described thus far assumes that the mash-up creator has sufficient control over the HTML markup that is sent to the user. This assumption does not always hold if the HTML is produced by a vendor-provided system, since such systems automatically generate most of the HTML used to display OPAC search results or individual bibliographic records. If the OPAC provides an extension system, such as a facility to embed customized links to external resources, it may be used to generate the necessary HTML by utilizing variables (e.g., "@#ISBN@" for ISBN numbers) set by the OPAC software.

If no extension facility exists, accommodations by the widget library are needed to maintain the goal of not requiring any programming on the part of the adapter. We implemented such accommodations to facilitate the use of Google Book Classes within a III Millennium OPAC.7 We used magic strings such as "ISBN:millennium.record" in a <title> attribute to instruct the widget library to harvest the ISBN from the current page via screen scraping. Figure 3 provides an example of how a Google Book Classes widget can be integrated into an OPAC search results page.
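As a rough illustration of the screen-scraping fallback just described, the sketch below finds an ISBN-like string in the record display and rewrites the magic-string placeholder so that normal widget processing can proceed. It is not the actual Google Book Classes code; the regular expression and the rewriting step are assumptions made for this example.

// Illustration only: locate something that looks like an ISBN in the page,
// then rewrite the magic-string placeholder so normal widget processing
// can take over.
function harvestIsbn() {
  var text = document.body.innerHTML;
  var match = text.match(/\b(?:97[89])?\d{9}[\dX]\b/); // crude ISBN-10/13 pattern
  return match ? match[0] : null;
}

var isbn = harvestIsbn();
if (isbn) {
  var spans = document.getElementsByTagName("span");
  for (var i = 0; i < spans.length; i++) {
    if (spans[i].title === "ISBN:millennium.record") {
      spans[i].title = "ISBN:" + isbn;
    }
  }
}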
■■ The Tictoclookup Widget Library

The ticTOCs Journal Table of Contents Service is a free online service that allows academic researchers and other users to keep up with newly published research by giving them access to thousands of journal tables of contents from multiple publishers.8 The ticTOCs consortium compiles and maintains a dataset that maps ISSNs and journal titles to RSS-feed URLs for the journals' tables of contents.
the tictoclookup web service We used the ticTOCs dataset to create a simple JSON Web service called “Tictoclookup” that returns RSS-feed URLs when queried by ISSN and, optionally, by journal title. Table 4 shows an example query and response. To accommodate different hosting scenarios, we created two implementations of this Tictoclookup: a standalone and a cloud-based implementation. The standalone version is implemented as a Python Web application conformant to the Web Services Gateway Interface (WSGI) specification. Hosting this version requires access to a Web server that supports a WSGI- compatible environment, such as Apache’s mod_wsgi. The Python application reads the ticTOCs dataset and responds to lookup requests for specific ISSNs. A cron job downloads the most up-to-date version of the dataset periodically. The cloud version of the Tictoclookup service is implemented as a Google App Engine (GAE) applica- tion. It uses the highly scalable and highly available GAE Datastore to store ticTOCs data records. GAE applications run on servers located in Google’s regional data centers so that requests are handled by a data center geographically close to the requesting client. As of June 2009, Google hosting of GAE applications is free, which includes a free allotment of several computational resources. For each application, GAE allows quotas of up to 1.3 MB requests and the use of up to 10 GB of bandwidth per twenty-four- hour period. Although this capacity is sufficient for the purposes of many small- and medium-size institutions, additional capacity can be purchased at a small cost. widgetization To facilitate the easy integration of this service into websites without JavaScript programming, we developed a widget library. Like Google Book Classes, this widget library is controlled via HTML attributes associated with HTML <span> or <div> tags that are placed into the page where the user decides to display data from the Tictoclookup service. The HTML <title> attribute identifies the journal by its ISSN or its ISSN and title. As with Google Book Classes, Figure 3. Sample use of Google Book Classes in an OPAC results page Table 4. Sample request and response for ticTOCs lookup Web service Request: http://tictoclookup.appspot.com/0028-0836?title=Nature&jsoncallback=process JSON Response: process({ “lastmod”: “Wed Apr 29 05:42:36 2009”, “records”: [{ “title”: “Nature”, “rssfeed”: http://www.nature.com/nature/current_issue/rss }], “issn”: “00280836” }); 82 iNFormAtioN tecHNoloGY ANd liBrAries | JuNe 2010 the HTML <class> attribute describes the desired process- ing, which may contain traditional CSS classes. example Consider the following HTML an adapter may use in a page: <span style=“display:none” class=“tictoc-link tictoc-preview tictoc-alternate-link” title=“ISSN:00280836: Nature”> Click to subscribe to Table of Contents for this journal </span> When processed by the Tictoclookup widget library, the class “tictoc-link” instructs the widget to wrap the span in a link to the RSS feed at which the table of con- tent is published, allowing users to subscribe to it. The class “tictoc-preview” associates a tooltip element with the span, which displays the first entries of the feed when the user hovers over the link. We use the Google Feeds API, another JSON-based Web service, to retrieve a cached copy of the feed. The “tictoc-alternate-link” class places an alternate link into the current document, which in some browsers triggers the display of the RSS feed icon Figure 4. 
Sample use of tictoclookup classes in the status bar. The <span> element, which is initially invisible, is made visible if and only if the Tictoclookup service returns information for the given pair of ISSN and title. Figure 4 provides a screenshot of the display if the user hovers over the link. As with Google Book Classes, the mash-up creator does not need to be concerned with the mechanics of contacting the Tictoclookup Web service and making the necessary manipulations to the document. Table 5 provides a com- plete overview of the classes Tictoclookup supports. integration with legacy oPAcs Similar to the Google Book Classes widget library, we implemented provisions that allow the use of Tictoclookup classes on pages over which the mash-up creator has limited control. For instance, specifying a title attribute of “ISSN:millennium.issnandtitle” harvests the ISSN and journal title from the III Millennium’s record display page. ■■ MAJAX Whereas the widget libraries discussed thus far integrate external Web services into an OPAC display, MAJAX is a widget library that integrates information coming from an OPAC into other pages, such as resource guides or course displays. MAJAX is designed for use with a III Millennium Integrated Library System (ILS) whose vendor does not provide a Web-services interface. The tech- niques we used, however, extend to other OPACs as well. Like many Table 5. Supported Tictoclookup classes Tictoclookup Class Meaning tictoc-link tictoc-preview tictoc-embed-n tictoc-alternate-link tictoc-append-title Wrap span/div in link to table of contents Display tooltip with preview of current entries Embed preview of first n entries Insert <link rel=“alternate”> into document Append the title of the journal to the span/div weB services ANd widGets For liBrArY iNFormAtioN sYstems | BAck ANd BAileY 83 legacy OPACs, Millennium does not only lack a Web-services interface, but lacks any programming interface to the records contained in the system and does not provide access to the database or file system of the machine housing the OPAC. Providing oPAc data as a web service We implemented two methods to access records from the Millennium OPAC using bibliographic identifi- ers such as ISBN, OCLC number, bibliographic record number, and item title. Both methods provide access to complete MARC records and holdings information, along with locations and real-time availability for each held item. MAJAX extracts this information via screen- scraping from the MARC record display page. As with all screen-scraping approaches, the code performing the scraping must be updated if the output format provided by the OPAC changes. In our experience, such changes occur at a frequency of less than once per year. The first method, MAJAX 1, implements screen scrap- ing using JavaScript code that is contained in a document placed in a directory on the server (/screens), which is normally used for supplementary resources, such as images. This document is included in the target page as a hidden HTML <iframe> element (see frame B in figure 2). Consequently, the same-domain restriction applies to the code residing in it. MAJAX 1 can thus be used only on pages within the same domain—for instance, if the OPAC is housed at opac.library.university.edu, MAJAX 1 may be used on all pages within *.university.edu (not merely *.library.university.edu). The key advantage of MAJAX 1 is that no additional server is required. 
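To make the screen-scraping step concrete, the sketch below shows one way holdings and availability information could be pulled out of a fetched record display. It is an illustration under assumptions about the vendor's markup (holdings presented as table rows with location, call number, and status cells), not the actual MAJAX code.

// Illustrative scraping of holdings rows from an OPAC record display.
// The assumption that holdings appear as table rows with three cells is
// specific to this sketch and would need adjusting to the real markup.
function scrapeHoldings(recordHtml) {
  var container = document.createElement("div");
  container.innerHTML = recordHtml;
  var holdings = [];
  var rows = container.getElementsByTagName("tr");
  for (var i = 0; i < rows.length; i++) {
    var cells = rows[i].getElementsByTagName("td");
    if (cells.length >= 3) {
      holdings.push({
        location: cells[0].textContent || cells[0].innerText,
        callNumber: cells[1].textContent || cells[1].innerText,
        status: cells[2].textContent || cells[2].innerText
      });
    }
  }
  return holdings;
}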
The second method, MAJAX 2, uses an intermediary server that retrieves the data from the OPAC, translates it to JSON, and returns it to the client. This method, shown in figure 5, returns JSON data and therefore does not suffer from the same-domain restriction. However, it requires hosting the MAJAX 2 Web service. Like the Tictoclookup Web service, we implemented the MAJAX 2 Web service using Python conformant to WSGI. A single installation can support multiple OPACs. widgetization The MAJAX widget library allows the integration of both MAJAX 1 and MAJAX 2 data into websites without JavaScript programming. The <span> tags function as placeholders, and <title> and <class> attributes describe the desired processing. MAJAX provides a number of “MAJAX classes,” multiple of which can be specified. These classes allow a mash-up creator to insert a large variety of bibliographic information, such as the val- ues of MARC fields. Classes are also provided to insert fully formatted, ready-to-copy bibliographic references in Harvard style, live circulation information, links to the catalog record, links to online versions of the item (if applicable), a ready-to-import RIS description of the item, and even images of the book cover. A list of classes MAJAX supports is provided in table 6. examples Figure 6 provides an example use of MAJAX widgets. Four <span> tags expand into the book cover, a complete Harvard-style reference, the valid of a specific MARC field (020), and a display of the current availability of the item, wrapped in a link to the catalog record. Texts such as “copy is available” shown in figure 6 are localizable. Even though there are multiple MAJAX <span> tags that refer to the same ISBN, the MAJAX widget library will contact the MAJAX 1 or MAJAX 2 Web service only once per identifier, independent of how often it is used in a page. To manage the load, the MAJAX client site library can be configured to not exceed a maximum number of requests per second, per client. All software described in this paper is available under the LGPL Open Source License. The MAJAX libraries have been used by us and others for about two years. For instance, the “New Books” list in our library uses MAJAX 1 to provide circulation information. Faculty members at our institution are using MAJAX to enrich their course websites. A number of libraries have adopted MAJAX 1, which is particularly easy to host because no additional server is required. ■■ Related work Most ILSs in use today do not provide suitable Web-services interfaces to access either bibliographic information Figure 5. Architecture of the MAJAX 2 Web service 84 iNFormAtioN tecHNoloGY ANd liBrAries | JuNe 2010 or availability data.9 This shortcoming is addressed by multiple initiatives. The ILS Discovery Interface task force (ILS-DI) created a set of rec- ommendations that facilitate the integration of discovery interfaces with legacy ILSs, but does not define a concrete API.10 Related, the ISO 20775 Holdings standard describes an XML schema to describe the availability of items across sys- tems, but does not describe an API for accessing them.11 Many ILSs provide a Z39.50 interface in addition to their HTML- based Web OPACs, but Z39.50 does not provide standardized holdings and availability.12 Nevertheless, there is hope within the community that ILS vendors will react to their customers’ needs and provide Web-services interfaces that implement these recommenda- tions. 
The Jangle project provides an API and an implementation of the ILS-DI recommendations through a Representations State Transfer (REST)–based interface that uses the Atom Publishing Protocol (APP).13 Jangle can be linked to legacy ILSs via connec- tors. The use of the XML-based APP prevents direct access from client-side JavaScript code, how- ever. In the future, adoption and widespread implementation of the W3C working draft on cross- origin resource sharing may relax the same-origin restriction in a controlled fashion, and thus allow access to APP feeds from JavaScript across domains.14 Screen-scraping is a common technique used to over- come the lack of Web-services interfaces. For instance, OCLC’s WorldCat Local product obtains access to avail- ability information from legacy ILSs in a similar fashion as our MAJAX 2 service.15 Whereas the Web services used or created in our work exclusively use a REST-based model and return data in JSON format, interfaces based on SOAP (formerly Simple Object Access Protocol) whose semantics are described by a WSDL specification provide an alternative if access from within client-side JavaScript code is not required.16 HTML Written by Adapter <table width=“340”><tr><td> <span class=“majax-syndetics-vtech” title=“i1843341662”></span> </td><td> <span class=“majax-harvard-reference” title=“i1843341662”></span> <br /> ISBN: <span class=“majax-marc-020” title=“i1843341662”></span> <br /> <span class=“majax-linktocatalogmajax-showholdings” title=“i1843341662”></span> </td></tr></table> Display in Browser after Processing Dahl, Mark., Banerjee, Kyle., Spalti, Michael., 2006, Digital libraries : integrating content and systems / Oxford, Chandos Publishing, xviii, 203 p. ISBN: 1843341662 (hbk.) 1 copy is available Figure 6. Example use of MAJAX widgets OCLC Grid Services provides REST-based Web-services interfaces to several databases, including the WorldCat Search API and identifier services such as xISBN, xISSN, and xOCLCnum for FRBR-related metadata.17 These ser- vices support XML and JSON and could benefit from widgetization for easier inclusion into client pages. The use of HTML markup to encode processing instructions is common in JavaScript frameworks, such as YUI or Dojo, which use <div> elements with custom- defined attributes (so-called expando attributes) for this purpose.18 Google Gadgets uses a similar technique as well.19 The widely used Context Objects in Spans (COinS) specification exploits <span> tags to encode OpenURL Table 6. Selected MAJAX classes MAJAX Class Replacement majax-marc-FFF-s majax-marc-FFF majax-syndetics-* majax-showholdings majax-showholdings-brief majax-endnote majax-ebook majax-linktocatalog majax-harvard-reference majax-newline majax-space MARC field FFF, subfields concatenation of all subfields in field FFF book cover image current holdings and availability information …in brief format RIS version of record link to online version, if any link to record in catalog reference in Harvard style newline space weB services ANd widGets For liBrArY iNFormAtioN sYstems | BAck ANd BAileY 85 techniques for the seamless inclusion of information from Web services into websites. We considered the cases where an OPAC is either the target of such integra- tion or the source of the information being integrated. We focused on client-side techniques in which each user’s browser contacts Web services directly because this approach lends itself to the creation of HTML widgets. 
These widgets allow the integration and customization of Web services without requiring programming. Therefore nonprogrammers can become mash-up creators. We described in detail the functionality and use of several widget libraries and Web services we built. Table 7 provides a summary of the functionality and hosting requirements for each system discussed. Although the specific requirements for each system differ because of their respective nature, all systems are designed to be deployable with minimum effort and resource require- ments. This low entry cost, combined with the provision of a high-level, nonprogramming interface, constitute two crucial preconditions for the broad adoption of mash-up techniques in libraries, which in turn has the potential to context objects in pages for processing by client-side extension.20 LibraryThing uses client-side mash-up tech- niques to incorporate a social tagging service into OPAC pages.21 Although their technique uses a <div> ele- ment as a placeholder, it does not allow customization via classes—the changes to the content are encoded in custom-generated JavaScript code for each library that subscribes to the service. The Juice Project shares our goal of simplifying the enrichment of OPAC pages with content from other sources.22 It provides a set of reusable components that is directed at JavaScript programmers, not librarians. In the computer-science community, multiple emerg- ing projects investigate how to simplify the creation of server-side data mash-ups by end user programmers.23 ■■ Conclusion This paper explored the design space of mash-up Table 7. Summary of features and requirements for the widget libraries presented in this paper Majax 1 Majax 2 Google Book Classes Tictoclookup Classes Web Service Screen Scraping III Record Display JSON Proxy for III Record Display Google Book Search Dynamic Link API books.google.com ticTOC Cloud Application tictoclookup .appspot.com Hosted By Existing Millennium Installation /screens WSGI/Python Script on libx.lib.vt.edu Google, Inc. Google, Inc. via Google App Engine Data Provenance Your OPAC Your OPAC Google JISC (www.tictocs .ac.uk) Additional Cost N/A Can use libx.lib.vt.edu for testing, must run WSGI-enabled web server in production Free, but subject to Google Terms of Service Generous free quota, pay per use beyond that Same Domain Restriction Yes No No No Widgetization majax.js: class-based: majax- classes gbsclasses.js:class- based: gbs- tictoc.js:class-based: tictoc- Requires JavaScript programming No No No No Requires Additional Server No Yes (Apache+mod_wsgi) No No (if using GAE), else need Apache+mod_wsgi III Bibrecord Display N/A N/A Yes Yes III WebBridge Integration Yes Yes Yes Yes 86 iNFormAtioN tecHNoloGY ANd liBrAries | JuNe 2010 vastly increase the reach and visibility of their electronic resources in the wider community. References 1. Nicole Engard, ed., Library Mashups—Exploring New Ways to Deliver Library Data (Medford, N.J.: Information Today, 2009); Andrew Darby and Ron Gilmour, “Adding Delicious Data to Your Library Website,” Information Technology & Libraries 28, no. 2 (2009): 100–103. 2. Monica Brown-Sica, “Playing Tag in the Dark: Diagnosing Slowness in Library Response Time,” Information Technologies & Libraries 27, no. 4 (2008): 29–32. 3. Dapper, “Dapper Dynamic Ads,” http://www.dapper .net/ (accessed June 19, 2009); Yahoo!, “Pipes,” http://pipes .yahoo.com/pipes/ (accessed June 19, 2009). 4. 
Jennifer Bowen, “Metadata to Support Next-Genera- tion Library Resource Discovery: Lessons from the Extensible Catalog, Phase 1,” Information Technology & Libraries 27, no. 2 (2008): 6–19; John Blyberg, “ILS Customer Bill-of-Rights,” online posting, Blyberg.net, Nov. 20, 2005, http://www.blyberg .net/2005/11/20/ils-customer-bill-of-rights/ (accessed June 18, 2009). 5. Douglas Crockford, “The Application/JSON Media Type for JavaScript Object Notation (JSON),” memo, The Inter- net Society, July 2006, http://www.ietf.org/rfc/rfc4627.txt (accessed Mar. 30, 2010). 6. Google, “Who’s Using the Book Search APIs?” http:// code.google.com/apis/books/casestudies/ (accessed June 16, 2009). 7. Innovative Interfaces, “Millennium ILS,” http://www.iii .com/products/millennium_ils.shtml (accessed June 19, 2009). 8. Joint Information Systems Committee, “TicTOCs Jour- nal Tables of Contents Service,” http://www.tictocs.ac.uk/ (accessed June 18, 2009). 9. Mark Dahl, Kyle Banarjee, and Michael Spalti, Digital Libraries: Integrating Content and Systems (Oxford, United King- dom: Chandos, 2006). 10. John Ockerbloom et al., “DLF ILS Discovery Interface Task Group (ILS-DI) Technical Recommendation,” (Dec. 8, 2008), http://diglib.org/architectures/ilsdi/DLF_ILS_ Discovery_1.1.pdf (accessed June 18, 2009). 11. International Organization for Standardization, “Information and Documentation—Schema for Holdings Information,” http://www.iso.org/iso/catalogue_detail .htm?csnumber=39735 (accessed June 18, 2009) 12. National Information Standards Organization, “ANSI/ NISO Z39.50—Information Retrieval: Application Service Defi- nition and Protocol Specification,” (Bethesda, Md.: NISO Pr., 2003), http://www.loc.gov/z3950/agency/Z39-50-2003.pdf (accessed May 31, 2010). 13. Ross Singer and James Farrugia, “Unveiling Jangle: Untangling Library Resources and Exposing Them through the Atom Publishing Protocol,” The Code4Lib Journal no. 4 (Sept. 22, 2008), http://journal.code4lib.org/articles/109 (accessed Apr. 21, 2010); Roy Fielding, “Architectural Styles and the Design of Network-Based Software Architectures” (PhD diss., University of California, Irvine, 2000); J. C. Gregorio, ed., “The Atom Pub- lishing Protocol,” memo, The Internet Engineering Task Force, Oct. 2007, http://bitworking.org/projects/atom/rfc5023.html (accessed June 18, 2009). 14. World Wide Web Consortium, “Cross-Origin Resource Sharing: W3C Working Draft 17 March 2009,” http://www .w3.org/TR/access-control/ (accessed June 18, 2009). 15. OCLC Online Computer Library Center, “Worldcat and Cataloging Documentation,” http://www.oclc.org/support/ documentation/worldcat/default.htm (accessed June 18, 2009). 16. F. Curbera et al., “Unraveling the Web Services Web: An Introduction to SOAP, WSDL, and UDDI,” IEEE Internet Comput- ing 6, no. 2 (2002): 86–93. 17. OCLC Online Computer Library Center, “OCLC Web Services,” http://www.worldcat.org/devnet/wiki/Services (accessed June 18, 2009); International Federation of Library Asso- ciations and Institutions Study Group on the Functional Require- ments for Bibliographic Records, “Functional Requirements for Bibliographic Records : Final Report,” http://www.ifla.org/files/ cataloguing/frbr/frbr_2008.pdf (accessed Mar. 31, 2010). 18. Yahoo!, “The Yahoo! User Interface Library (YUI),” http://developer.yahoo.com/yui/ (accessed June 18, 2009); Dojo Foundation, “Dojo—The JavaScript Toolkit,” http://www .dojotoolkit.org/ (accessed June 18, 2009). 19. Google, “Gadgets.* API Developer’s Guide,” http://code. 
google.com/apis/gadgets/docs/dev_guide.html (accessed June 18, 2009). 20. Daniel Chudnov, “COinS for the Link Trail,” Library Jour- nal 131 (2006): 8–10. 21. LibraryThing, “LibraryThing,” http://www.librarything .com/widget.php (accessed June 19, 2009). 22. Robert Wallis, “Juice—JavaScript User Interface Compo- nentised Extensions,” http://code.google.com/p/juice-project/ (accessed June 18, 2009). 23. Jeffrey Wong and Jason Hong, “Making Mashups with Marmite: Towards End-User Programming for the Web” Confer- ence on Human Factors in Computing Systems, San Jose, California, April 28–May 3, 2007: Conference Proceedings, Volume 2 (New York: Association for Computing Machinery, 2007): 1435–44; Guiling Wang, Shaohua Yang, and Yanbo Han, “Mashroom: End-User Mashup Programming Using Nested Tables” (paper presented at the International World Wide Web Conference, Madrid, Spain, 2009): 861–70; Nan Zang, “Mashups for the Web-Active User” (paper presented at the IEEE Symposium on Visual Languages and Human-Centric Computing, Herrshing am Ammersee, Germany, 2008): 276–77. 3147 ---- weB services ANd widGets For liBrArY iNFormAtioN sYstems | HAN 87oN tHe clouds: A New wAY oF comPutiNG | HAN 87 shape cloud computing. For exam- ple, Sun’s well-known slogan “the network is the computer” was estab- lished in late 1980s. Salesforce.com has been providing on-demand Software as a Service (SaaS) for cus- tomers since 1999. IBM and Microsoft started to deliver Web services in the early 2000s. Microsoft’s Azure service provides an operating sys- tem and a set of developer tools and services. Google’s popular Google Docs software provides Web-based word-processing, spreadsheet, and presentation applications. Google App Engine allows system devel- opers to run their Python/Java applications on Google’s infrastruc- ture. Sun provides $1 per CPU hour. Amazon is well-known for provid- ing Web services such as EC2 and S3. Yahoo! announced that it would use the Apache Hadoop frame- work to allow users to work with thousands of nodes and petabytes (1 million gigabytes) of data. These examples demonstrate that cloud computing providers are offer- ing services on every level, from hardware (e.g., Amazon and Sun), to operating systems (e.g., Google and Microsoft), to software and ser- vice (e.g., Google, Microsoft, and Yahoo!). Cloud-computing provid- ers target a variety of end users, from software developers to the general public. For additional infor- mation regarding cloud computing models, the University of California (UC) Berkeley’s report provides a good comparison of these models by Amazon, Microsoft, and Google.4 As cloud computing providers lower prices and IT advancements remove technology barriers—such as virtualization and network band- width—cloud computing has moved into the mainstream.5 Gartner stated, “Organizations are switching from factors related to cloud computing: infinite computing resources avail- able on demand, removing the need to plan ahead; the removal of an up-front costly investment, allowing companies to start small and increase resources when needed; and a system that is pay-for-use on a short-term basis and releases customers when needed (e.g., CPU by hour, storage by day).2 National Institute of Standards and Technology (NIST) currently defines cloud computing as “a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. 
network, servers, storage, appli- cations, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”3 As there are several definitions for “utility computing” and “cloud computing,” the author does not intend to suggest a better definition, but rather to list the characteristics of cloud computing. The term “cloud computing” means that ■■ customers do not own network resources, such as hardware, software, systems, or services; ■■ network resources are provided through remote data centers on a subscription basis; and ■■ network resources are delivered as services over the Web. This article discusses using cloud computing on an IT-infrastructure level, including building virtual server nodes and running a library’s essen- tial computer systems in remote data centers by paying a fee instead of run- ning them on-site. The article reviews current cloud computing services, presents the author’s experience, and discusses advantages and disadvan- tages of using the new approach. All kinds of clouds Major IT companies have spent bil- lions of dollars since the 1990s to On the Clouds: A New Way of Computing This article introduces cloud computing and discusses the author’s experience “on the clouds.” The author reviews cloud computing services and providers, then presents his experience of running mul- tiple systems (e.g., integrated library sys- tems, content management systems, and repository software). He evaluates costs, discusses advantages, and addresses some issues about cloud computing. Cloud com- puting fundamentally changes the ways institutions and companies manage their computing needs. Libraries can take advan- tage of cloud computing to start an IT project with low cost, to manage computing resources cost-effectively, and to explore new computing possibilities. S cholarly communication and new ways of teaching provide an opportunity for academic institutions to collaborate on pro- viding access to scholarly materials and research data. There is a grow- ing need to handle large amounts of data using computer algorithms that presents challenges to libraries with limited experience in handling nontextual materials. Because of the current economic crisis, aca- demic institutions need to find ways to acquire and manage computing resources in a cost-effective manner. One of the hottest topics in IT is cloud computing. Cloud computing is not new to many of us because we have been using some of its services, such as Google Docs, for years. In his latest book, The Big Switch: Rewiring the World, from Edison to Google, Carr argues that computing will go the way of electricity: purchase when needed, which he calls “utility computing.” His examples include Amazon’s EC2 (Elastic Computing Cloud), and S3 (Simple Storage) services.1 Amazon’s chief technol- ogy officer proposed the following Yan HanTutorial Yan Han (hany@u.library.arizona.edu) is Associate Librarian, University of Arizona Libraries, Tucson. 88 iNFormAtioN tecHNoloGY ANd liBrAries | JuNe 201088 iNFormAtioN tecHNoloGY ANd liBrAries | JuNe 2010 company-owner hardware and software to per-use service-based models.”6 For example, the U.S. gov- ernment website (http://www.usa .gov/) will soon begin using cloud computing.7 The New York Times used Amazon’s EC2 and S3 services as well as a Hadoop application to pro- vide open access to public domain articles from 1851 to 1922. 
The Times loaded 4 TB of raw TIFF images and their derivative 11 million PDFs into Amazon’s S3 in twenty-four hours at very reasonable cost.8 This project is very similar to digital library proj- ects run by academic libraries. OCLC announced its movement of library management services to the Web.9 It is clear that OCLC is going to deliver a Web-based integrated library sys- tem (ILS) to provide a new way of running an ILS. DuraSpace, a joint organization by Fedora Commons and DSpace Foundation, announced that they would be taking advan- tage of cloud storage and cloud computing.10 On the clouds Computing needs in academic librar- ies can be placed into two categories: user computing needs and library goals. User computing needs Academic libraries usually run hun- dreds of PCs for students and staff to fulfill their individual needs (e.g., Microsoft Office, browsers, and image-, audio-, and video-processing applications). Library goals A variety of library systems are used to achieve libraries’ goals to sup- port research, learning, and teaching. These systems include the following: ■■ Library website: The website may be built on simple HTML web- pages or a content management system such as Drupal, Joomla, or any home-grown PHP, Perl, ASP, or JSP system. ■■ ILS: This system provides tra- ditional core library work such as cataloging, acquisition, reporting, accounting, and user management. Typical systems include Innovative Interfaces, SirsiDynix, Voyager, and open- source software such as Koha. ■■ Repository system: This sys- tem provides submission and access to the institution’s digi- tal collections and scholarship. Typical systems include DSpace, Fedora, EPrints, ContentDm, and Greenstone. ■■ Other systems: for example, fed- erated search systems, learning object management systems, interlibrary loan (ILL) systems, and reference tracking systems. ■■ Public and private storage: staff file-sharing, digitization, and backup. Due to differences in end users and functionality, most systems do not use computing resources equally. For example, the ILS is input and output intensive and database query intensive, while repository systems require storage ranging from a few gigabytes to dozens of terabytes and substantial network bandwidth. Cloud computing brings a funda- mental shift in computing. It changes the way organizations acquire, configure, manage, and maintain computing resources to achieve their business goals. The availability of cloud computing providers allows organizations to focus on their busi- ness and leave general computing maintenance to the major IT compa- nies. In the fall of 2008, the author started to research cloud computing providers and how he could imple- ment cloud computing for some library systems to save staff and equipment costs. In January 2009, the author started his plan to build library systems “on the clouds.” The University of Arizona Libraries (UAL) has been a key player in the process of rebuilding higher education in Afghanistan since 2001. UAL Librarian Atifa Rawan and the author have received multiple grant contracts to build technical infra- structures for Afghanistan’s academic libraries. The technical infrastructure includes the following: ■■ Afghanistan ILS: a bilingual ILS based on the open-source system Koha.11 ■■ Afghanistan Digital Libraries website (http://www.afghan digitallibraries.org/): originally built on simple HTML pages, later rebuilt in 2008 using the con- tent management system Joomla. ■■ A digitization management sys- tem. 
The author has also developed a Japanese ILL system (http://gifproject.libraryfinder.org) for the North American Coordinating Council on Japanese Library Resources. These systems had been running on UAL's internal technical infrastructure. They run in a complex computing environment, require different modules, and do not use computing resources equally. For example, the Afghan ILS runs on Linux, Apache, MySQL, and Perl, and its OPAC and staff interface run on two different ports. The Afghanistan Digital Libraries website requires Linux, Apache, MySQL, and PHP. The Japanese ILL system was written in Java and runs on Tomcat.

There are several reasons why the author moved these systems to the new cloud computing infrastructure:

■ These systems need to be accessed in a system mode by people who are not UAL employees.
■ System rebooting time can be substantial in this infrastructure because of server setup and IT policy.
■ The current on-site server has reached its life expectancy and requires a replacement.

By analyzing the complex needs of the different systems and considering how to use resources more effectively, the author decided to run all the systems through one cloud computing provider. After comparing features and costs, Linode (http://www.linode.com/) was chosen because it provides full SSH and root access using virtualization, four data centers in geographically diverse areas, high availability and clustering support, and an option for month-to-month contracts. In addition, other customers have provided positive reviews. In January 2009, the author purchased one node located in Fremont, California, for $19.95 per month. An implementation plan (see appendix) was drafted to complete the project in phases. The author owns a virtual server and has access to everything that a physical server provides. In addition, the provider and the user community provided timely help and technical support.

The migration of systems was straightforward: a Linux kernel (Debian 4.0) was installed within an hour; domain registration was complete and the domains went active in twenty-four hours; the Afghanistan Digital Libraries website (based on Joomla) migration was complete within a week; and all supporting tools and libraries (e.g., MySQL, Tomcat, and Java SDK) were installed and configured within a few days. A month later, the Afghanistan ILS (based on Koha) migration was completed. The ILL system was also migrated without problem. Tests have been performed on all these systems to verify their usability. In summary, the migration of systems was very successful and did not encounter any barriers. It addressed the issues facing us: after the migration, SSH log-ins for users who are not university employees were set up quickly; systems maintenance is managed by the author's team, and rebooting now takes only about one minute; and there is no need to buy a new server and put it in a temperature- and security-controlled environment. The hardware is maintained by the provider. The administrative GUI for the Linux nodes is shown in figure 1. Since migration, no downtime because of hardware or other failures caused by the provider has been observed. After migrating all the systems successfully and running them in a reliable mode for a few months, the second phase was implemented (see appendix).
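The article does not include any scripts, but a short sketch may help make this verification step concrete. The following minimal Python example is an illustration only, not the author's actual test procedure: the hostnames and ports are hypothetical stand-ins for the services described above (the Joomla site, the Koha OPAC and staff interface on their separate ports, and the Tomcat-based ILL application). It simply confirms that each migrated web application answers an HTTP request, which is also the kind of check that the Nagios monitoring described below performs on a continuing basis.

```python
#!/usr/bin/env python3
"""Illustrative post-migration availability check (not from the original article).

The hostnames, ports, and paths below are hypothetical placeholders for the
kinds of services described in the text: a Joomla website, a Koha OPAC and
staff interface running on separate ports, and a Tomcat-based ILL application.
"""
import urllib.error
import urllib.request

# Hypothetical endpoints; substitute the real hostnames and ports in use.
SERVICES = {
    "Joomla website": "http://www.example-digitallibraries.org/",
    "Koha OPAC": "http://ils.example.org/",
    "Koha staff interface": "http://ils.example.org:8080/",
    "ILL application (Tomcat)": "http://ill.example.org:8180/",
}


def check(name, url, timeout=10):
    """Return True if the service answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except (urllib.error.URLError, OSError) as exc:
        print("FAIL  {}: {}".format(name, exc))
        return False
    ok = status < 400
    print("{}  {}: HTTP {}".format("OK  " if ok else "WARN", name, status))
    return ok


if __name__ == "__main__":
    results = [check(name, url) for name, url in SERVICES.items()]
    print("{} of {} services reachable".format(sum(results), len(results)))
```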
Another Linux node (located in Atlanta, Georgia) was purchased for backup and monitoring (see figure 2). Nagios, an open-source monitoring system, was tested and configured to identify and report problems for the above library systems. Nagios provides the following functions: (1) monitoring of critical computing components, such as the network, systems, services, and servers; (2) timely alerts delivered via e-mail or cell phone; and (3) reporting and logging of outages, events, and alerts. A backup script is also run as a prescheduled job to back up the systems on a regular basis.

Figure 1. Linux node administration Web interface

Figure 2. Two Linux nodes located in two remote data centers: Node 1 (64.62.xxx.xxx, Fremont, CA) hosts the Afghan Digital Libraries website, the Afghan ILS, the interlibrary loan system, and DSpace; Node 2 (74.207.xxx.xxx, Atlanta, GA) runs Nagios and backup.

Findings and discussions

Since January 2009, all the systems have been migrated and have been running without any issues caused by the provider. The author is very satisfied with the outcomes and cost. The annual cost of running two nodes is $480 per year, compared to at least $4,000 if the hardware had been run in the library.12 From the author's experience, cloud computing provides the following advantages over the traditional way of computing in academic institutions:

■ Cost-effectiveness: From the above example and the literature review, it is obvious that using cloud computing to run applications, systems, and IT infrastructure saves staff and financial resources. UC Berkeley's report and Zawodny's blog provide a detailed analysis of costs for CPU hours and disk storage.13
■ Flexibility: Cloud computing allows organizations to start a project quickly without worrying about up-front costs. Computing resources such as disk storage, CPU, and RAM can be added when needed. In this case, the author started on a small scale by purchasing one node and added additional resources later.
■ Data safety: Organizations are able to purchase storage in data centers located thousands of miles away, increasing data safety in case of natural disasters or other factors. This strategy is very difficult to achieve with traditional off-site backup.
■ High availability: Cloud computing providers such as Microsoft, Google, and Amazon have better resources to provide more up-time than almost any other organization or company.
■ The ability to handle large amounts of data: Cloud computing has a pay-for-use business model that allows academic institutions to analyze terabytes of data using distributed computing over hundreds of computers at short-term cost.

On-demand data storage, high availability, and data safety are critical features for academic libraries.14 However, readers should be aware of some technical and business issues:

■ Availability of a service: In several widely reported cases, Amazon's S3 and Google Gmail were inaccessible for several hours in 2008. The author believes that the commercial providers have better technical and financial resources to maintain more up-time than most academic institutions. For those wanting no single point of failure (e.g., a provider goes out of business), the author suggests storing duplicate data with a different provider or locally.
■ Data confidentiality: Most academic libraries have open-access data.
This issue can be solved by encrypting data before moving it to the clouds. In addition, licensing terms can be negotiated with providers regarding data safety and confidentiality.
■ Data transfer bottlenecks: Accessing digital collections requires considerable network bandwidth, and digital collections are usually optimized for customer access. Moving huge amounts of data (e.g., preservation digital images, audio, video, and data sets) to data centers can be scheduled during off hours (e.g., 1–5 a.m.), or data can be shipped on hard disks to the data centers.
■ Legal jurisdiction: Legal jurisdiction creates complex issues for both providers and end users. For example, Canadian privacy laws regulate data privacy in the public and private sectors. In 2008, the Office of the Privacy Commissioner of Canada released a finding that "outsourcing of canada.com email services to U.S.-based firm raises questions for subscribers," and expressed concerns about public sector privacy protection.15 This brings concerns to both providers and end users, and it has been suggested that privacy issues will be very challenging.16

Summary

The author introduces cloud computing services and providers and presents his experience of running multiple systems, such as an ILS, content management systems, repository software, and other systems, "on the clouds" since January 2009. Using cloud computing brings significant cost savings and flexibility. However, readers should be aware of technical and business issues. The author is very satisfied with his experience of moving library systems to cloud computing. His experience demonstrates a new way of managing critical computing resources in an academic library setting. The next steps include using cloud computing to meet digital collections' storage needs.

Cloud computing brings fundamental changes to how organizations manage their computing needs. As major organizations in the library field, such as OCLC, start to take advantage of cloud computing, the author believes that cloud computing will play an important role in library IT.

Acknowledgments

The author thanks USAID and Washington State University for providing financial support. The author also thanks Matthew Cleveland for his excellent work "on the clouds."

References

1. Nicholas Carr, The Big Switch: Rewiring the World, from Edison to Google (London: Norton, 2008).
2. Werner Vogels, "A Head in the Clouds—The Power of Infrastructure as a Service" (paper presented at the Cloud Computing and Its Applications conference (CCA '08), Chicago, Oct. 22–23, 2008).
3. Peter Mell and Tim Grance, "Draft NIST Working Definition of Cloud Computing," National Institute of Standards and Technology (May 11, 2009), http://csrc.nist.gov/groups/SNS/cloud-computing/index.html (accessed July 22, 2009).
4. Michael Armbrust et al., "Above the Clouds: A Berkeley View of Cloud Computing," technical report, University of California, Berkeley, EECS Department, Feb. 10, 2009, http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.html (accessed July 1, 2009).
5. Eric Hand, "Head in the Clouds: 'Cloud Computing' Is Being Pitched as a New Nirvana for Scientists Drowning in Data. But Can It Deliver?" Nature 449, no. 7165 (2007): 963; Geoffrey Fowler and Ben Worthen, "The Internet Industry Is On a Cloud—Whatever That May Mean," Wall Street Journal, Mar.
26, 2009, http://online.wsj.com/article/SB123802623665542725.html (accessed July 14, 2009); Stephen Baker, "Google and the Wisdom of the Clouds," Business Week (Dec. 14, 2007), http://www.msnbc.msn.com/id/22261846/ (accessed July 8, 2009).
6. Gartner, "Gartner Says Worldwide IT Spending on Pace to Surpass $3.4 Trillion in 2008," press release, Aug. 18, 2008, http://www.gartner.com/it/page.jsp?id=742913 (accessed July 7, 2009).
7. Wyatt Kash, "USA.gov, GobiernoUSA.gov Move into the Internet Cloud," Government Computer News, Feb. 23, 2009, http://gcn.com/articles/2009/02/23/gsa-sites-to-move-to-the-cloud.aspx?s=gcndaily_240209 (accessed July 14, 2009).
8. Derek Gottfrid, "Self-Service, Prorated Super Computing Fun!" online posting, New York Times Open, Nov. 1, 2007, http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/?scp=1&sq=self%20service%20prorated&st=cse (accessed July 8, 2009).
9. OCLC Online Computer Library Center, "OCLC Announces Strategy to Move Library Management Services to Web Scale," press release, Apr. 23, 2009, http://www.oclc.org/us/en/news/releases/200927.htm (accessed July 5, 2009).
10. DuraSpace, "Fedora Commons and DSpace Foundation Join Together to Create DuraSpace Organization," press release, May 12, 2009, http://duraspace.org/documents/pressrelease.pdf (accessed July 8, 2009).
11. Yan Han and Atifa Rawan, "Afghanistan Digital Library Initiative: Revitalizing an Integrated Library System," Information Technology & Libraries 26, no. 4 (2007): 44–46.
12. Fowler and Worthen, "The Internet Industry Is On a Cloud."
13. Jeremy Zawodny, "Replacing My Home Backup Server with Amazon's S3," online posting, Jeremy Zawodny's Blog, Oct. 3, 2006, http://jeremy.zawodny.com/blog/archives/007624.html (accessed June 19, 2009).
14. Yan Han, "An Integrated High Availability Computing Platform," The Electronic Library 23, no. 6 (2005): 632–40.
15. Office of the Privacy Commissioner of Canada, "Tabling of Privacy Commissioner of Canada's 2005–06 Annual Report on the Privacy Act: Commissioner Expresses Concerns about Public Sector Privacy Protection," press release, June 20, 2006, http://www.priv.gc.ca/media/nr-c/2006/nr-c_060620_e.cfm (accessed July 14, 2009); Office of the Privacy Commissioner of Canada, "Findings under the Personal Information Protection and Electronic Documents Act (PIPEDA)" (Sept. 19, 2008), http://www.priv.gc.ca/cf-dc/2008/394_20080807_e.cfm (accessed July 14, 2009).
16. Stephen Baker, "Google and the Wisdom of the Clouds," Business Week (Dec. 14, 2007), http://www.msnbc.msn.com/id/22261846/ (accessed July 8, 2009).

Appendix. Project Plan: Building an HA Linux Platform Using Cloud Computing

Project Manager:
Project Members:
Objective Statement: To build a High Availability (HA) Linux platform to support multiple systems using cloud computing in six months.
Scope: The project members should identify cloud computing providers, evaluate the costs, and build a Linux platform for computer systems, including the Afghan ILS, the Afghanistan Digital Libraries website, a repository system, the Japanese interlibrary loan website, and a digitization management system.
Resources:
Project Deliverable: January 1, 2009—July 1, 2009

Phase I
■ To build a stable and reliable Linux platform to support multiple Web applications.
The platform needs to consider reliability and high availability in a cost-effective manner.
■ To install needed libraries for the environment
■ To migrate the ILS (Koha) to this Linux platform
■ To migrate the Afghan Digital Libraries website (Joomla) to this platform
■ To migrate the Japanese interlibrary loan website
■ To migrate the digitization management system

Phase II
■ To research and implement a monitoring tool to monitor all Web applications as well as OS-level tools (e.g., Tomcat, MySQL)
■ To configure a cron job to run routine tasks (e.g., backup)
■ To research and implement storage (TB) for digitization and access

Phase III
■ To research and build Linux clustering

Steps:
1. OS installation: Debian 4
2. Platform environment: register DNS
3. Install Java 6, Tomcat 6, MySQL 5, etc.
4. Install source control environment (Git)
5. Install statistics analysis tool (Google Analytics)
6. Install monitoring tool: Ganglia or Nagios
7. Web applications
8. Joomla
9. Koha
10. Monitoring tool
11. Digitization management system
12. Repository system: DSpace, Fedora, etc.
13. HA tools/applications

Note. The cost calculation is based on the following:
■ Leasing two nodes at $20/month: $20 x 2 nodes x 12 months = $480/year
■ A medium-priced server with backup and a life expectancy of 5 years ($5,000): $1,000/year
■ 5 percent of a system administrator's time for managing the server ($60,000 annual salary): $3,000/year
■ Telecommunication, utility, and space costs are ignored.
■ The software developer's time is ignored because it is equal for both options.

3148 ----

From Our Readers: The New User Environment: The End of Technical Services?
Bradford Lee Eden

Editor's Note: "From Our Readers" is an occasional feature highlighting ITAL readers' letters and commentaries on timely issues.

Technical Services: an obsolete term used to describe the largest component of most library staffs in the twentieth century. That component of the staff was entirely devoted to arcane and mysterious processes involved in selecting, acquiring, cataloging, processing, and otherwise making available to library users physical material containing information content pieces (incops). The processes were complicated, expensive, and time-consuming, and generally served to severely limit direct service to users both by producing records that were difficult to understand and interpret, even by other library staff, and by consuming from 75–80 percent of the library's financial and personnel resources. In the twenty-first century, the advent of new forms of publication and new techniques for providing universal records and universal access to information content made the organizational structure obsolete. That change in organizational structure, more than any other single factor, is generally credited as being responsible for the dramatic improvement in the quality of library service that has occurred in the first decade of the twenty-first century.

There are many who would say that I was the one who wrote this quotation. I didn't, and it is, in fact, more than twenty-five years old!1 While I was beginning to research and prepare for this article, I began as most users today start their search for information: I started with Google. Granted, I rarely go beyond the first page of results (as most user surveys indicate), but the paucity of links made me click to the next screen. There, at number 16, was a scanned article. Jackpot!
I thought as I started perusing the contents of this resource online, thinking to myself how the future had changed so dramatically since 1984, with the emergence of the Internet and the laptop, all of the new information formats, and the digitization of information. Ahh, the power of full text! After reading through the table of contents, introduction, and the first chapter, I noticed that some of the pages were missing. Mmmm, obviously some very shoddy scanning on the part of Google. But no, I finally realized that only part of this special issue was available on Google. Obviously, I missed the statement at the bottom of the front scan of the book: "This is a preview. The total pages displayed will be limited. Learn more." And thus the issues regarding copyright reared their ugly head.

When discussing the new user environment, there are many demands facing libraries today. In a report citing the principle of least effort, first attributed to philologist George Zipf, and quoted in the Calhoun report to the Library of Congress, Marcia Bates states:

People do not just use information that is easy to find; they even use information that they know to be of poor quality and less reliable—so long as it requires little effort to find—rather than using information they know to be of high quality and reliable, though harder to find . . . despite heroic efforts on the part of librarians, students seldom have sufficiently sustained exposure to and practice with library skills to reach the point where they feel real ease with and mastery of library information systems.2

According to the final report of the Bibliographic Services Task Force of the University of California Libraries, users expect the following:

■ one system or search to cover a wide information universe (e.g., Google or Amazon)
■ enriched metadata (e.g., ONIX, tables of contents, and cover art)
■ full-text availability
■ to move easily and seamlessly from a citation about an item to the item itself—discovery alone is not enough
■ systems to provide a lot of intelligent assistance
  ❏ correction of obvious spelling errors
  ❏ results sorting in order of relevance to their queries
  ❏ help in navigating large retrievals through logical subsetting or topical maps or hierarchies
  ❏ help in selecting the best source through relevance ranking or added commentary from peers and experts or "others who used this also used that" tools
  ❏ customization and personalization services
■ authenticated single sign-on
■ security and privacy
■ communication and collaboration
■ multiple formats available: e-books, MPEG, JPEG, RSS and other push technologies, along with traditional, tangible formats
■ direct links to e-mail, instant messaging, and sharing
■ access to online virtual communities
■ access to what the library has to offer without actually having to visit the library3

Bradford Lee Eden (eden@library.ucsb.edu) is Associate University Librarian for Technical Services & Scholarly Communication, University of California, Santa Barbara.

What is there in this new user environment for those who work in technical services? As indicated in the opening quote, would a dramatic improvement in library services occur if technical services were removed from the organizational structure?
Even in 1983, the huge financial investment that libraries made in the organization and description of information, inventory, workflows, and personnel was recognized; today, that investment comes under intense scrutiny as libraries realize that we no longer have a monopoly on information access, and that to survive we need to move forward more aggressively into the digital environment than ever before. As Marcum stated in her now-famous article,

■ If the commonly available books and journals are accessible online, should we consider the search engines the primary means of access to them?
■ Massive digitization radically changes the nature of local libraries. Does it make sense to devote local efforts to the cataloging of unique materials only rather than the regular books and journals?
■ We have introduced our cataloging rules and the MARC format to libraries all over the world. How do we make massive changes without creating chaos?
■ And finally, a more specific question: Should we proceed with AACR3 in light of a much-changed environment?4

There are larger internal issues to consider here as well. The budget situation in libraries requires the application of business models to workflows that have normally not been questioned or challenged. Karen Calhoun discusses this topic in a number of her contributions to the literature:

When catalog librarians identify what they contribute to their communities with their methods (the cataloging rules, etc.) and with the product they provide (the catalog), they face the danger of "marketing myopia." Marketing myopia is a term used in the business literature to describe a nearsighted view that focuses on the products and services that a firm provides, rather than the needs those products and services are intended to address.5

For understanding the implementation issues associated with the leadership strategy, it is important to be clear about what is meant by the "excess capacity" of catalogs. Most catalogers would deny there is excess capacity in today's cataloging departments, and they are correct. Library materials continue to flood into acquisitions and cataloging departments and staff can barely keep up. Yet the key problem of today's online catalog is the effect of declining demand. In healthy businesses, the demand for a product and the capacity to produce it are in balance. Research libraries invest huge sums in the infrastructure that produces their local catalogs, but search engines are students' and scholars' favorite place to begin a search. More users bypass catalogs for search engines, but research libraries' investment in catalogs—and in the collections they describe—does not reflect the shift in user demand.6

I have discussed this exact problem in recent articles and technical reports as well.7 There have to be better, more efficient ways for libraries to organize and describe information, not based on the status quo of redundant "localizing" of bibliographic records. A good analogy would be the current price of gas and the looming transportation crisis. For many years, Americans have had the luxury of being able to purchase just about any type of car, truck, SUV, Hummer, etc., that they wanted on the basis of their own preferences, personalities, and incomes, not on the size of the gas tank or on the mileage per gallon. Why not buy a Mercedes over a Kia?
But with gas prices now well above the average person's ability to consistently fill their gas tank without mortgaging their future, the market demands that people find alternative solutions in order to survive. This has meant moving away from the status quo of personal choice and selection toward a more economic and sustainable model of informed, fuel-efficient transportation, so much so that public transportation is now inundated with more users than it can handle, and consumers have all but abandoned the truck and SUV markets. Libraries have long worked in the Mercedes arena, providing features such as authority control, subject classification, and redundant localizing of bibliographic records that were essential when libraries held the monopoly on information access but are no longer cost-efficient—nor even sane—strategies in the current information marketplace. Users are not accessing the OPAC anymore; well-known studies indicate that more than 80 percent of information seekers begin their search on a Web search engine. Libraries are investing huge resources in staffing and priorities fiddling with MARC bibliographic records at a time when they are struggling to survive and adapt from a monopoly environment to being just one of many players in the new information marketplace. Budgets are stagnant, staffing is at an all-time low, new information formats continue to appear and require attention, and users are no longer patient nor comfortable working with our clunky OPACs.8 Why do libraries continue to support an infrastructure of buying and offering the same books, CDs, DVDs, journals, etc., at every library, when the new information environment offers libraries the opportunity to showcase and present their unique information resources and one-of-a-kind collections to the world? Special collections materials held by every major research and public library in the world can now be digitized, and sparse library resources need to be adjusted to compete and offer these unique collections and their services to our users and the world.

The October 2007 issue of Computers in Libraries is devoted solely to articles related to the enhancement, usability, appropriateness, and demise of the library OPAC. Interesting articles include "Fac-Back-OPAC: An Open Source Solution Interface to Your Library System," "Dreaming of a Better ILS," "Plug Your Users into Library Resources with OpenSearch Plug-Ins," "Delivering What People Need, When and Where They Need It," "The Birth of a New Generation of Library Interfaces," and "Will the ILS Soon Be as Obsolete as the Card Catalog?" An especially interesting quote is given by Cervone, then assistant university librarian for information technology at Northwestern University:

What I'd like to see is for the catalog to go away. To a great degree, it is an anachronism. What we need from the ILS is a solid, business-process back end that would facilitate the functions of the library that are truly unique such as circulation, acquiring materials, and "cataloging" at the item level for what amounts to inventory-control purposes. Most of the other traditional ILS functions could be rolled over into a centralized system, like OCLC, that would be cooperatively shared. The catalog itself should be treated as just another database in the world of resources we have access to.
A single interface to those resources that would combine our local print holdings, electronic text (both journal and ebook), as well as multimedia material is what we should be demanding from our vendors.9

One book that needs to be required reading for all librarians, especially catalogers, is Weinberger's Everything Is Miscellaneous.10 He describes the three orders of order (self-organization, metadata, and digital); provides an extensive history of how Western civilization has ordered information, specifically the links to nineteenth-century Victorianism; and discusses the concepts of lumping and splitting. In the end, Weinberger argues that the digital environment allows users to manipulate information into their own organization system, disregarding all previous organizational attempts by supposed experts using outdated and outmoded systems. In the digital disorder of information, an object (leaf) can now be placed on many shelves (branches), figuratively speaking, and this new shape of knowledge brings out four strategic principles:

1. Filter on the way out, not on the way in.
2. Put each leaf on as many branches as possible.
3. Everything is metadata and everything can be a label.
4. Give up control.

It is this last principle that libraries have challenges with. Whether we agree with this principle or not, it has already happened. Arguing about it, ignoring it, or just continuing to do business as usual isn't going to change the fact that information is user-controlled and user-initiated in the digital environment. So, where do we go from here?

The future of technical services (and its staff)

Far be it from me to try to predict the future of libraries as viable, and more importantly marketable, information organizations in this new environment. One has only to examine the quotations from the first issues of Technical Services Quarterly to see what happens to predictions and opinions. Titles of some of the contributions (from 1983, mind you) are worthy of mention: "Library Automation in the Year 2000," "Musings on the Future of the Catalog," and "Libraries on the Line." There are developments, however, that require reexamination and strategic brainstorming regarding the future of library bibliographic organization and description.

The appearance of WorldCat Local will have a tremendous impact on the disappearance of proprietary vendor OPACs. There will no longer be a need for an integrated library system (ILS); with WorldCat Local, the majority of the world's MARC bibliographic records are available in a Library 2.0 format. The only things missing are some type of inventory and acquisitions module that can be formatted locally and a circulation module. If OCLC could focus its programming efforts on these two services and integrate them into WorldCat Local, library administrators and systems staff would no longer have to deal with proprietary and clunky OPACs (and their huge budgetary lines), but could use the power of Web 2.0 (and hopefully 3.0) tools and services to better position themselves in the new information marketplace.

Another major development is the Google digitization project (and other associated ventures). While there are some concerns about quality and copyright,11 as well as issues related to the disappearance of print and the time involved to digitize all print,12 no one can deny the gradual and inevitable effect that mass digitization of print resources will have in the new information marketplace.
Just the fact that my research explorations for this article brought up digitized portions of the 1983 Technical Services Quarterly articles is an example. More and more, published print information will be available in full-text online. What effect will this have on the physical collection that all libraries maintain, not only in terms of circulation, but also in terms of use of space, preservation, and collection development? No one knows for sure, but if the search strategies and information discovery patterns of our users are any indication, then we need to be strategically preparing and developing directions and options.

Automatic metadata generation has been a topic of discussion for a number of years, and Jane Greenberg's work at the University of North Carolina–Chapel Hill is one of the leading examples of research in this area.13 While there are still viable concerns about metadata generation without any type of human intervention, semiautomatic and even nonlibrary-facilitated metadata generation has been successful in a number of venues. As libraries grapple with decreased budgets, multiplying formats, fewer staff to do the work, and more retraining and professional development of existing staff, library administrators have to examine all options to maximize personnel as well as budgetary resources. Incorporating new technologies and tools for generating metadata without human intervention into library workflows should be viewed as a viable option. User tagging would be included in this area. Even Intner, a long-time proponent of traditional technical services, has written that generating cataloging data automatically would be of great benefit to the profession, and that more tools and more programming ought to be focused toward this goal.14

So, with print workflows being replaced by digital and electronic workflows, how can administrators assist their technical services staff to remain viable in this new information environment? How can technical services staff not only help themselves but also help their supervisors and administrators to incorporate their unique talents, expertise, education, and experience toward the type of future scenarios indicated above?

Competencies and challenges for technical services staff

There are some good opinions available for assisting technical services staff with moving into the new environment. Names have power, whether we like to admit it or not, and changing the name from "Technical Services" to something more understandable to our users, let alone our colleagues within the library, is one way to start. Names such as "Collections and Data Management Services" or "Reference Data Services" have been mentioned.15 An interesting quote sums up the dilemma:

It's pretty clear that technical services departments have long been the ugly ducklings in the library pond, trumped by a quintet of swans: reference departments (the ones with answers for a grateful public); IT departments (the magicians who keep the computers humming); children's and youth departments (the warm and fuzzy nurturers); other specialty departments (the experts in good reads, music, art, law, business, medicine, government documents, AV, rare books and manuscripts, you-name-it); and administrative groups (the big bosses). Part of the trouble is that the rest of our colleagues don't really know what technical services librarians do.
They only know that we do it behind closed doors and talk about it in language no one else understands. If it can't be seen, can't be understood, and can't be discussed, maybe it's all smoke and mirrors, lacking real substance. It's easy to ignore.16

Ruschoff mentions competencies for technical services librarians in the new information environment: being comfortable working in both the print and digital worlds; specialized skills such as foreign languages and subject-area expertise; comfort with digital and Web-based technologies (suggesting more computing and technology skills); expertise in digital asset management; and problem-solving and analytical skills.17 In a recent blog posting summarizing a presentation at the 2008 ALA Annual Conference on this topic, comparisons between catalogers going extinct or retooling are provided. The following is a summary of that post:

Converging trends
■ More catalogers work at the support-staff level than as professional librarians.
■ More cataloging records are selected by machines.
■ More catalog records are being captured from publisher data or other sources.
■ More updating of catalog records is done via batch processes.
■ Libraries continue to deemphasize processing of secondary research products in favor of unique primary materials.

What are our choices?
■ Behind door number one—the extinction model.
■ Behind door number two—the retooling model.

How it's done
■ Extinction
  ❏ Keep cranking about how nobody appreciates us.
  ❏ Assert over and over that we're already doing everything right—why should we change?
  ❏ Adopt a "chicken little" approach to envisioning the future.
■ Retooling
  ❏ Consider what catalogers already do.
  ❏ Look for support.
  ❏ Find a new job.

What catalogers do
■ Operate within the boundaries of detailed standards.
■ Describe items one at a time.
■ Treat items as if they are intended to fit carefully within a specific application—the catalog.
■ Ignore the rest of the world of information.

What metadata librarians do
■ Think about descriptive data without preconceptions around descriptive level, granularity, or descriptive vocabularies.
■ Consider the entirety of the discovery and access issues around a set or collection of materials.
■ Consider users and uses beyond an individual service when making design decisions—not necessarily predetermined.
■ Leap tall buildings in a single bound.

What new metadata librarians do
■ Be aware of changing user needs.
■ Understand the evolving information environment.
■ Work collaboratively with technical staff.
■ Be familiar with all metadata formats and with encoding metadata.
■ Seek out tall buildings—otherwise jumping skills will atrophy.

The cataloger skill set
■ AACR2, LC, etc.

The metadata librarian skill set
■ Views data as collections, sets, streams.
■ Approaches the task as designing data to "play well with others."

Characteristics of our new world
■ No more ILS.
■ Bibliographic utilities are unlikely to be the central node for all data.
■ Creation of metadata will become more decentralized.
■ Nobody knows how this will all shake out, but metadata librarians will be critical in forging solutions.18

While the above summary focuses on catalogers and their future, many of the directions also apply to any librarian or support staff member currently working in technical services.
In a recent EDUCAUSE Review article, Brantley lists a number of mantras that all libraries need to repeat and keep in mind in this new information environment:

■ Libraries must be available everywhere.
■ Libraries must be designed to get better through use.
■ Libraries must be portable.
■ Libraries must know where they are.
■ Libraries must tell stories.
■ Libraries must help people learn.
■ Libraries must be tools of change.
■ Libraries must offer paths for exploration.
■ Libraries must help forge memory.
■ Libraries must speak for people.
■ Libraries must study the art of war.19

You will have to read the article to find out about that last point. The above mantras illustrate that each of these issues must also be aligned with the work done by technical services departments in support of the rest of the library's services. And there definitely isn't one right way to move forward; each library, with its unique blend of services and staff, has to define, initiate, and engender dialogue on change and strategic direction, and then actively make decisions with integrity and vigor toward both its users and its staff. As Calhoun indicates, there are a number of challenges to feasibility for next steps in this area, some technically oriented but many based on our own organizational structures and strictures:

■ Difficulty achieving consensus on standardized, simplified, more automated workflows.
■ Unwillingness or inability to dispense with highly customized acquisitions and cataloging operations.
■ Overcoming the "not invented here" mindset preventing ready acceptance of cataloging copy from other libraries or external sources.
■ Resistance to simplifying cataloging.
■ Inability to find and successfully collaborate with necessary partners (e.g., ILS vendors).
■ Difficulty achieving basic levels of system interoperability.
■ Slow development and implementation of necessary standards.
■ Library-centric decision making; inability to base priorities on how users behave and what they want.
■ Limited availability of data to support management decisions.
■ Inadequate skill set among library staff; unwillingness or inability to retrain.
■ Resistance to change from faculty members, deans, or administrators.20

Moving forward in the new information world

In a recent discussion on the Autocat electronic discussion list regarding the client-business paradigm now being impressed on library staff, an especially interesting quote puts the entire debate into perspective:

The irony of this discussion is that our patrons/users/clients [et al.] expect to be treated as well as business customers. They pay tuition or taxes to most of our institutions and expect to have a return in value. And a very large percentage of them care about the differences between the government services vs. business arguments we present. What they know is that when they want something, they want it. More library powers-that-be now come from the world of business rather than libraries because of the pressure on the bottom line. Business administrators are viewed, even by those in public administration, as being more fiscally able than librarians.
I would recommend that we fuss less about titles and semantics and develop ways to show the value of libraries to the public.21

Wheeler, in a recent Educause Review article, documents a number of "eras" that colleges and universities have gone through in recent history.22 First is the "Era of Publishing," followed by the "Era of Participation" with the appearance of the Internet and its social networking tools. The next era, the "Era of Certitude," is one in which users will want quick, timely answers to questions, along with some thought about the need and context of the question. Wheeler espouses five dimensions that tools of certitude must have: reach, response, results, resources, and rights. He explains these dimensions in regard to various tools and services that libraries can provide through human–human, human–machine, and machine–machine interaction.23 Wheeler sees extensive rethinking and reengineering by libraries, campuses, and information technology to assist users in meeting their information needs. Are there ways that technical services staff can assist in these efforts?

Although somewhat dated, Calhoun's extensive article on what is needed from catalogers and librarians in the twenty-first century expounds a number of salient points.24 In table 1, she illustrates some of the many challenges facing traditional library cataloging, providing her opinion on what the challenges are, why they exist, and some solutions for survivability and adaptability in the new marketplace.25 One quote in particular deserves attention:

At the very least, adapting successfully to current demands will require new competencies for librarians, and I have made the case elsewhere that librarians must move beyond basic computer literacy to "IT fluency"—that is, an understanding of the concepts of information technology, especially applying problem solving and critical thinking skills to using information technology. Raising the bar of IT fluency will be even more critical for metadata specialists, as they shift away from a focus on metadata production to approaches based on IT tools and techniques on the one hand, and on consulting and teamwork on the other. As a result of the increasing need for IT fluency among metadata specialists, they may become more closely allied with technical support groups in campus computing centers. The chief challenges for metadata specialists will be getting out of library back rooms, becoming familiar with the larger world of university knowledge communities, and developing primary contacts with the appropriate domain experts and IT specialists.26

Getting out of the back room and interacting with users seems to be one of the dominant themes of evolving technical services positions to fit the new information marketplace. Putting Web 2.0 tools and services into the library OPAC has also gained some momentum since the launch of the Endeca-based OPAC at North Carolina State University. As some people have stated, however, putting "lipstick on a pig" doesn't change the fundamental problems and poor usability of something that never worked well in the first place.27 In their recent article, Jia Mi and Cathy Weng tried to answer the following questions: Why is the current OPAC ineffective?
What can libraries and librarians do to deliver an OPAC that is as good as search engines to better serve our users?28 Of course, the authors are biased toward the OPAC and wish to make it better, given that the last sentence in their abstract is, "Revitalizing the OPAC is one of the pressing issues that has to be accomplished." Users' search patterns have already moved away from the OPAC as a discovery tool; why should personnel and resource investment continue to be allocated toward something that users have turned away from? In their recommendations, Mi and Weng indicate that system limitations, failure to fully exploit the functionality already made available by ILSs, and the unsuitability of MARC standards for online bibliographic display are the primary factors in the ineffectiveness of library OPACs. Exactly. Debate and discussion on Autocat after the publication of their article again shows the line drawn between conservative opinions (added value, noncommercialization, and the overall ideals of the library profession and professional cataloging workflows) and the newer push for open-source models, junking the OPAC, and learning and working with non-MARC metadata standards and tools.

Conclusion

From an administrative point of view, there are a number of viable options for making technical services as efficient as possible in its current emanation:

■ Conduct a process review of all current workflows, following each type of format from receipt at the loading dock to access by the user. Revise and redesign workflows for efficiency.
■ Eliminate all backlogs, incorporating and standardizing various types of bibliographic organization (from brief records to full records, using established criteria of importance and access).
■ As much as possible, contract with vendors to make
To end, I would like to quote from a few of the articles from that 1983 issue of Technical Services Quarterly I have alluded to throughout this chapter: Like all prognostications, predictions about cataloging in a fully automated library may bear little resem- blance to the ultimate reality. While the future cata- loging scenario discussed here may seem reasonable now, it could prove embarrassing to read 10–20 years hence. Still, I would be pleasantly surprised if, by the year 2000, TS operations are not fully integrated, TS staff has not been greatly reduced, there has not been a large-scale jump in TS productivity accompanied by a dramatic decline in TS costs, and if most of us are not cooperating through a national database.29 In conclusion, I will revert to my first subject, the uncertain nature of predictions. In addition to the fear- less predictions already recorded, I predict that some of these predictions will come true and perhaps even most of them. Some of them will come true, but not in the time anticipated, while others never will. Let us hope that the influences not guessed that will prevent the actualization of some of these predictions will be happy ones, not dire. However they turn out, I predict that in ten years no one will remember or really care what these predictions were.30 Technical services as we know them now may well not exist by the end of the century. The aims of technical services will exist for as long as there are libraries. The Technical Services Quarterly may well have changed its name and its coverage long before then, but its con- cerns will remain real and the work to which many of us devote our lives will remain worthwhile. There can be few things in life that are as worth doing as enabling libraries to fulfill their unique and uniquely important role in culture and civilization.31 Twenty-five years have come and gone; some of the predictions in this first issue of Technical Services Quarterly came true, many of them did not. There have been dra- matic changes in those twenty-five years, most of which were unforeseen, as they always are. What is a certainty is that libraries can no longer sustain or maintain the status quo in technical services. What also is a certainty is that technical services staff, with their unique skills, talents, abilities, and knowledge in relation to the organization and description of information, are desperately needed in the new information environment. It is the responsibil- ity of both library administrators and technical services staff to work together to evolve and redesign workflows, standards, procedures, and even themselves to survive and succeed into the future. References 1. Norman D. Stevens, “Selections from a Dictionary of Libinfosci Terms,” in “Beyond ‘1984’: The Future of Technical Services,” special issue, Technical Services Quarterly 1, no. 1–2 (Fall/Winter 1983): 260. 2. Marcia J. Bates, “Improving User Access to Library Catalog and Portal Information: Final Report,” (paper pre- sented at the Library of Congress Bicentennial Conference on Bibliographic Control for the New Millennium, June 1, 2003): 4, http://www.loc.gov/catdir/bibcontrol/2.3BatesReport6-03 .doc.pdf (accessed Apr. 7, 2009). See also Karen Calhoun, “The Changing Nature of the Catalog and Its Integration with Other Discovery Tools,” final report to the Library of Congress, Mar. 17, 2006, 25, http://www.loc.gov/catdir/calhoun-report-final .pdf (accessed Apr. 7, 2009). 3. 
3. University of California Libraries Bibliographic Services Task Force, "Rethinking How We Provide Bibliographic Services for the University of California," final report, Dec. 2005, 8, http://libraries.universityofcalifornia.edu/sopag/BSTF/Final.pdf (accessed Apr. 7, 2009).
4. Deanna B. Marcum, "The Future of Cataloging," Library Resources & Technical Services 50, no. 1 (Jan. 2006): 9, http://www.loc.gov/library/reports/CatalogingSpeech.pdf (accessed Apr. 7, 2009).
5. Karen Calhoun, "Being a Librarian: Metadata and Metadata Specialists in the Twenty-First Century," Library Hi Tech 25, no. 2 (2007), http://www.emeraldinsight.com/Insight/ViewContentServlet?Filename=Published/EmeraldFullTextArticle/Articles/2380250202.html (accessed Apr. 7, 2009).
6. Calhoun, "The Changing Nature of the Catalog," 15.
7. Bradford Lee Eden, "Ending the Status Quo," American Libraries 39, no. 3 (Mar. 2008): 38; Eden, introduction to "Information Organization Future for Libraries," Library Technology Reports 44, no. 8 (Nov./Dec. 2007): 5–7.
8. See Karen Schneider's "How OPACs Suck" series on the ALA TechSource blog, http://www.techsource.ala.org/blog/2006/03/how-opacs-suck-part-1-relevance-rank-or-the-lack-of-it.html, http://www.techsource.ala.org/blog/2006/04/how-opacs-suck-part-2-the-checklist-of-shame.html, and http://www.techsource.ala.org/blog/2006/05/how-opacs-suck-part-3-the-big-picture.html (accessed Apr. 7, 2009).
9. H. Frank Cervone, quoted in Ellen Bahr, "Dreaming of a Better ILS," Computers in Libraries 27, no. 9 (Oct. 2007): 14.
10. David Weinberger, Everything Is Miscellaneous: The Power of the New Digital Disorder (New York: Times, 2007).
11. For a list of these concerns, see Robert Darnton, "The Library in the New Age," The New York Review of Books 55, no. 10 (June 12, 2008), http://www.nybooks.com/articles/21514 (accessed Apr. 7, 2009).
12. See Calhoun, "The Changing Nature of the Catalog," 27.
13. See the Metadata Research Center, "Automatic Metadata Generation Applications (AMeGA)," http://ils.unc.edu/mrc/amega (accessed Apr. 7, 2009).
14. Sheila S. Intner, "Generating Cataloging Data Automatically," Technicalities 28, no. 2 (Mar./Apr. 2008): 1, 15–16.
15. Sheila S. Intner, "A Technical Services Makeover," Technicalities 27, no. 5 (Sept./Oct. 2007): 1, 14–15.
16. Ibid., 14 (emphasis added).
17. Carlen Ruschoff, "Competencies for 21st Century Technical Services," Technicalities 27, no. 6 (Nov./Dec. 2007): 1, 14–16.
18. Diane Hillmann, "A Has-Been Cataloger Looks at What Cataloging Will Be," online posting, Metadata Blog, July 1, 2008, http://blogs.ala.org/nrmig.php?title=creating_the_future_of_the_catalog_aamp_&more=1&c=1&tb=1&pb=1 (accessed Apr. 7, 2009).
19. Peter Brantley, "Architectures for Collaboration: Roles and Expectations for Digital Libraries," Educause Review 43, no. 2 (Mar./Apr. 2008): 31–38.
20. Calhoun, "The Changing Nature of the Catalog," 13.
21. Brian Briscoe, "That Business/Customer Stuff (Was: Letter to AL)," online posting, Autocat, May 30, 2008.
22. Brad Wheeler, "In Search of Certitude," Educause Review 43, no. 3 (May/June 2008): 15–34.
23. Ibid., 22.
24. Karen Calhoun, "Being a Librarian."
25. Ibid.
26. Ibid. (emphasis added).
27. Andrew Pace, quoted in Roy Tennant, "Digital Libraries: 'Lipstick on a Pig,'" Library Journal, Apr. 15, 2005, http://www.libraryjournal.com/article/CA516027.html (accessed Apr. 7, 2009).
28. Jia Mi and Cathy Weng, "Revitalizing the Library OPAC: Interface, Searching, and Display Challenges," Information Technology & Libraries 27, no. 1 (Mar. 2008): 5–22.
29. Gregor A. Preston, "How Will Automation Affect Cataloging Staff?" in "Beyond '1984': The Future of Technical Services," special issue, Technical Services Quarterly 1, no. 1–2 (Fall/Winter 1983): 134.
30. David C. Taylor, "The Library Future: Computers," in "Beyond '1984': The Future of Technical Services," special issue, Technical Services Quarterly 1, no. 1–2 (Fall/Winter 1983): 92–93.
31. Michael Gorman, "Technical Services, 1984–2001 (and before)," in "Beyond '1984': The Future of Technical Services," special issue, Technical Services Quarterly 1, no. 1–2 (Fall/Winter 1983): 71.

3150 ----

Editorial: And Now for Something (Completely) Different
Marc Truitt

Marc Truitt (marc.truitt@ualberta.ca) is Associate University Librarian, Bibliographic and Information Technology Services, University of Alberta Libraries, Edmonton, Alberta, Canada, and Editor of ITAL.

The issue of ITAL you hold in your hands—be that issue physical or virtual; we won't even go into the question of your hands!—represents something new for us. For a number of years, Ex Libris (and previously, Endeavor Information Systems) has generously sponsored the LITA/Ex Libris (née LITA/Endeavor) Student Writing Award competition. The competition seeks manuscript submissions from enrolled LIS students in the areas of ITAL's publishing interests; a LITA committee on which the editor of ITAL serves as an ex-officio member evaluates the entries and names a winner. Traditionally, the winning essay has appeared in the pages of ITAL. In recent years, perhaps mirroring the waning interest in publication in traditional peer-reviewed venues, the number of entrants in the competition has declined. In 2008, for instance, there were but nine submissions, and to get those, we had to extend the deadline six weeks from the end of February to mid-April. In previous years, as I understand it, there often were even fewer.

This year, without moving the goalposts, we had—hold onto your hats!—twenty-seven entries. Of these, the review committee identified six finalists for discussion. The turnout was so good, in fact, that with the agreement of the committee, we at ITAL proposed to publish not only the winning paper but the other finalist entries as well. We hope that you will find them as stimulating as have we. Even more importantly, we hope that by publishing such a large group of papers representing 2009's best in technology-focused LIS work, we will encourage similarly large numbers of quality submissions in the years to come.

I would like to offer sincere thanks to my University of Alberta colleague Sandra Shores, who as guest editor for this issue worked tirelessly over the past few months to shepherd quality student papers into substantial and interesting contributions to the literature. She and Managing Editor Judith Carter—who guest-edited our recent Discovery Issue—have both done fabulous jobs with their respective ITAL special issues. Bravo!

Ex Libris' sponsorship

In one of those ironic twists that one more customarily associates with movie plots than with real life, the LITA/Ex Libris Student Writing Award recently almost lost its sponsor.
At very nearly the same time that Sandra was completing the preparation of the manuscripts for submission to ALA Production Services (where they are copyedited and typeset), we learned that Ex Libris had notified LITA that it had “decided to cease sponsoring” the Student Writing Award. A brief round of e-mails among principals at LITA, Ex Libris, and ITAL ensued, with the outcome being that Carl Grant, president of Ex Libris North America, graciously agreed to continue sponsorship for another year and reevaluate underwriting the award for the future. We at ITAL, and I personally, are grateful. Carl’s message about the sponsorship raises some interesting issues on which I think we should reflect. His first point goes like this: It simply is not realistic for libraries to continue to believe that vendors have cash to fund these things at the same levels when libraries don’t have cash to buy things (or want to delay purchases or buy the product for greatly reduced amounts) from those same vendors. Please understand the two are tied together. Point taken and conceded. Money is tight. Carl’s argument, I think, speaks as well to a larger, implied question. Libraries and library vendors share highly synergistic and, in recent years, increasingly antagonistic relationships. Library vendors—and I think library system vendors in particular—come in for much vitriol and precious little appreciation from those of us on the customer side. We all think they charge too much (and by implication, must also make too much), that their support and service are frequently unresponsive to our needs, and that their systems are overly large, cumbersome, and usually don’t do things the way we want them done. At the same time, we forget that they are catering to the needs and whims of a small, highly specialized market that is characterized by numerous demands, a high degree of complexity, and whose members—“standards” notwithstanding—rarely perform the same task the same way across institutions. We expect very individualized service and support, but at the same time are penny-pinching misers in our ability and willingness to pay for these services. We are beggars, yet we insist on our right to be choosers. Finally, at least for those of us of a certain generation—and yep, I count myself among its members—we chose librarianship for very specific reasons, which often means we are more than a little uneasy with concepts of “profit” and “bottom line” as applied to our world. We fail to understand that the open-source dictum “free as in kittens and not as in beer” means that we will have to pay someone for these services—it’s only a question of whom we will pay. Carl continues, making another point: I do appreciate that you’re trying to provide us more recognition as part of this. Frankly, that was another consideration in our thought of dropping it—we just didn’t feel like we were getting much for it. I’ve said before and I’ll say again, I’ve never, in all my years in this business had a single librarian say to me that because we sponsored this or that, it was even a consideration in their decision to buy something from us. Not once, ever. Companies like ours live on sales and service income.
I want to encourage you to help make librarians aware that if they do appreciate when we do these things, it sure would be nice if they’d let us know in some real tangible ways that show that is true. . . . Good will does not pay bills or salaries unless that good will translates into purchases of products and services (and please note, I’m not just speaking for Ex Libris here, I’m saying this for all vendors). And here is where Carl’s and my views may begin to diverge. Let’s start by drawing a distinction between vendor tchotchkes and vendor sponsorship. In fairness, Carl didn’t say anything about tchotchkes, so why am I? I do so because I think that we need to bear in mind that there are multiple ways vendors seek to advertise themselves and their services to us, and geegaws are one such. Trinkets are nice—I have yet to find a better gel pen than the ones given out at IUG 14 (would that I could get more!)—but other than reminding me of a vendor’s name, they serve little useful purpose. The latter, vendor sponsorship, is something very different, very special, and not readily totaled on the bottom line. Carl is quite right that sponsorship of the Student Writing Award will not in and of itself cause me to buy Aleph, Primo, or SFX (Oh right, I have that last one already!). These are products whose purchase is the result of lengthy and complex reviews that include highly detailed and painstaking needs analysis, specifications, RFPs, site visits, demonstrations, and so on. Due diligence to our parent institutions and obligations to our users require that we search for a balance among best-of-breed solutions, top-notch support, and fair pricing. Those things aren’t related to sponsorship. What is related to sponsorship, though, is a sense of shared values and interests. Of “doing the right thing.” I may or may not buy Carl’s products because of the con- siderations above (and yes, Ex Libris fields very strong contenders in all areas of library automation); I definitely will, though, be more likely to think favorably of Ex Libris as a company that has similar—though not necessarily identical—values to mine, if it is obvious that it encour- ages and materially supports professional activities that I think are important. Support for professional growth and scholarly publication in our field are two such values. I’m sure we can all name examples of this sort of behavior: In addition to support of the Student Writing Award, Ex Libris’ long-standing prominence in the National Information Standards Organization (NISO) comes to mind. So too does the founding and ongoing support by Innovative Interfaces and the library consulting firm R2 for the Taiga Forum (http://www.taigaforum.org/), a group of academic associate university librarians. To the degree that I believe Ex Libris or another firm shares my values by supporting such activities—that it “does the right thing”—I will be just a bit more inclined to think positively of it when I’m casting about for solutions to a technology or other need faced by my institution. I will think of that firm as kin, if you will. With that, I will end this by again thanking Carl and Ex Libris—because we don’t say thank you often enough!—for their generous support of the LITA/Ex Libris Student Writing Award. I hope that it will continue for a long time to come. That support is something about which I do care deeply. 
If you feel similarly—be it about the Student Writing Award, NISO, Taiga, or whatever—I urge you to say so by sending an appropriate e-mail to your vendor’s representative or by simply saying thanks in person to the company’s head honcho on the ALA exhibit floor. And the next time you are neck-deep in seemingly identical vendor quotations and need a way to figure out how to decide between them, remember the importance of shared values. Dan Marmion Longtime LITA members and ITAL readers in particular will recognize the name of Dan Marmion, editor of this journal from 1999 through 2004. Many current and recent members of the ITAL editorial board—including Managing Editor Judith Carter, Webmaster Andy Boze, Board member Mark Dehmlow, and I—can trace our involvement with ITAL to Dan’s enthusiastic period of stewardship as editor. In addition to his leadership of ITAL, Dan has been a mentor, colleague, boss, and friend. His service philosophy is best summarized in the words of a simple epigram that for many years has graced the wall behind the desk in his office: “it’s all about access!!” Because of health issues, and in order to devote more time to his wife Diana, daughter Jennifer, and granddaughter Madelyn, Dan recently decided to retire from his position as Associate Director for Information Systems and Digital Access at the University of Notre Dame Hesburgh Libraries. He also will pursue his personal interests, which include organizing and listening to his extensive collection of jazz recordings, listening to books on CD, and following the exploits of his favorite sports teams, the football Irish of Notre Dame, the Indianapolis Colts, and the New York Yankees. We want to express our deep gratitude for all he has given to the profession, to LITA, to ITAL, and to each of us personally over many years. We wish him all the best as he embarks on this new phase of his life. 3151 ---- Ex Libris Column: A Partnership for Creating Successful Partnerships Carl Grant Carl Grant (carl.grant@exlibrisgroup.com) is President of Ex Libris North America, Des Plaines, Illinois. When Marc asked me to write this column I eagerly accepted because I feel strongly about libraries leveraging their role to their greater advantage in the rapidly changing information landscape. I see sponsorships and partnerships as an important tool for doing that. However, as noted in Marc’s column in this issue, we’d been having a discussion about the continuing involvement of Ex Libris in the LITA/Ex Libris Student Writing Award. Like many of you, we at Ex Libris are trying to keep our costs low in this challenging economic environment so that we can in turn keep your costs low. Thus we are closely evaluating all expenditures to ensure their cost is justified by the value they return to our organization. I won’t repeat the discussion already outlined by Marc above, but will just note with great pleasure his willingness not only to listen to my concerns, but to try to address them. His invitation to write this column was part of that response, a chance for me to share my thoughts and concerns with you about sponsorships and partnerships and where they need to go in the future. To do that, I’d like to expand on some of the concepts Marc and I were discussing and talk about how to make sponsorships and partnerships successful. I want to look at what successful ones consist of as well as what types are needed in our profession tomorrow.
The elements of successful sponsorships and partnerships For a sponsorship or partnership to be successful in today’s environment, it should offer at least the following components: 1. Clear and shared goals. Agreeing what is to be achieved via the sponsorship or partnership is essential. Furthermore, it should be readily apparent that the goals are achievable. This will happen through joint planning and execution of an agreed-upon project plan that results in that achievement. It is up to each partner to ensure that they have the resources to execute that project plan on schedule and on budget. As there will always be unplanned events and issues, there must also be ongoing, open communications throughout the life of the sponsorship or partnership. This way, surprises are avoided and issues can be dealt with before they become problems. 2. Risks and rewards must be real and shared. Members of a sponsorship or partnership should share risks and rewards in proportion to the role they hold. Furthermore, the rewards must be seen to be real rewards to all the members. Step into the other members’ shoes and look at what you’re offering. Does it clearly bring value to the other organizations in the arrangement? If so, how? If not, what can be done to address that disparity? Sponsorships and partnerships should not take advantage of any one sponsor or partner by allocating risks or rewards disproportionately to their contributions. Rewards realized by members of the sponsorship or partnership should be proportionally shared by all the members. 3. Defined time. A sponsorship or partnership is for a defined amount of time and should not be assumed to be ongoing. Regular reviews of how well the sponsorship or partnership is working for the partners must be conducted and decisions made on the basis of those results. It might be that the landscape is changing and the benefits are no longer as meaningful, or there are alternatives now available that provide better benefits for one of the members. Maintaining a sponsorship or partnership past its useful life will only result in the disintegration of the overall relationship. 4. Write it down. Organizations merge, are acquired and sold, people change jobs, and people change responsibilities. Any sponsorship or partnership should have a written agreement outlining the elements above. Once finalized, it should be signed by an appropriate person representing each member organization. That way, when things do change, there is a reference point and the arrangement is more likely to survive any of these precipitous events. The sponsorships and partnerships needed for tomorrow Successful sponsorships and partnerships are a necessary part of our landscape today. The world of information and knowledge has become too large, exists in too many silos, and is far too complex. “Competition, collaboration, and cooperation” defines the only path possible for navigating the landscape successfully. As the president of a company in the library automation marketplace, I continue to seek out opportunities that uniquely position our company to effectively maintain success in the marketplace and to provide value for our customers and thus our company. I believe libraries need to seek the same opportunities for their organizations.
Looking ahead, it seems clear that the pace of change in today’s environment will only continue to accelerate; thus the need for us to quickly form and dissolve key sponsorships and partnerships that will result in the successful fostering and implementation of new ideas, the currency of a vibrant profession. The next challenge is to realize that many of the key sponsorships and partnerships that need to be formed are not just with traditional organizations in this profession. Tomorrow’s sponsorships and partnerships will be with those organizations that will benefit from the expertise of libraries and their suppliers while in return helping to develop or provide the new funding opportunities and means and places for disseminating access to their expertise and resources. Likely organizations would be those in the fields of education, publishing, content creation and management, and social and community Web-based software. To summarize, we at Ex Libris believe in sponsorships and partnerships. We believe they’re important and should be used in advancing our profession and organizations. From long experience we also have learned there are right ways and wrong ways to implement these tools, and I’ve shared thoughts on how to make them work for all the parties involved. Again, I thank Marc for his receptiveness to this discussion, and even more for his efforts to address the issues. It serves as an excellent example of what I discussed above. 3152 ---- Editorial Board Thoughts: Issue Introduction to Student Essays Sandra Shores Sandra Shores (sandra.shores@ualberta.ca) is Guest Editor of this issue and Operations Manager, Information Technology Services, University of Alberta Libraries, Edmonton, Alberta, Canada. The papers in this special issue, although covering diverse topics, have in common their authorship by people currently or recently engaged in graduate library studies. It has been many years since I was a library science student—twenty-five in fact. I remember remarking to a future colleague at the time that I found the interview for my first professional job easy, not because the interviewers failed to ask challenging questions, but because I had just graduated. I was passionate about my chosen profession, and my mind was filled from my time at library school with big ideas and the latest theories, techniques, and knowledge of our discipline. While I could enthusiastically respond to anything the interviewers asked, my colleague remarked she had been in her job so long that she felt she had lost her sense of the big questions. The busyness of her daily work life drew her focus away from contemplation of our purpose, principles, and values as librarians.
I now feel at a similar point in my career as this colleague did twenty-five years ago, and for that reason I have been delighted to work with these student authors to help see their papers through to publication. The six papers represent the strongest work from a wide selection that students submitted to the LITA/ Ex Libris Student Writing Award competition. This year’s winner is Michael Silver, who looks for- ward to graduating in the spring from the MLIS program at the University of Alberta. Silver entered the program with a strong library technology foundation, having pro- vided IT services to a regional library system for about ten years. He notes that “the ‘accidental systems librarian’ position is probably the norm in many small and medium sized libraries. As a result, there are a number of practices that libraries should adopt from the IT world that many library staff have never been exposed to.”1 His paper, which details the implementation of an open-source mon- itoring system to ensure the availability of library systems and services, is a fine example of the blending of best practices from two professions. Indeed, many of us who work in IT in libraries have a library background and still have a great deal to learn from IT professionals. Silver is contemplating a PhD program or else a return to a library systems position when he graduates. Either way, the pro- fession will benefit from his thoughtful, well-researched, and useful contributions to our field. Todd Vandenbark’s paper on library Web design for persons with disabilities follows, providing a highly prac- tical but also very readable guide for webmasters and others. Vandenbark graduated last spring with a mas- ters degree from the School of Library and Information Science at Indiana University and is already working as a Web services librarian at the Eccles Health Sciences Library at the University of Utah. Like Mr. Silver, he entered the program with a number of years’ work experience in the IT field, and his paper reflects the depth of his technical knowledge. Vandenbark notes, however, that he has found “the enthusiasm and collegiality among library technology professionals to be a welcome change from other employment experiences,” a gratifying com- ment for readers of this journal. Ilana Tolkoff tackles the challenging concept of global interoperability in cataloguing. She was fascinated that a single database, OCLC, has holdings from libraries all over the world. This is also such a recent phenom- enon that our current cataloging standards still do not accommodate such global participation. I was inter- ested to see what librarians were doing to reconcile this variety of languages, scripts, cultures, and indepen- dently developed cataloging standards. Tolkoff also graduated this past spring and is hoping to find a position within a music library. Marijke Visser addresses the overwhelming question of how to organize and expose Internet resources, looking at tagging and the social Web as a solution. Coming from a teaching background, Visser has long been interested in literacy and life-long learning. She is concerned about “the amount of information found only online and what it means when people are unable . . . 
to find the best resources, the best article, the right website that answers a question or solves a critical problem.” She is excited by “the potential for creativity made possible by technology” and by the way librarians incorporate “collaborative tools and interactive applications into library service.” Visser looks forward to graduating in May. Mary Kurtz examines the use of the Dublin Core metadata schema within DSpace institutional repositories. As a volunteer, she used DSpace to archive historical photographs and was responsible for classifying them using Dublin Core. She enjoyed exploring how other institutions use the same tools and would love to delve further into digital archives, “how they’re used, how they’re organized, who uses them and why.” Kurtz graduated in the summer and is looking for the right job for her interests and talents in a location that suits herself and her family. Finally, Lauren Mandel wraps up the issue exploring the use of a geographic information system to understand how patrons use library spaces. Mandel has been an enthusiastic patron of libraries since she was a small child visiting her local county and city public libraries. She is currently a doctoral candidate at Florida State University and sees an academic future for herself. Mandel expresses infectious optimism about technology in libraries: People forget, but paper, the scroll, the codex, and later the book were all major technological leaps, not to mention the printing press and moveable type. . . . There is so much potential for using technology to equalize access to information, regardless of how much money you have, what language you speak, or where you live. Big ideas, enthusiasm, and hope for the profession, in addition to practical technology-focused information await the reader.
Enjoy the issue, and congratulations to the winner and all the finalists! Note 1. All quotations are taken with permission from private e-mail correspondence. 3153 ---- Monitoring Network and Service Availability with Open-Source Software T. Michael Silver Silver describes the implementation of a monitoring system using an open-source software package to improve the availability of services and reduce the response time when troubles occur. He provides a brief overview of the literature available on monitoring library systems, and then describes the implementation of Nagios, an open-source network monitoring system, to monitor a regional library system’s servers and wide area network. Particular attention is paid to using the plug-in architecture to monitor library services effectively. The author includes example displays and configuration files. Editor’s note: This article is the winner of the LITA/Ex Libris Writing Award, 2009. Library IT departments have an obligation to provide reliable services both during and after normal business hours. The IT industry has developed guidelines for the management of IT services, but the library community has been slow to adopt these practices. The delay may be attributed to a number of factors, including a dependence on vendors and consultants for technical expertise, a reliance on librarians who have little formal training in IT best practices, and a focus on automation systems instead of infrastructure. Larger systems that employ dedicated IT professionals to manage the organization’s technology resources likely implement best practices as a matter of course and see no need to discuss them within the library community. In The Practice of System and Network Administration, Thomas A. Limoncelli, Christine J. Hogan, and Strata R. Chalup present a comprehensive look at best practices in managing systems and networks. Early in the book they provide a short list of first steps toward improving IT services, one of which is the implementation of some form of monitoring. They point out that without monitoring, systems can be down for extended periods before administrators notice or users report the problem.1 They dedicate an entire chapter to monitoring services. In it, they discuss the two primary types of monitoring—real-time monitoring, which provides information on the current state of services, and historical monitoring, which provides long-term data on uptime, use, and performance.2 While the software discussed in this article provides both types of monitoring, I focus on real-time monitoring and the value of problem identification and notification. Service monitoring does not appear frequently in library literature, and what is written often relates to single-purpose custom monitoring. An article in the September 2008 issue of ITAL describes the development and deployment of a wireless network, including a Perl script written to monitor the wireless network and associated services.3 The script updates a webpage to display the results and sends an e-mail notifying staff of problems. An enterprise monitoring system could perform these tasks and present the results within the context of the complete infrastructure.
It would require using advanced features because of the segregation of networks discussed in their article, but it would require little more effort than it took to write the single-purpose script. Dave Pattern at the University of Huddersfield shared another Perl script that monitors OPAC functionality.4 Again, the script provided a single-purpose monitoring solution that could be integrated within a larger model. Below, I discuss how I modified his script to provide more meaningful monitoring of our OPAC than the stock webpage monitoring plug-in included with our open-source network monitoring system, Nagios. Service monitoring can consist of a variety of tests. In its simplest form, a ping test will verify that a host (server or device) is powered on and successfully connected to the network. Feher and Sondag used ping tests to monitor the availability of the routers and access points on their network, as do I for monitoring connectivity to remote locations.5 A slightly more meaningful check would test for the establishment of a connection on a port. Feher and Sondag used this method to check the daemons in their network.6 A step further would be to evaluate a service response, for example checking the status code returned by a Web server. Evaluating content forms the next level of meaning. Limoncelli, Hogan, and Chalup discuss end-to-end monitoring, where the monitoring system actually performs meaningful transactions and evaluates the results.7 Pattern’s script, mentioned above, tests OPAC functionality by submitting a known keyword search and evaluating the response.8 I implemented this after an incident where Nagios failed to alert me to a problem with the OPAC. The Web server returned a status code of 200 to the request for the search page. Users, however, want more from an OPAC, and attempts to search were unsuccessful because of problems with the index server. Modifying Pattern’s original script, I was able to put together a custom check command that verifies a greater level of functionality by evaluating the number of results for the known search. T. Michael Silver (michael.silver@ualberta.ca) is an MLIS student, School of Library and Information Studies, University of Alberta, Edmonton, Alberta, Canada. Software selection Limoncelli, Hogan, and Chalup do not address specific how-to issues and rarely mention specific products. Their book provides the foundational knowledge necessary to identify what must be done. In terms of monitoring, they leave the selection of an appropriate tool to the reader.9 Myriad monitoring tools exist, both commercial and open-source. Some focus on network analysis, and some even target specific brands or model lines. The selection of a specific software package should depend on the services being monitored and the goals for the monitoring. Wikipedia lists thirty-five different products, of which eighteen are commercial (some with free versions with reduced functionality or features); fourteen are open-source projects under a General Public License or similar license (some with commercial support available but without different feature sets or licenses); and three offer different versions under different licenses.10 Von Hagen and Jones suggest two of them: Nagios and Zabbix.11 I selected the Nagios open-source product (http://www.nagios.org).
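Before turning to the software itself, it may help to make those levels of checking concrete. The standard Nagios plug-ins discussed later in this article can be run by hand from the command line, and each invocation below probes one level deeper than the one before it. This is only a sketch: the host name opac.example.org is a placeholder, the thresholds are arbitrary, the query string is shortened from the one used in appendix K, and the final command only approximates what the check_hip_search script does by matching a string in the result page.

# Level 1: connectivity only. Is the host powered on and reachable?
./check_ping -H opac.example.org -w 1000.0,20% -c 2000.0,60%

# Level 2: does anything accept a connection on the Z39.50 port?
./check_tcp -H opac.example.org -p 210

# Level 3: does the web server return an acceptable status code for a page?
./check_http -H opac.example.org -u /ipac20/ipac.jsp

# Level 4: does a known search return recognizable content, not just a 200 status?
./check_http -H opac.example.org -u "/ipac20/ipac.jsp?menu=search&index=.GW&term=linux" -s "titles matched"

Each plug-in prints a one-line status and exits with a code that the monitoring system maps to OK, WARNING, CRITICAL, or UNKNOWN, which is all it needs to interpret the result.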
The software has an established his- tory of active development, a large and active user community, a significant number of included and user- contributed extensions, and multiple books published on its use. Commercial support is available from a company founded by the creator and lead developer as well as other authorized solution providers. Monitoring appliances based on Nagios are available, as are sensors designed to interoperate with Nagios. Because of the flexibility of a software design that uses a plug-in archi- tecture, service checks for library-specific applications can be implemented. If a check or action can be scripted using practically any protocol or programming language, Nagios can monitor it. Nagios also provides a variety of information displays, as shown in appendixes A–E. n Installation The Nagios system provides an extremely flexible solu- tion to monitor hosts and services. The object-orientation and use of plug-ins allows administrators to monitor any aspect of their infrastructure or services using standard plug-ins, user-contributed plug-ins, or custom scripts. Additionally, the open-source nature of the package allows independent development of extensions to add features or integrate the software with other tools. Community sites such as MonitoringExchange (formerly Nagios Exchange), Nagios Community, and Nagios Wiki provide repositories of documentation, plug-ins, extensions, and other tools designed to work with Nagios.12 But that flexibility comes at a cost—Nagios has a steep learning curve, and user- contributed plug-ins often require the installation of other software, most notably Perl modules. Nagios runs on a variety of Linux, Unix, and Berkeley Software Distribution (BSD) operating systems. For testing, I used a standard Linux server distribution installed on a virtual machine. Virtualization provides an easy way to test software, especially if an alternate operating system is needed. If given sufficient resources, a virtual machine is capable of running the production instance of Nagios. After installing and updating the operating system, I installed the following packages: n Apache Web server n Perl n GD development library, needed to produce graphs and status maps n libpng-devel and libjpeg-devel, both needed by the GD library n gcc and GNU make, which are needed to compile some plug-ins and Perl modules Most major Linux and BSD distributions include Nagios in their software repositories for easy instal- lation using the native package management system. Although the software in the repositories is often not the most recent version, using these repositories simplifies the installation process. If a reasonably recent version of the software is available from a repository, I will install from there. Some software packages are either outdated or not available, and I manually install these. Detailed installation instructions are available on the Nagios web- site, in several books, and on the previously mentioned websites.13 The documentation for version 3 includes a number of quick-start guides.14 Most package managers will take care of some of the setup, including modifying the Apache configuration file to create an alias available at http://server.name/nagios. I prepared the remainder of this article using the latest stable versions of Nagios (3.0.6) and the plug-ins (1.4.13) at the time of writing. n Configuration Nagios configuration relies on an object model, which allows a great deal of flexibility but can be complex. 
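A small, hypothetical fragment may help readers picture that object model before the individual files are discussed. Nothing below is specific to this installation: the host and service names are invented, the address comes from the documentation range, and the command definition simply follows the general shape of the samples distributed with Nagios. The point is that every piece of the configuration, from the check command to the people who are notified, is a define block that other blocks refer to by name.

define command{
        command_name    check_http
        command_line    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$   ; $USER1$ is set in resource.cfg
        }

define host{
        host_name             webserver            ; invented name
        alias                 Illustrative web server
        address               192.0.2.10
        check_command         check-host-alive     ; another command object, defined the same way
        max_check_attempts    10
        check_period          24x7                 ; a timeperiod object
        contact_groups        admins               ; a contactgroup object
        notification_interval 120
        notification_period   24x7
        }

define service{
        host_name             webserver
        service_description   Web service
        check_command         check_http!-u/index.html   ; arguments follow the ! separator
        max_check_attempts    3
        normal_check_interval 10
        retry_check_interval  2
        check_period          24x7
        contact_groups        admins
        notification_interval 60
        notification_period   24x7
        }

The templates and inheritance described next exist precisely so that directives like these do not have to be repeated in every real definition.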
Planning your configuration beforehand is highly recommended. Nagios has two main configuration files, cgi.cfg and nagios.cfg. The former is primarily used by the Web interface to authenticate users and control access, and it defines whether authentication is used and which users can access what functions. The latter is the main configuration file and controls all other program operations. The cfg_file and cfg_dir directives allow the configuration to be split into manageable groups using additional resource files and the object definition files (see figure 1). The flexibility offered allows a variety of different structures. I group network devices into groups but create individual files for each server. Nagios uses an object-oriented design. The objects in Nagios are displayed in table 1. A complete review of Nagios configuration is beyond the scope of this article. The documentation installed with Nagios covers it in great detail. Special attention should be paid to the concepts of templates and object inheritance as they are vital to creating a manageable configuration. The discussion below provides a brief introduction, while appendixes F–J provide concrete examples of working configuration files. cgi.cfg The cgi.cfg file controls the Web interface and its associated CGI (Common Gateway Interface) programs. During testing, I often turn off authentication by setting use_authentication to 0 if the Web interface is not accessible from the Internet. There also are various configuration directives that provide greater control over which users can access which features. The users are defined in the /etc/nagios/htpasswd.users file. A summary of commands to control entries is presented in table 2. The Web interface includes other features, such as sounds, status map displays, and integration with other products. Discussion of these directives is beyond the scope of this article. The cgi.cfg file provided with the software is well commented, and the Nagios documentation provides additional information. A number of screenshots from the Web interface are provided in the appendixes, including status displays and reporting. nagios.cfg The nagios.cfg file controls the operation of everything except the Web interface. Although it is possible to have a single monolithic configuration file, organizing the configuration into manageable files works better. The two main directives of note are cfg_file, which defines a single file that should be included, and cfg_dir, which includes all files in the specified directory with a .cfg extension. A third type of file that gets included is resource.cfg, which defines various macros for use in commands. Organizing the object files takes some thought. I monitor more than one hundred services on roughly seventy hosts, so the method of organizing the files was of more than academic interest. I use the following configuration files: commands.cfg, containing command definitions; contacts.cfg, containing the list of contacts and associated information, such as e-mail address (see appendix H); groups.cfg, containing all groups—hostgroups, servicegroups, and contactgroups (see appendix G); templates.cfg, containing all object templates (see appendix F); and timeperiods.cfg, containing the time ranges for checks and notifications. All devices and servers that I monitor are placed in directories using the cfg_dir directive: Servers—Contains server configurations.
Each file includes the host and service configurations for a physical or virtual server. Devices—Contains device information. I create individual files for devices with service monitoring that goes beyond simple ping tests for connectivity. Devices monitored solely for connectivity are grouped logically into a single file. For example, we monitor connectivity with fifty remote locations, and all fifty of them are placed in a single file.
Table 1. Nagios objects
Object: Used for
hosts: servers or devices being monitored
hostgroups: groups of hosts
services: services being monitored
servicegroups: groups of services
timeperiods: scheduling of checks and notifications
commands: checking hosts and services; notifying contacts; processing performance data; event handling
contacts: individuals to alert
contactgroups: groups of contacts
Figure 1. Nagios configuration relationships. Copyright © 2009 Ethan Galstead, Nagios Enterprises. Used with permission.
The resource.cfg file uses two macros to define the path to plug-ins and event handlers. Thirty other macros are available. Because the CGI programs do not read the resource file, restrictive permissions can be applied to them, enabling some of the macros to be used for usernames and passwords needed in check commands. Placing sensitive information in service configurations exposes them to the Web server, creating a security issue. Configuration The appendixes include the object configuration files for a simple monitoring situation. A switch is monitored using a simple ping test (see appendix J), while an opac server on the other side of the switch is monitored for both Web and Z39.50 operations (see appendix I). Note that the opac configuration includes a parents directive that tells Nagios that a problem with the gateway-switch will affect connectivity with the opac server. I monitor fifty remote sites. If my router is down, a single notification regarding my router provides more information if it is not buried in a storm of notifications about the remote sites. The Web port, Web service, and opac search services demonstrate different levels of monitoring. The Web port simply attempts to establish a connection to port 80 without evaluating anything beyond a successful connection. The Web service check requests a specific page from the Web server and evaluates only the status code returned by the server. It displays a warning because I configured the check to download a file that does not exist. The Web server is running because it returns an error code, hence the warning status. The opac search uses a known search to evaluate the result content, specifically whether the correct number of results is returned for a known search. I used a number of templates in the creation of this configuration. Templates reduce the amount of repetitive typing by allowing the reuse of directives. Templates can be chained, as seen in the host templates. The opac definition uses the Linux-server template, which in turn uses the generic-host template. The host definition inherits the directives of the template it uses, overriding any elements in both and adding new elements. In practical terms, generic-host directives are read first. Linux-server directives are applied next. If there is a conflict, the Linux-server directive takes precedence. Finally, opac is read. Again, any conflicts are resolved in favor of the last configuration read, in this case opac.
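The chain described above is easier to see with the directives side by side. The fragment below is condensed from appendixes F and I; most directives have been omitted, so it is an illustration of the inheritance and the parents relationship rather than a usable file.

define host{
        name                    generic-host     ; base template, never a real host
        notifications_enabled   1
        notification_period     24x7
        register                0                ; register 0 marks it as a template
        }

define host{
        name                linux-server         ; second layer of the chain
        use                 generic-host
        check_command       check-host-alive
        max_check_attempts  10
        contact_groups      admins
        register            0
        }

define host{
        use         linux-server                 ; opac inherits everything above
        host_name   opac
        alias       OPAC server
        address     192.168.1.123
        parents     gateway-switch               ; the switch between Nagios and this host
        }

If the gateway-switch host is down, Nagios treats opac as unreachable rather than down, which keeps a single switch failure from generating a separate alert for every device behind it.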
Plug-ins and service checks The nagios plugins package provides numerous plug-ins, including the check-host-alive, check_ping, check_tcp, and check_http commands. Using the plug-ins is straightforward, as demonstrated in the appendixes. Most plug-ins will provide some information on use if executed with --help supplied as an argument to the command. By default, the plug-ins are installed in /usr/lib/nagios/plugins. Some distributions may install them in a different directory. The plugins folder contains a subfolder with user-contributed scripts that have proven useful. Most of these plug-ins are Perl scripts, many of which require additional Perl modules available from the Comprehensive Perl Archive Network (CPAN). The check_hip_search plug-in (appendix K) used in the examples requires additional modules. Installing Perl modules is best accomplished using the CPAN Perl module. Detailed instructions on module installation are available online.15 Some general tips: Gcc and make should be installed before trying to install Perl modules, regardless of whether you are installing manually or using CPAN. Most modules are provided as source code, which may require compiling before use. CPAN automates this process but requires the presence of these packages. Alternately, many Linux distributions provide Perl module packages. Using repositories to install usually works well assuming the repository has all the needed modules. In my experience, that is rarely the case.
Table 2. Sample commands for managing the htpasswd.users file
Create or modify an entry, with the password entered at a prompt: htpasswd /etc/nagios/htpasswd.users <username>
Create or modify an entry using a password given on the command line: htpasswd -b /etc/nagios/htpasswd.users <username> <password>
Delete an entry from the file: htpasswd -D /etc/nagios/htpasswd.users <username>
However, care should be taken to avoid disclosing sensitive information regarding the network or passwords, or allowing access to CGI programs that perform actions. n Nagios permits the establishment of dependency relationships. Host dependencies may be useful in some rare circumstances not covered by the parent–child relationships mentioned above, but service dependencies provide a method of connect- ing services in a meaningful manner. For example, certain OPAC functions are dependent on ILS ser- vices. Defining these relationships takes both time and thought, which may be worthwhile depending on any given situation. n Event handlers allow Nagios to initiate certain actions after a state change. If Nagios notices that a particular service is down, it can run a script or program to attempt to correct the problem. Care should be taken when creating these scripts as ser- vice restarts may delete or overwrite information critical to solving a problem, or worsen the actual situation if an attempt to restart a service or reboot a server fails. n Nagios provides notification escalations, permit- ting the automatic notification of problems that last longer than a certain time. For example, a service escalation could send the first three alerts to the admin group. If properly configured, the fourth alert would be sent to the managers group as well as the admin group. In addition to escalating issues to management, this feature can be used to establish a series of responders for multiple on-call personnel. n Nagios can work in tandem with remote machines. In addition to custom scripts using Secure Shell (SSH), the Nagios Remote Plug-in Executor (NRPE) add-on allows the execution of plug-ins on remote machines, while the Nagios Service Check Acceptor (NSCA) add-on allows a remote host to submit check results to the Nagios server for processing. Implementing Nagios on the Feher and Sondag wireless network mentioned earlier would require one of these options because the wireless network is not accessible from the external network. These add-ons also allow for distributed monitoring, sharing the load among a number of servers while still providing the administrators with a single interface to the entire monitored network. The Nagios Exchange (http://exchange.nagios .org/) contains similar user-contributed programs for Windows. n Nagios can be configured to provide redundant or failover monitoring. Limoncelli, Hogan, and Chalup call this metamonitoring and describe when it is needed and how it can be implemented, suggesting self-monitoring by the host or having a second monitoring system that only monitors the main system.16 Nagios permits more complex configurations, allowing for either two servers operating in parallel, only one of which sends notifications unless the main server fails, or two servers communicating to share the monitoring load. n Alternative means of notification increase access to information on the status of the network. 
I imple- mented another open-source software package, QuickPage, which allows Nagios text messages to be sent from a computer to a pager or cell phone.17 Appendix L shows a screenshot of a Firefox exten- sion that displays host and service problems in the status bar of my browser and provides optional audio alerts.18 The Nagios community has devel- oped a number of alternatives, including special- ized Web interfaces and RSS feed generators.19 MONitORiNG NetwORK AND seRvice AvAilABilitY witH OPeN-sOuRce sOFtwARe | silveR 13 n Appropriate use Monitoring uses bandwidth and adds to the load of machines being monitored. Accordingly, an IT depart- ment should only monitor its own servers and devices, or those for which it has permission to do so. Imagine what would happen if all the users of a service such as WorldCat started monitoring it! The additional load would be noticeable and could conceivably disrupt service. Aside from reasons connected with being a good “netizen,” monitoring appears similar to port-scanning, a technique used to discover network vulnerabilities. An organization that blithely monitors devices without the owner’s permission may find their traffic is throttled back or blocked entirely. If a library has a definite need to moni- tor another service, obtaining permission to do so is a vital first step. If permission is withheld, the service level agree- ment between the library and its service provider or ven- dor should be reevaluated to ensure that the provider has an appropriate system in place to respond to problems. n Benefits The system-administration books provide an accurate overview of the benefits of monitoring, but personally reaping those benefits provides a qualitative background to the experience. I was able to justify the time spent on setting up monitoring the first day of production. One of the available plug-ins monitors Sybase database servers. It was one of the first contributed plug-ins I implemented because of past experiences with our production database running out of free space, causing the system to become nonfunctional. This happened twice, approximately a year apart. Each time, the integrated library system was down while the vendor addressed the issue. When I enabled the Sybase service checks, Nagios immediately returned a warning for the free space. The advance warning allowed me to work with the vendor to extend the database volume with no downtime for our users. That single event con- vinced the library director of the value of the system. Since that time, Nagios has proven its worth in alert- ing IT staff to problem situations, providing information on outage patterns both for in-house troubleshooting and discussions with service providers. n Conclusion Monitoring systems and services provides IT staff with a vital tool in providing quality customer service and managing systems. Installing and configuring such a system involves a learning curve and takes both time and computing resources. My experiences with Nagios have convinced me that the return on investment more than justifies the costs. References 1. Thomas A. Limoncelli, Christina J. Hogan, and Strata R. Chalup, The Practice of System and Network Administration, 2nd ed. (Upper Saddle River, N.J.: Addison-Wesley, 2007): 36. 2. Ibid., 523–42. 3. James Feher and Tyler Sondag, “Administering an Open- Source Wireless Network,” Information Technology & Libraries 27, no. 3 (Sept. 2008): 44–54. 4. Dave Pattern, “Keeping an Eye on Your HIP,” online post- ing, Jan. 
23, 2007, Self-Plagiarism is Style, http://www.daveyp .com/blog/archives/164 (accessed Nov. 20, 2008). 5. Feher and Sondag, “Administering an Open-Source Wire- less Network,” 45–54. 6. Ibid., 48, 53–54. 7. Limoncelli, Hogan, and Chalup, The Practice of System and Network Administration, 539–40. 8. Pattern, “Keeping an Eye on Your HIP.” 9. Limoncelli, Hogan, and Chalup, The Practice of System and Network Administration, xxv. 10. “Comparison of Network Monitoring Systems,” Wikipe- dia, The Free Encyclopedia, Dec. 9, 2008, http://en.wikipedia .org/wiki/Comparison_of_network_monitoring_systems (accessed Dec. 10, 2008). 11. William Von Hagen and Brian K. Jones, Linux Server Hacks, Vol. 2 (Sebastopol, Calif.: O’Reilly, 2005): 371–74 (Zabbix), 382–87 (Nagios). 12. MonitoringExchange, http://www.monitoringexchange. org/ (accessed Dec. 23, 2009); Nagios Community, http:// community.nagios.org (accessed Dec. 23, 2009); Nagios Wiki, http://www.nagioswiki.org/ (accessed Dec. 23, 2009). 13. “Nagios Documentation,” Nagios, Mar. 4, 2008, http:// www.nagios.org/docs/ (accessed Dec. 8, 2008); David Joseph- sen, Building a Monitoring Infrastructure with Nagios (Upper Saddle River, N.J.: Prentice Hall, 2007); Wolfgang Barth, Nagios: System and Network Monitoring, U.S. ed. (San Francisco: Open Source Press; No Starch Press, 2006). 14. Ethan Galstead, “Nagios Quickstart Installation Guides,” Nagios 3.x Documentation, Nov. 30, 2008, http://nagios.source forge.net/docs/3_0/quickstart.html (accessed Dec. 3, 2008). 15. The Perl Directory, (http://www.perl.org/) contains com- plete information on Perl. Specific information on using CPAN is available in “How Do I Install a Module from CPAN?” perlfaq8, Nov. 7, 2007, http://perldoc.perl.org/perlfaq8.html (accessed Dec. 4, 2008). 16. Limoncelli, Hogan, and Chalup, The Practice of System and Network Administration, 539–40. 17. Thomas Dwyer III, QPage Solutions, http://www.qpage .org/ (accessed Dec. 9, 2008). 18. Petr Šimek, “Nagioschecker,” Google Code, Aug. 12, 2008, http://code.google.com/p/nagioschecker/ (accessed Dec. 8, 2008). 19. “Notifications,” MonitoringExchange, http://www .monitoringexchange.org/inventory/Utilities/AddOn-Proj- ects/Notifications (accessed Dec. 23, 2009). 14 iNFORMAtiON tecHNOlOGY AND liBRARies | MARcH 2010 Appendix A. Service detail display from test system Appendix B. Service details for OPAC (hip) and ILS (horizon) servers from production system Appendix C. Sybase freespace trends for a specified period Appendix D. Connectivity history for a specified period Appendix E. Availability report for host shown in Appendix D Appendix F. templates.cfg file ############################################################################ # TEMPLATES.CFG - SAMPLE OBJECT TEMPLATES ############################################################################ ############################################################################ # CONTACT TEMPLATES ############################################################################ MONitORiNG NetwORK AND seRvice AvAilABilitY witH OPeN-sOuRce sOFtwARe | silveR 15 # Generic contact definition template - This is NOT a real contact, just # a template! 
define contact{ name generic-contact service_notification_period 24x7 host_notification_period 24x7 service_notification_options w,u,c,r,f,s host_notification_options d,u,r,f,s service_notification_commands notify-service-by-email host_notification_commands notify-host-by-email register 0 } ############################################################################ # HOST TEMPLATES ############################################################################ # Generic host definition template - This is NOT a real host, just # a template! define host{ name generic-host notifications_enabled 1 event_handler_enabled 1 flap_detection_enabled 1 failure_prediction_enabled 1 process_perf_data 1 retain_status_information 1 retain_nonstatus_information 1 notification_period 24x7 register 0 } # Linux host definition template - This is NOT a real host, just a template! define host{ name linux-server use generic-host check_period 24x7 check_interval 5 retry_interval 1 max_check_attempts 10 check_command check-host-alive notification_period workhours notification_interval 120 notification_options d,u,r contact_groups admins register 0 } Appendix F. templates.cfg file (cont.) 16 iNFORMAtiON tecHNOlOGY AND liBRARies | MARcH 2010 # Define a template for switches that we can reuse define host{ name generic-switch use generic-host check_period 24x7 check_interval 5 retry_interval 1 max_check_attempts 10 check_command check-host-alive notification_period 24x7 notification_interval 30 notification_options d,r contact_groups admins register 0 } ############################################################################ # SERVICE TEMPLATES ############################################################################ # Generic service definition template - This is NOT a real service, # just a template! define service{ name generic-service active_checks_enabled 1 passive_checks_enabled 1 parallelize_check 1 obsess_over_service 1 check_freshness 0 notifications_enabled 1 event_handler_enabled 1 flap_detection_enabled 1 failure_prediction_enabled 1 process_perf_data 1 retain_status_information 1 retain_nonstatus_information 1 is_volatile 0 check_period 24x7 max_check_attempts 3 normal_check_interval 10 retry_check_interval 2 contact_groups admins notification_options w,u,c,r notification_interval 60 notification_period 24x7 register 0 } Appendix F. templates.cfg file (cont.) MONitORiNG NetwORK AND seRvice AvAilABilitY witH OPeN-sOuRce sOFtwARe | silveR 17 # Define a ping service. This is NOT a real service, just a template! define service{ use generic-service name ping-service notification_options n check_command check_ping!1000.0,20%!2000.0,60% register 0 } Appendix F. templates.cfg file (cont.) Appendix G. groups.cfg file ############################################################################ # CONTACT GROUP DEFINITIONS ############################################################################ # We only have one contact in this simple configuration file, so there is # no need to create more than one contact group. 
define contactgroup{
  contactgroup_name  admins
  alias              Nagios Administrators
  members            nagiosadmin
  }

############################################################################
# HOST GROUP DEFINITIONS
############################################################################

# Define an optional hostgroup for Linux machines
define hostgroup{
  hostgroup_name  linux-servers     ; The name of the hostgroup
  alias           Linux Servers     ; Long name of the group
  }

# Create a new hostgroup for ILS servers
define hostgroup{
  hostgroup_name  ils-servers       ; The name of the hostgroup
  alias           ILS servers       ; Long name of the group
  }

# Create a new hostgroup for switches
define hostgroup{
  hostgroup_name  switches          ; The name of the hostgroup
  alias           Network Switches  ; Long name of the group
  }

############################################################################
# SERVICE GROUP DEFINITIONS
############################################################################

# Define a service group for network connectivity
define servicegroup{
  servicegroup_name  network
  alias              Network infrastructure services
  }

# Define a servicegroup for ILS
define servicegroup{
  servicegroup_name  ils-services
  alias              ILS related services
  }

Appendix H. contacts.cfg

############################################################################
# CONTACTS.CFG - SAMPLE CONTACT/CONTACTGROUP DEFINITIONS
############################################################################

# Just one contact defined by default - the Nagios admin (that's you)
# This contact definition inherits a lot of default values from the
# 'generic-contact' template which is defined elsewhere.
define contact{
  contact_name  nagiosadmin
  use           generic-contact
  alias         Nagios Admin
  email         nagios@localhost
  }

Appendix I. opac.cfg

############################################################################
# OPAC SERVER
############################################################################

############################################################################
# HOST DEFINITION
############################################################################

# Define a host for the server we'll be monitoring
# Change the host_name, alias, and address to fit your situation
define host{
  use        linux-server
  host_name  opac
  parents    gateway-switch
  alias      OPAC server
  address    192.168.1.123
  }

############################################################################
# SERVICE DEFINITIONS
############################################################################

# Create a service for monitoring the HTTP port
define service{
  use                  generic-service
  host_name            opac
  service_description  web port
  check_command        check_tcp!80
  }

# Create a service for monitoring the web service
define service{
  use                  generic-service
  host_name            opac
  service_description  Web service
  check_command        check_http!-u/bogusfilethatdoesnotexist.html
  }

# Create a service for monitoring the opac search
define service{
  use                  generic-service
  host_name            opac
  service_description  OPAC search
  check_command        check_hip_search
  }

# Create a service for monitoring the Z39.50 port
define service{
  use                  generic-service
  host_name            opac
  service_description  z3950 port
  check_command        check_tcp!210
  }
Appendix J. switches.cfg

############################################################################
# SWITCH.CFG - SAMPLE CONFIG FILE FOR MONITORING SWITCHES
############################################################################

############################################################################
# HOST DEFINITIONS
############################################################################

# Define the switch that we'll be monitoring
define host{
  use         generic-switch
  host_name   gateway-switch
  alias       Gateway Switch
  address     192.168.0.1
  hostgroups  switches
  }

############################################################################
# SERVICE DEFINITIONS
############################################################################

# Create a service to PING to switches
# Note this entry will ping every host in the switches hostgroup
define service{
  use                    ping-service
  hostgroups             switches
  service_description    PING
  normal_check_interval  5
  retry_check_interval   1
  }

Appendix K. check_hip_search script

#!/usr/bin/perl -w
#########################
# Check Horizon Information Portal (HIP) status.
# HIP is the web-based interface for Dynix and Horizon
# ILS systems by SirsiDynix corporation.
#
# This plugin is based on a standalone Perl script written
# by Dave Pattern. Please see
# http://www.daveyp.com/blog/index.php/archives/164/
# for the original script.
#
# The original script and this derived work are covered by
# http://creativecommons.org/licenses/by-nc-sa/2.5/
#########################

use strict;
use LWP::UserAgent;   # Note the requirement for Perl module LWP::UserAgent!
use lib "/usr/lib/nagios/plugins";
use utils qw($TIMEOUT %ERRORS);

### Some configuration options
my $hipServerHome = "http://ipac.prl.ab.ca/ipac20/ipac.jsp?profile=alap";
my $hipServerSearch = "http://ipac.prl.ab.ca/ipac20/ipac.jsp?menu=search&aspect=subtab132&npp=10&ipp=20&spp=20&profile=alap&ri=&index=.GW&term=linux&x=18&y=13&aspect=subtab132&GetXML=true";
my $hipSearchType = "xml";
my $httpProxy = '';

### check home page is available...
{
  my $ua = LWP::UserAgent->new;
  $ua->timeout( 10 );
  if( $httpProxy ) { $ua->proxy( 'http', $httpProxy ) }

  my $response = $ua->get( $hipServerHome );
  my $status = $response->status_line;

  if( $response->is_success ) {
  } else {
    print "HIP_SEARCH CRITICAL: $status\n";
    exit $ERRORS{'CRITICAL'};
  }
}

### check search page is returning results...
{
  my $ua = LWP::UserAgent->new;
  $ua->timeout( 10 );
  if( $httpProxy ) { $ua->proxy( 'http', $httpProxy ) }

  my $response = $ua->get( $hipServerSearch );
  my $status = $response->status_line;

  if( $response->is_success ) {
    my $results = 0;
    my $content = $response->content;

    if( lc( $hipSearchType ) eq 'html' ) {
      if( $content =~ /\<b\>(\d+?)\<\/b\>\&nbsp\;titles matched/ ) {
        $results = $1;
      }
    }

    if( lc( $hipSearchType ) eq 'xml' ) {
      if( $content =~ /\<hits\>(\d+?)\<\/hits\>/ ) {
        $results = $1;
      }
    }

    ### Modified section - original script triggered another function to
    ### save results to a temp file and email an administrator.
    unless( $results ) {
      print "HIP_SEARCH CRITICAL: No results returned|results=0\n";
      exit $ERRORS{'CRITICAL'};
    }
    if ( $results ) {
      print "HIP_SEARCH OK: $results results returned|results=$results\n";
      exit $ERRORS{'OK'};
    }
  }
}

Appendix L.
Nagios Checker display 3156 ---- 34 iNFORMAtiON tecHNOlOGY AND liBRARies | MARcH 2010 Tagging: An Organization Scheme for the Internet Marijke A. Visser How should the information on the Internet be organized? This question and the possible solutions spark debates among people concerned with how we identify, classify, and retrieve Internet content. This paper discusses the benefits and the controversies of using a tagging system to organize Internet resources. Tagging refers to a clas- sification system where individual Internet users apply labels, or tags, to digital resources. Tagging increased in popularity with the advent of Web 2.0 applications that encourage interaction among users. As more information is available digitally, the challenge to find an organiza- tional system scalable to the Internet will continue to require forward thinking. Trained to ensure access to a range of informational resources, librarians need to be concerned with access to Internet content. Librarians can play a pivotal role by advocating for a system that sup- ports the user at the moment of need. Tagging may just be the necessary system. W ho will organize the information available on the Internet? How will it be organized? Does it need an organizational scheme at all? In 1998, Thomas and Griffin asked a similar question, “Who will create the metadata for the Internet?” in their article with the same name.1 Ten years later, this question has grown beyond simply supplying metadata to assuring that at the moment of need, someone can retrieve the information necessary to answer their query. Given new classification tools available on the Internet, the time is right to reas- sess traditional models, such as controlled vocabularies and taxonomies, and contrast them with folksonomies to understand which approach is best suited for the future. This paper gives particular attention to Delicious, a social networking tool for generating folksonomies. The amount of information available to anyone with an Internet connection has increased in part because of the Internet’s participatory nature. Users add content in a variety of formats and through a variety of applications to personalize their Web experience, thus making Internet content transitory in nature and challenging to lock into place. The continual influx of new information is caus- ing a rapid cultural shift, more rapid than many people are able to keep up with or anticipate. Conversations on a range of topics that take place using Web technologies happen in real time. Unless you are a participant in these conversations and debates using Web-based communica- tion tools, changes are passing you by. Internet users in general have barely grasped the concept of Web 2.0 and already the advanced “Internet cognoscenti” write about Web 3.0.2 Regarding the organization and availability of Internet content, librarians need to be ahead of the crowd as the voice who will assure content will be readily accessible to those that seek it. Internet users actively participat- ing in and shaping the online communities are, perhaps unintentionally, influencing how those who access infor- mation via the Internet expect to be able to receive and use digital resources. Librarians understand that the way information is organized is critical to its accessibility. They also understand the communities in which they operate. Today, librarians need to be able to work seam- lessly among the online communities, the resources they create, and the end user. 
As Internet use evolves, librar- ians as information stakeholders should stay abreast of Web 2.0 developments. By positioning themselves to lead the future of information organization, librarians will be able to select the best emerging Web-based tools and applications, become familiar with their strengths, and leverage their usefulness to guide users in organizing Internet content. Shirky argues that the Internet has allowed new com- munities to form. Primarily online, these communities of Internet users are capable of dramatically changing society both on- and offline. Shirky contends that because of the Internet, “group action just got easier.”3 According to Shirky, we are now at the critical point where Internet use, while dependent on technology, is actually no longer about the technology at all. The Web today (Web 2.0) is about participation. “This [the Internet] is a medium that is going to change society.”4 Lessig points out that content creators are “writing in the socially, culturally relevant sense for the 21st century and to be able to engage in this writing is a measure of your literacy in the 21st century.”5 It is significant that creating content is no longer reserved for the Internet cognoscenti. Internet users with a variety of technological skills are participating in Web 2.0 com- munities. Information architects, Web designers, librarians, busi- ness representatives, and any stakeholder dependent on accessing resources on the Internet have a vested interest in how Internet information is organized. Not only does the architecture of participation inherent in the Internet encourage completely new creative endeavors, it serves as a platform for individual voices as demonstrated in Marijke A. visser (marijkea@gmail.com) is a Library and infor- mation Science graduate student at indiana University, india- napolis, and will be graduating May 2010. She is currently work- ing for ALA’s office for information and Technology Policy as an information Technology Policy Analyst, where her area of focus includes telecommunications policy and how it affects access to information. tAGGiNG: AN ORGANizAtiON scHeMe FOR tHe iNteRNet | visseR 35 personal and organizationally sponsored blogs: Lessig 2.0, Boing Boing, Open Access News, and others. These Internet conversations contribute diverse viewpoints on a stage where, theoretically, anyone can access them. Web 2.0 technologies challenge our understanding of what con- stitutes information and push policy makers to negotiate equitable Internet-use policies for the public, the content creators, corporate interests, and the service providers. To maintain an open Internet that serves the needs of all the players, those involved must embrace the opportunity for cultural growth the social Web represents. For users who access, create, and distribute digital content, information is anything but static; nor is using it the solitary endeavor of reading a book. Its digital format makes it especially easy for people to manipulate it and shape it to create new works. People are sharing these new works via social technologies for others to then remix into yet more distinct creative work. Communication is fundamentally altered by the ability to share content on the Internet. Today’s Internet requires a reevaluation of how we define and organize information. The manner in which digital information is classified directly affects each user’s ability to access needed information to fully participate in twenty-first-century culture. 
New para- digms for talking about and classifying information that reflect the participatory Internet are essential. n Background The controversy over organizing Web-based information can be summed up comparing two perspectives repre- sented by Shirky and Peterson. Both authors address how information on the Web can be most effectively orga- nized. In her introduction, Peterson states, “Items that are different or strange can become a barrier to networking.”6 Shirky maintains, “As the Web has shown us, you can extract a surprising amount of value from big messy data sets.”7 Briefly, in this instance ontology refers to the idea of defining where digital information can and should be located (virtually). Folksonomy describes an organiza- tional system where individuals determine the placement and categorization of digital information. Both terms are discussed in detail below. Although any organizational system necessitates talking about the relationship(s) among the materials being organized, the relationships can be classified in multiple ways. To organize a given set of entities, it is necessary to establish in what general domain they belong and in what ways they are related. Applying an ontological, or hierar- chical, classification system to digital information raises several points to consider. First, there are no physical space restrictions on the Internet, so relationships among digital resources do not need to be strictly identified. Second, after recognizing that Internet resources do not need the same classification standards as print material, librarians can begin to isolate the strengths of current nondigital systems that could be adapted to a system for the Internet. Third, librarians must be ready to eliminate current systems entirely if they fail to serve the needs of Internet users. Traditional systems for organizing information were developed prior to the information explosion on the Internet. The Internet’s unique platform for creating, storing, and disseminating information challenges pre– digital-age models. Designing an organizational system for the Internet that supports creative innovation and succeeds in providing access to the innovative work is paramount to moving the twenty-first-century culture forward. n Assessing alternative models Controversy encourages scrutiny of alternative models. In understanding the options for organizing digital infor- mation, it is important to understand traditional classifi- cation models. Smith discusses controlled vocabularies, taxonomies, and facets as three traditional methods for applying metadata to a resource. According to Smith, a controlled vocabulary is an unambiguous system for managing the meanings of words. It links synonyms, allowing a search to retrieve information on the basis of the relationship between synonyms.8 Taxonomies are hierarchical, controlled vocabularies that establish par- ent–child relationships between terms. A faceted classifi- cation system categorizes information using the distinct properties of that information.9 In such a system, infor- mation can exist in more than one place at a time. A fac- eted classification system is a precursor to the bottom-up system represented by folksonomic tagging. 
Folksonomy, a term coined in 2004 by Thomas Vander Wal, refers to a “user-created categorical structure development with an emergent thesaurus.”10 Vander Wal further separates the definition into two types: a narrow and a broad folk- sonomy.11 In a broad folksonomy, many people tag the same object with numerous tags or a combination of their own and others’ tags. In a narrow folksonomy, one or few people tag an object with primarily singular terms. Internet searching represents a unique challenge to people wanting to organize its available information. Search engines like Yahoo! and Google approach the cha- otic mass of information using two different techniques. Yahoo! created a directory similar to the file folder system with a set of predetermined categories that were intended to be universally useful. In so doing, the Yahoo! devel- opers made assumptions about how the general public would categorize and access information. The categories 36 iNFORMAtiON tecHNOlOGY AND liBRARies | MARcH 2010 and subsequent subcategories were not necessarily logi- cally linked in the eyes of the general public. The Yahoo! directory expanded as Internet content grew, but the digi- tal folder system, like a taxonomy, required an expert to maintain. Shirky notes the Yahoo! model could not scale to the Internet. There are too many possible links to be able to successfully stay within the confines of a hierar- chical classification system. Additionally, on the Internet, the links are sufficient for access because if two items are linked at least once, the user has an entry point to retrieve either one or both items.12 A hierarchical system does not assure a successful Internet search and it requires a user to comprehend the links determined by the managing expert. In the Google approach, developers acknowl- edged that the user with the query best understood the unique reasoning behind her search. The user therefore could best evaluate the information retrieved. According to Shirky, the Google model let go of the hierarchical file system because developers recognized effective search- ing cannot predetermine what the user wants. Unlike Yahoo!, Google makes the links between the query and the resources after the user types in the search terms.13 Trusting in the link system led Google to understand and profit from letting the user filter the search results. To select the best organizational model for the Internet it is critical to understand its emergent nature. A model that does not address the effects of Web 2.0 on Internet use and fails to capture participant-created content and tagging will not be successful. One approach to orga- nizing digital resources has been for users to bookmark websites of personal interest. These bookmarks have been stored on the user’s computer, but newer models now combine the participatory Web with saving, or tagging, websites. Social bookmarking typifies the emergent Web and the attraction of online networking. Innovative and controversial, the folksonomy model brings to light numerous criteria necessary for a robust organizational system. A social bookmarking network, Delicious is a tool for generating folksonomies. It com- bines a large amount of self-interest with the potential for an equal, if not greater, amount of social value. Delicious users add metadata to resources on the Internet by apply- ing terms, or tags, to URLs. Users save these tagged web- sites to a personal library hosted on the Delicious website. 
The default settings on Delicious share a user’s library publicly, thus allowing other people—not limited to reg- istered Delicious account holders—to view any library. That the Delicious developers understood how Internet users would react to this type of interactive application is reflected in the popularity of Delicious. Delicious arrived on the scene in 2003, and in 2007 developers introduced a number of features to encourage further user collabora- tion. With a new look (going from the original del.icio.us to its current moniker, Delicious) as well as more ways for users to retrieve and share resources by 2007, Delicious had 3 million registered users and 100 million unique URLs.14 The reputation of Delicious has generated inter- est among people concerned with organizing the infor- mation available via the Internet. How does the folksonomy or Delicious model of open-ended tagging affect searching, information retriev- ing, and resource sharing? Delicious, whose platform is heavily influenced by its users, operates with no hier- archical control over the vocabulary used as tags. This underscores the organization controversy. Bottom-up tagging gives each person tagging an equal voice in the categorization scheme that develops through the user generated tags. At the same time, it creates a chaotic infor- mation-retrieval system when compared to traditional controlled vocabularies, taxonomies, and other methods of applying metadata.15 A folksonomy follows no hier- archical scheme. Every tag generated supplies personal meaning to the associated URL and is equally weighted. There will be overlap in some of the tags users select, and that will be the point of access for different users. For the unique tags, each Delicious user can choose to adopt or reject them for their personal tagging system. Either way, the additional tags add possible future access points for the rest of the user community. The social usefulness of the tags grows organically in relationship to their adop- tion by the group. Can the Internet support an organizational system controlled by user-generated tags? By the very nature of the participatory Web, whose applications often get bet- ter with user input, the answer is yes. Delicious and other social tagging systems are proving that their folksonomic approach is robust enough to satisfy the organizational needs of their users. Defined by Vander Wal, a broad folk- sonomy is a classification system scalable to the Internet.16 The problem with projecting already-existing search and classification strategies to the Internet is that the Internet is constantly evolving, and classic models are quickly overcome. Even in the nonprint world of the Internet, taxonomies and controlled vocabulary entail a commitment both from the entity wanting to organize the system and the users who will be accessing it. Developing a taxonomy involves an expert, which requires an outlay of capital and, as in the case with Yahoo!, a taxonomy is not necessarily what users are looking for. To be used effectively, taxonomies demand a certain amount of user finesse and complacency. The user must understand the general hierarchy and by default must suspend their own sense of category and subcategory if they do not mesh with the given system. The search model used by Google, where the user does the filtering, has been a significantly more successful search engine. Google recognizes natural language, making it user friendly; however, it remains merely a search engine. 
It is successful at making links, but it leaves the user stranded without a means to orga- nize search results beyond simple page rank. Traditional tAGGiNG: AN ORGANizAtiON scHeMe FOR tHe iNteRNet | visseR 37 hierarchical systems and search strategies like those of Yahoo! and Google neglect to take into account the tre- mendous popularity of the participatory Web. Successful Web applications today support user interaction; to disre- gard this is naive and short-sighted. In contrast to a simple page-rank results list or a hierarchical system, Delicious results provide the user with rich, multilayer results. Figure 1 shows four of the first ten results of a Delicious search for the term “folk- sonomy.” The articles by the four authors in the left col- umn were tagged according to the diagram. Two of the articles are peer-reviewed, and two are cited repeatedly by scholars researching tagging and the Internet. In this example, three unique terms are used to tag those articles, and the other terms provide additional entry points for retrieval. Further information available using Delicious shows that the Guy article was tagged by 1,323 users, the Mathes article by 2,787 users, the Shirky article by 4,383 users, and the Peterson article by 579 users.17 From the basic Delicious search, the user can combine terms to narrow the query as well as search what other users have tagged with those terms. Similar to the card catalog, where a library patron would often unintentionally find a book title by browsing cards before or after the actual title she originally wanted, a Delicious user can browse other users’ libraries, often finding additional pertinent resources. A user will return a greater number of relevant and automatically filtered results than with an advanced Google search. As an ancillary feature, once a Delicious user finds an attractive tag stream—a series of tags by a particular user—they can opt to follow the user who created the tag stream, thereby increasing their personal resources. Hence Delicious is effective personally and socially. It emulates what Internet users expect to be able to do with digital content: find interesting resources, per- sonalize them, in this case with tags, and put them back out for others to use if they so choose. Proponents of folksonomy recognize there are ben- efits to traditional taxonomies and controlled vocabulary systems. Shirky delineates two features of an organi- zational system and their characteristics, providing an example of when a hierarchical system can be successful (see table 1).18 These characteristics apply to situations using data- bases, journal articles, and dissertations as spelled out by Peterson, for example.19 Specific organizations with identifiable common terminology—for example, medical libraries—can also benefit from a traditional classification system. These domains are the antithesis of the domain represented by the Web. The success of controlled vocab- ularies, taxonomies, and their resulting systems depends on broad user adoption. That, in combination with the cost of creating and implementing a controlled system, raises questions as to their utility and long-term viability for use on the Web. Though meant for longevity, a taxonomy fulfills a need at one fixed moment in time. A folksonomy is never static. Taxonomies developed by experts have not yet been able to be extended adequately for the breadth and depth of Internet resources. 
Neither have traditional viewpoints been scaled to accept the challenges encountered in trying to organize the Internet. Folksonomy, like taxonomy, seeks to provide the information critical to the user at the moment of need. Folksonomy, however, relies on users to create the links that will retrieve the desired results. Doctorow puts forward three critiques of a hierarchical metadata system, emphasizing the inadequacies of applying traditional classification schemes to the digital stage:

1. There is not a "correct" way to categorize an idea.
2. Competing interests cannot come to a consensus on a hierarchical vocabulary.
3. There is more than one way to describe something.

Doctorow elaborates: "Requiring everyone to use the same vocabulary to describe their material denudes the cognitive landscape, enforces homogeneity in ideas."20 The Internet raises the level of participation to include innumerable voices. The astonishing thing is that it thrives on this participation.

Figure 1. Search results for "folksonomy" using Delicious.

Table 1. Domains and their participants

  Domain to be Organized     Participants in the Domain
  Small corpus               Expert catalogers
  Formal categories          Authoritative source of judgment
  Restricted entities        Coordinated users
  Clear edges                Expert users

Guy and Tonkin address the "folksonomic flaw" by saying user-generated tags are by definition imprecise: they can be ambiguous, overly personal, misspelled, or contrived compound words. Guy and Tonkin suggest the need to improve tagging by educating the users or by improving the systems to encourage more accurate tagging.21 This, however, does not acknowledge that successful Web 2.0 applications depend on the emergent wisdom of the user community. The systems permit organic evolution and continual improvement through user participation. A folksonomy evolves much the way a species does. Unique or single-use tags have minimal social import and do not gain recognition. Tags used by more than a few people reinforce their value and emerge as the more robust species.

Conclusion

The benefits of the Internet are accessible to a wide range of users. The rewards of participation are immediate, social, and exponential in scope. User-generated content and associated organization models support the Internet's unique ability to bring together unlikely social relationships that would not necessarily happen in another milieu. To paraphrase Shirky and Lessig, people are participating in a moment of social and technological evolution that is altering traditional ways of thinking about information, thereby creating a break from traditional systems. Folksonomic classification is part of that break. Its utility grows organically as users add tagged content to the system. It is adaptive, and its strengths can be leveraged according to the needs of the group. While there are "folksonomic flaws" inherent in a bottom-up classification system, there is tremendous value in weighting individual voices equally. Following the logic of Web 2.0 technology, folksonomy will improve according to the input of the users. It is an organizational system that reflects the basic tenets of the emergent Internet. It may be the only practical solution in a world of participatory content creation.
Shirky describes the Internet by saying, “There is no shelf in the digital world.”22 Classic organizational schemes like the Dewey Decimal System were created to organize resources prior to the advent of the Internet. A hierarchical system was necessary because there was a physical limita- tion on where a resource could be located; a book can only exist in one place at one time. In the digital world, the shelf is simply not there. Material can exist in many different places at once and can be retrieved through many avenues. A broad folksonomy supports a vibrant search strategy. It combines individual user input with that of the group. This relationship creates data sets inherently meaningful to the community of users seeking information on any given topic at any given moment. This is why a folksonomic approach to organizing information on the Internet is suc- cessful. Users are rewarded for their participation, and the system improves because of it. Folksonomy mirrors and supports the evolution of the Internet. Librarians, trained to be impartial and ethically bound to assure access to information, are the logical mediators among content creators, the architecture of the Web, corporate interests, and policy makers. Critical con- versations are no longer happening only in traditional publications of the print world. They are happening with communication platforms like YouTube, Twitter, Digg, and Delicious. Information organization is one issue on which librarians can be progressive. Dedicated to making information available, librarians are in a unique position to take on challenges raised by the Internet. As the profession experiments with the introduction of Web 3.0, librarians need to position themselves between what is known and what has yet to evolve. Librarians have always leveraged the interests and needs of their users to tailor their services to the individual entry point of every person who enters the library. Because more and more resources are accessed via the Internet, librarians will have to maintain a presence throughout the Web if they are to continue to speak for the informational needs of their users. Part of that presence necessitates an ability to adapt current models to the Internet. More importantly, it requires recognition of when to forgo con- ventional service methods in favor of more innovative approaches. Working in concert with the early adopters, corporate interests, and general Internet users, librarians can promote a successful system for organizing Internet resources. For the Internet, folksonomic tagging is one solution that will assure users can retrieve information necessary to answer their queries. References and notes 1. Charles F. Thomas and Linda S. Griffin, “Who Will Cre- ate the Metadata for the Internet?” First Monday 3, no. 12 (Dec. 1998). 2. Web 2.0 is a fairly recent term, although now ubiquitous among people working in and around Internet technologies. Attributed to a conference held in 2004 between MediaLive tAGGiNG: AN ORGANizAtiON scHeMe FOR tHe iNteRNet | visseR 39 International and O’Reilly Media, Web 2.0 refers to the Web as being a platform for harnessing the collective power of Internet users interested in creating and sharing ideas and information without mediation from corporate, government, or other hierar- chical policy influencers or regulators. Web 3.0 is a much more fluid concept as of this writing. 
There are individuals who use it to refer to a Semantic Web where information is analyzed or processed by software designed specifically for computers to carry out the currently human-mediated activity of assigning meaning to information on a webpage. There are librarians involved with exploring virtual-world librarianship who refer to the 3D environment as Web 3.0. The important point here is that what Internet users now know as Web 2.0 is in the process of being altered by individuals continually experimenting with and improving upon existing Web applications. Web 3.0 is the undefined future of the participatory Internet. 3. Clay Shirky, “Here Comes Everybody: The Power of Organizing Without Organizations” (presentation videocast, Berkman Center for Internet & Society, Harvard University, Cambridge, Mass., 2008), http://cyber.law.harvard.edu/inter active/events/2008/02/shirky (accessed Oct. 1, 2008). 4. Ibid. 5. Lawerence Lessig, “Early Creative Commons History, My Version,” videocast, Aug. 11, 2008, Lessig 2.0, http://lessig.org/ blog/2008/08/early_creative_commons_history.html (accessed Aug. 13, 2008). 6. Elaine Peterson, “Beneath the Metadata: Some Philosophi- cal Problems with Folksonomy,” D-Lib Magazine 12, no. 11 (2006), http://www.dlib.org/dlib/november06/peterson/11peterson .html (accessed Sept. 8, 2008). 7. Clay Shirky, “Ontology is Overrated: Categories, Links, and Tags” online posting, Spring 2005, Clay Shirky’s Writings about the Internet, http://www.shirky.com/writings/ontology_ overrated.html#mind_reading (accessed Sept. 8, 2008). 8. Gene Smith, Tagging: People-Powered Metadata for the Social Web (Berkeley, Calif.: New Riders, 2008): 68. 9. Ibid., 76. 10. Thomas Vander Wal, “Folksonomy,” online posting, Feb. 7, 2007, vanderwal.net, http://www.vanderwal.net/folksonomy .html (accessed Aug. 26, 2008). 11. Thomas Vander Wal, “Explaining and Showing Broad and Narrow Folksonomies,” online posting, Feb. 21, 2005, Personal InfoCloud, http://www.personalinfocloud.com/2005/02/ explaining_and_.html (accessed Aug. 29, 2008). 12. Shirky, “Ontology is Overrated.” 13. Ibid. 14. Michael Arrington, “Exclusive: Screen Shots and Feature Overview of Delicious 2.0 Preview,” online posting, June 16, 2005, TechCrunch, http://www.techcrunch.com/2007/09/06/ exclusive-screen-shots-and-feature-overview-of-delicious-20 -preview/(accessed Jan. 6, 2010). 15. Smith, Tagging, 67–93 . 16. Vander Wal, “Explaining and Showing Broad and Narrow Folksonomies.” 17. Adam Mathes, “Folksonomies—Cooperative Classifica- tion and Communication through Shared Metadata” (graduate paper, University of Illinois Urbana–Champaign, Dec. 2004); Peterson, “Beneath the Metadata”; Shirky, “Ontology is Over- rated”; Thomas and Griffin, “Who Will Create the Metadata for the Internet?” 18. Shirky, “Ontology is Overrated.” 19. Peterson, “Beneath the Metadata.” 20. Cory Doctorow, “Metacrap: Putting the Torch to Seven Straw-Men of the Meta-Utopia,” online posting, Aug. 26, 2001, The Well, http://www.well.com/~doctorow/metacrap.htm (accessed Sept. 15, 2008). 21. Marieke Guy and Emma Tonkin, “Folksonomies: Tidy- ing up Tags?” D-Lib Magazine 12, no. 1 (2006), http://www.dlib .org/dlib/january06/guy/01guy.html (accessed Sept. 8, 2008). 22. Shirky, “Ontology is Overrated.” Global Interoperability continued from page 33 9. Julie Renee Moore, “RDA: New Cataloging Rules, Com- ing Soon to a Library Near You!” Library Hi Tech News 23, no. 9, (2006): 12. 10. Rick Bennett, Brian F. Lavoie, and Edward T. 
O’Neill, “The Concept of a Work in WorldCat: An Application of FRBR,” Library Collections, Acquisitions, & Technical Services 27, no. 1, (2003): 56. 11. Park, “Cross-Lingual Name and Subject Access.” 12. Ibid. 13. Thomas B. Hickey, “Virtual International Authority File” (Microsoft PowerPoint presentation, ALA Annual Conference, New Orleans, June 2006), http://www.oclc.org/research/ projects/viaf/ala2006c.ppt (accessed Dec. 9, 2009). 14. LEAF, “LEAF Project Consortium,” http://www.crxnet .com/leaf/index.html (accessed Dec. 9, 2009). 15. Bennett, Lavoie, and O’Neill, “The Concept of a Work in WorldCat.” 16. Alan Danskin, “Mature Consideration: Developing Biblio- graphic Standards and Maintaining Values,” New Library World 105, no. 3/4, (2004): 114. 17. Ibid. 18. Bennett, Lavoie, and O’Neill, “The Concept of a Work in WorldCat.” 19. Moore, “RDA.” 20. Danskin, “Mature Consideration,” 116. 21. Ibid.; Park, “Cross-Lingual Name and Subject Access.” 3154 ---- teNDiNG A wilD GARDeN: liBRARY weB DesiGN FOR PeRsONs witH DisABilities | vANDeNBARK 23 R. Todd Vandenbark Tending a Wild Garden: Library Web Design for Persons with Disabilities Nearly one-fifth of Americans have some form of dis- ability, and accessibility guidelines and standards that apply to libraries are complicated, unclear, and difficult to achieve. Understanding how persons with disabilities access Web-based content is critical to accessible design. Recent research supports the use of a database-driven model for library Web development. Existing tech- nologies offer a variety of tools to meet disabled patrons’ needs, and resources exist to assist library professionals in obtaining and evaluating product accessibility infor- mation from vendors. Librarians in charge of technology can best serve these patrons by proactively updating and adapting services as assistive technologies improve. I n March 2007, eighty-two countries signed the United Nations’ Convention on the Rights of Persons with Disabilities, including Canada, the European Community, and the United States. The convention’s purpose was “to promote, protect and ensure the full and equal enjoyment of all human rights and fundamental freedoms by all persons with disabilities, and to promote respect for their inherent dignity.”1 Among the many proscriptions for assuring respect and equal treatment of people with disabilities (PWD) under the law, signatories agreed to take appropriate measures: (g) To promote access for persons with disabilities to new information and communications technolo- gies and systems, including the Internet; and (h) To promote the design, development, production and distribution of accessible information and communications technologies and systems at an early stage, so that these technologies and systems become accessible at minimum cost. In addition, the convention seeks to guarantee equal access to information by doing the following: (c) Urging private entities that provide services to the general public, including through the Internet, to provide information and services in accessible and usable formats for persons with disabilities; and (d) Encouraging the mass media, including providers of information through the Internet, to make their services accessible to persons with disabilities.2 Because the Internet and its design standards are evolv- ing at a dizzying rate, it is difficult to create websites that are both cutting-edge and standards-compliant. 
This paper evaluates the challenge of Web design as it relates to individuals with disabilities, exploring current standards, and offering recommendations for accessible development. Examining the provision of IT for this demographic is vital because according to the U.S. Census Bureau, the U.S. public includes about 51.2 mil- lion noninstitutionalized people living with disabilities, 32.5 million of which are severely disabled. This means that nearly one-fifth of the U.S. public faces some physi- cal, mental, sensory, or other functional impairment (18 percent in 2002).3 Because a library’s mandate is to make its resources accessible to everyone, it is important to attend to the special challenges faced by patrons with disabilities and to offer appropriate services with those special needs in mind. n Current U.S. regulations, standards, and guidelines In 1990 Congress enacted the Americans with Disabilities Act (ADA), the first comprehensive legislation mandating equal treatment under the law for PWD. The ADA pro- hibits discrimination against PWD in employment, public services, public accommodations, and in telecommunica- tions. Title II of the ADA mandates that all state govern- ments, local governments, and public agencies provide access for PWD to all of their activities, services, and programs. Since school, public, and academic libraries are under the purview of Title II, they must “furnish auxiliary aids and services when necessary to ensure effective com- munication.”4 Though predating widespread use of the Internet, the law’s intent points toward the adoption and adaptation of appropriate technologies to allow persons with a variety of disabilities to access electronic resources in a way that is most effective for them. Changes to Section 508 of the 1973 Rehabilitation Act enacted in 1998 and 2000 introduced the first standards for “accessible information technology recognized by the federal government.”5 Many state and local govern- ments have since passed laws applying the standards of Section 508 to government agencies and related services. According to the Access Board, the independent federal agency charged with assuring compliance with a variety of laws regarding services to PWD, information and com- munication technology (ICT) includes any equipment or interconnected system or subsystem of equipment, that is used in the creation, conversion, or duplication of data or information. The term electronic R. todd vandenbark (todd.vandenbark@utah.edu) is Web Ser- vices Librarian, Eccles health Sciences Library, University of Utah, Salt Lake City. 24 iNFORMAtiON tecHNOlOGY AND liBRARies | MARcH 2010 and information technology includes, but is not limited to, telecommunications products (such as telephones), information kiosks and transaction machines, World Wide Web sites, multimedia, and office equipment such as copiers and fax machines.6 The Access Board further specifies guidelines for “Web-based intranet and internet information and appli- cations,” which are directly relevant to the provision of such services in libraries.7 What follows is a detailed examination of these standards with examples to assist in understanding and implementation. (a) A text equivalent for every non-text element shall be provided. Assistive technology cannot yet describe what pictures and other images look like; they require meaningful text-based information asso- ciated with each picture. 
If an image directs the user to do something, the associated text must explain the purpose and meaning of the image. This way, someone who cannot see the screen can understand and navigate the page success- fully. This is generally accomplished by using the “alt” and “longdesc” attributes for images: <img src=“image.jpg” alt=“Short description of image.” longdesc=“explanation.txt” />. However, these aids also can clutter a page when not used properly. The current versions of the most popular screen-reader software do not limit the amount of “alt” text they can read. However, Freedom Scientific’s JAWS 6.x divides the “alt” attribute into distinct chunks of 125 characters each (excluding spaces) and reads them separately as if they were separate graphics.8 This can be confusing to the end user. Longer con- tent can be put into a separate text file and the file linked to using the “longdesc” attribute. When a page contains audio or video files, a text alternative needs to be provided. For audio files such as inter- views, lectures, and podcasts, a link to a transcript of the audio file must be immediately available. For video clips such as those on YouTube, captions must accompany the clip. (b) Equivalent alternatives for any multimedia presen- tation shall be synchronized with the presentation. This means that captions for video must be real-time and synchronized with the actions in the video, not contained solely in a separate transcript. (c) Web pages shall be designed so that all informa- tion conveyed with color is also available with- out color, for example from context or markup. While color can be used, it cannot be the sole source or indicator of information. Imagine an edu- cational website offering a story problem presented in black and green print, and the answer to the problem could be deciphered using only the green letters. This would be inaccessible to students who have certain forms of color-blindness as well as those who use screen-reader software. (d) Documents shall be organized so they are read- able without requiring an associated style sheet. The introduction of cascading style sheets (CSS) can improve accessibility because they allow the separation of presentation from content. However, not all browsers fully support CSS, so webpages need to be designed so any browser can read them accurately. The content needs to be organized so that it can be read and understood with CSS for- matting turned off. (e) Redundant text links shall be provided for each active region of a server-side image map, and (f) Client-side image maps shall be provided instead of server-side image maps except where the regions cannot be defined with an available geometric shape. An image map can be thought of as a geometri- cally defined and arranged group of links to other content on a site. A clickable map of the fifty U.S. states is an example of a functioning image map. A server-side image map would appear to a screen reader only as a set of coordinates, whereas client- side maps can include information about where the link leads through “alt” text. The best practice is to only use client-side image maps and make sure the “alt” text is descriptive and meaningful. (g) Row and column headers shall be identified for data tables, and (h) Markup shall be used to associate data cells and header cells for data tables that have two or more logical levels of row or column headers. Correct table coding is critical. 
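Returning to items (e) and (f), a client-side image map carries its link descriptions in the markup itself. The fragment below is a hedged illustration rather than a prescription; the image name, map name, coordinates, and link targets are invented placeholders.

  <img src="campus-libraries.png" alt="Map of campus library locations" usemap="#libmap" />
  <map name="libmap">
    <!-- Each clickable region is defined in the page itself and has its
         own alt text, so a screen reader can announce where it leads -->
    <area shape="rect" coords="10,10,120,80" href="main-library.html" alt="Main Library" />
    <area shape="circle" coords="200,60,40" href="health-sciences.html" alt="Health Sciences Library" />
    <area shape="poly" coords="260,20,320,40,300,90,240,80" href="law-library.html" alt="Law Library" />
  </map>

Because the regions and their descriptions travel with the page, no server-side coordinate lookup is needed and each hotspot remains meaningful to assistive technology.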
Each table should use the “table summary” attribute to provide a meaningful description of its content and arrange- ment: <table summary=“Concise explanation belongs here.”>. Headers should be coded using the table header (“th”) tag, and its “scope” attri- bute should specify whether the header applies to a row or a column: <th scope=“col”> or <th scope=“row”>. If the table’s content is complex, it may be necessary to provide an alternative presen- tation of the information. It is best to rely on CSS for page layout, taking into consideration the direc- tions in subparagraph (d) above. (i) Frames shall be titled with text that facili- tates frame identification and navigation. Frames are a deprecated feature of HTML, and their use should be avoided in favor of CSS layout. (j) Pages shall be designed to avoid caus- ing the screen to flicker with a frequency greater than 2 Hz and lower than 55 Hz. Lights with flicker rates in this range can trigger epileptic seizures. Blinking or flashing elements on teNDiNG A wilD GARDeN: liBRARY weB DesiGN FOR PeRsONs witH DisABilities | vANDeNBARK 25 a webpage should be avoided until browsers pro- vide the user with the ability to control flickering. (k) A text-only page, with equivalent information or functionality, shall be provided to make a Web site comply with the provisions of this part, when compliance cannot be accomplished any other way. The content of the text-only page shall be updated whenever the primary page changes. Complex content that is entirely visual in nature may require a separate text-only page, such as a page showing the English alphabet in American Sign Language. This requirement also serves as a stopgap measure for existing sites that require reworking for accessibility. Some consider this to be the Web’s version of separate-but-equal ser- vices, and should be avoided.9 Offering a text-only alternative site can increase the sense of exclusion that PWD already feel. Also, such versions of a website tend not to be equivalent to the parent site, leaving out promotions or advertisements. Finally, a text-only version increases the workload of Web development staff, making them more costly than creating a single, fully accessible site in the first place. (l) When pages utilize scripting languages to display content, or to create interface elements, the informa- tion provided by the script shall be identified with functional text that can be read by assistive technology. Scripting languages such as JavaScript allow for more interactive content on a page while reducing the number of times the computer screen needs to be refreshed. If functional text is not available, the screen reader attempts to read the script’s code, which outputs as a meaningless jumble of charac- ters. Using redundant text links avoids this result. (m) When a Web page requires that an applet, plug-in, or other application be present on the client system to interpret page content, the page must provide a link to a plug-in or applet that complies with [Subpart B: technical standards] §1194.22(a) through (i). Web developers need to ascertain whether a given plug-in or applet is accessible before requiring their webpage’s visitors to use it. When using applications such as QuickTime or RealAudio, it is important to provide an accessible link on the same page that will allow users to install the necessary plug-in. 
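Before moving to forms, the markup below sketches the image and table guidance in items (a), (g), and (h). It is illustrative only; the file names, the chart description, and the hours data are invented for the example.

  <!-- Image: short alt text plus a longdesc link to a fuller description -->
  <img src="gate-count-chart.png"
       alt="Bar chart of monthly gate counts for 2009"
       longdesc="gate-count-description.txt" />

  <!-- Data table: a summary, a caption, and scoped th headers let screen
       readers announce the row and column to which each cell belongs -->
  <table summary="Service desk hours by day of week">
    <caption>Service Desk Hours</caption>
    <tr>
      <th scope="col">Day</th>
      <th scope="col">Opens</th>
      <th scope="col">Closes</th>
    </tr>
    <tr>
      <th scope="row">Monday</th>
      <td>8:00 a.m.</td>
      <td>9:00 p.m.</td>
    </tr>
    <tr>
      <th scope="row">Saturday</th>
      <td>10:00 a.m.</td>
      <td>6:00 p.m.</td>
    </tr>
  </table>

If a table is used purely for page layout rather than for data, the summary and header markup should be omitted so that screen readers do not announce structure that is not there.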
(n) When electronic forms are designed to be completed on-line, the form shall allow people using assistive technology to access information, field elements, and functionality required for completion and submis- sion of the form, including all directions and cues. If scripts used in the completion of the form are inaccessible, an alternative method of completing the form must be made immediately available. Each element of a form needs to be labeled prop- erly using the <label> tag. (o) A method shall be provided that per- mits users to skip repetitive navigation links. Persons using screen reader software typically navigate through pages using the Tab key, listen- ing as the text is read aloud. Websites commonly place their logo at the top of each page and make this graphic a link to the site’s homepage. Many sites also use a line of graphic images just beneath this logo on every page to serve as a navigation bar. To avoid having to listen through this same list of links on every page just to get to the page’s content, a “skip to content” link as the first option at the top of each page provides a simple solution to this problem. (p) When a timed response is required, the user shall be alerted and given sufficient time to indicate more time is required. Some sites log a user off if they have not typed or otherwise interacted with the page after a certain time period. Users must be notified in advance that this is going to happen and given sufficient time to respond and request more time as needed. n Standards-setting groups and their work One organization that seeks to move Internet tech- nology beyond basic Section 508 compliance is the Web Accessibility Initiative (WAI) of the World Wide Web Consortium (W3C). The mission of the WAI is to develop n guidelines that are widely regarded as the interna- tional standard for Web accessibility; n support materials to help understand and imple- ment Web accessibility; and n resources through international collaboration.10 The W3C published its first Web Content Accessibility Guidelines (WCAG 1.0) in May of 1999 for making online content accessible to PWD. By following these guidelines, developers create Web content that is readily available to every user regardless of the way it’s accessed. The WAI provides ten quick tips for improving accessibility in website design: n Images and animations. Use the “alt” attribute to describe the function of each visual. n Image maps. Use the client-side map and text for hotspots. n Multimedia. Provide captioning and transcripts of audio, and descriptions of video. 26 iNFORMAtiON tecHNOlOGY AND liBRARies | MARcH 2010 n Hypertext links. Use text that makes sense when read out of context. For example, avoid “click here.” n Page organization. Use headings, lists, and consis- tent structure. Use CSS for layout and style where possible. n Graphs and charts. Summarize or use the “longdesc” attribute. n Scripts, applets, and plug-ins. Provide alternative content in case active features are inaccessible or unsupported. n Frames. Use the “noframes” element and meaning- ful titles. n Tables. Make line-by-line reading sensible. Summarize. n Check your work. Validate. Use tools, checklist, and guidelines at http://www.w3.org/TR/WCAG.11 Many libraries and other organizations have sought to follow WCAG 1.0 since it was published. 
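Items (n) and (o) can be combined in a single page fragment. In the sketch below, which is again illustrative and uses invented id values, link targets, and field names, a "skip to content" link is the first focusable element on the page, and each form control is explicitly associated with its prompt through a label element.

  <body>
    <!-- The first link on every page lets keyboard and screen-reader
         users jump past the repeated logo and navigation bar -->
    <a href="#content">Skip to content</a>

    <div id="navigation">
      <!-- site-wide navigation links repeated on every page -->
    </div>

    <div id="content">
      <form action="renew.cgi" method="post">
        <!-- An explicit label announces the purpose of each field -->
        <label for="barcode">Library card number:</label>
        <input type="text" id="barcode" name="barcode" />

        <label for="pin">PIN:</label>
        <input type="password" id="pin" name="pin" />

        <input type="submit" value="Renew items" />
      </form>
    </div>
  </body>

If scripted features are later attached to such a form, item (n)'s requirement for an immediately available accessible alternative still applies.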
Recently, the W3C updated their standards to WCAG 2.0, and the WAI website offers an overview of these guidelines along with a “customizable quick reference” designed to facilitate successful compliance. The principles behind 2.0 can be summarized by the acronym P.O.U.R. Perceivable n Provide text alternatives for non-text content. n Provide captions and alternatives for multimedia. n Make information adaptable and available to assistive technologies. n Use sufficient contrast to make things easy to see and hear. Operable n Make all functionality keyboard accessible. n Give users enough time to read and use content. n Do not use content known to cause seizures. n Help users navigate and find content. Understandable n Make text readable and understandable. n Make content appear and operate in predictable ways. n Help users avoid and correct mistakes. Robust n Maximize compatibility with current and future technologies.12 These guidelines offer assistance in creating acces- sible Web-based materials. Given their breadth, however, they raise concerns of overly wide interpretation and the strong possibility of falling short of Section 508 standards. Reading the details in WCAG 2.0 does not give any additional assistance to library Web developers on how to create a Section 508–compliant website. Clark points out that the three WCAG 2.0 documents are long (72–165 pages), confusing, and sometimes internally contradic- tory.13 The goal of a library webmaster is to provide an interface (website, OPAC, database, and so on) that is both cutting-edge and accessible, and to encourage its use by patrons of all ability levels. While they have outlined a helpful rationale, the W3C’s overlong guidelines do little to help library Web developers to achieve this goal. n Recommendations Libraries today typically offer three types of Web-based resources: (1) access to the Internet, (2) access to subscrip- tion databases, and (3) a library’s own webpage, all of which need to be accessible to PWD. Libraries trying to comply with Section 508 are required to “furnish auxil- iary aids and services when necessary to ensure effective communication.”14 There are a number of options avail- able to libraries on tight budgets. The first set involves the features built into each computer’s operating sys- tem and software. For some users with visual impair- ments, enlarging the font size of text and images on the screen will make electronic content more accessible. Both Macintosh and Windows system software have universal-access capabilities built in, including the ability to read aloud text that is on the screen using synthesized speech. The Mac read-aloud tool is called Voice Over; the Windows read-aloud tool is called Narrator. Both systems allow for screen magnification. Exploring and learning the capabilities of these systems to enhance accessibility is a free and easy first step for any library’s technology offerings, regardless of funding restrictions. Libraries with more substantial technology budgets have a wide variety of hardware and software options to choose from to meet the needs of PWD. For patrons with visual impairments, several software packages are available to read aloud the content of a website or other electronic document using synthesized speech. JAWS by Freedom Scientific and WindowEyes by GW Micro are two of the best-known software packages, and both include the ability to output to a refreshable Braille dis- play (which both companies also sell). 
Kurzweil 3000 is an education-oriented software package that not only reads on-screen text aloud but has a wealth of additional tools to assist students with learning difficulties such as attention deficit disorder or dyslexia. It is designed to integrate with any education package as well as to assist students whose primary language is not English. Persons with low vision needing screen magnification beyond the features Windows offers may look to Magic by Freedom Scientific or ZoomText by Ai Squared. Some of these teNDiNG A wilD GARDeN: liBRARY weB DesiGN FOR PeRsONs witH DisABilities | vANDeNBARK 27 software companies offer free trial versions, have online demonstrations, or both. Because prices for this software and related equipment can be high, it is prudent to first check with patrons with visual impairments and profes- sionals in the field prior to making your purchase. Humbert and Stores, members of Indiana University’s Web Accessibility Team, offer accessibility evaluations of websites and other services at the university. When asked to compare Windows and Macintosh systems as to their usefulness in assisting PWD with Web-based media, Humbert rated the Windows operating system superior, explaining that it has the proper “handles” coded into its software for screen readers and assistive technologies to grab onto. Assistive technology software is more stable in Windows Vista because its predecessor, Windows XP, “used hacked together drivers to display the informa- tion.”15 Humbert discourages the use of Vista and JAWS on an older machine because Vista is a memory hog and can crash JAWS along with the rest of the system. The Web browsers Internet Explorer and Firefox allow the user to enlarge text and images on a webpage, though Firefox is more effective. Text can be enlarged only if the webpage being viewed is designed using resizable fonts. Stores, who is profoundly visually impaired, uses JAWS screen-reader software to work and to surf the Web. She notes that both browsers work equally well with screen- reader software.16 An important Web-based resource that libraries pro- vide is subscription databases. However, as one study has shown, “most librarians lack the time, resources and/or skills to evaluate the degree to which their library subscription databases are accessible to their disability communities.”17 The question is do the vendors them- selves make an effort to produce an accessible product? A 2007 survey of twelve major database companies found that while most “have integrated accessibility standards/ guidelines into their search interfaces and/or plan to improve accessibility in future releases,” only five actu- ally conducted usability studies with people who use assistive technology. A number of studies have found that “while most databases are functionally accessible, com- panies need to do more to meet the needs of the disability community and assure librarians of the accessibility of their products.”18 Subscription databases can be inaccessible to PWD in the display of search results and accompanying infor- mation. The three most common forms of results deliv- ery are HTML full text, HTML full text with graphics, and PDF files. PDF files are notoriously inaccessible to persons using screen readers. While Adobe has made significant strides in rendering PDFs accessible, many databases contain numerous PDF documents created in versions of Adobe Acrobat prior to version 5.0 (released in 2001), which are not properly tagged for screen read- ers. 
Even newer PDF documents are only as accessible as their tagging allows. Journal articles received from publishers may or may not be properly tagged, so data- base companies cannot guarantee that their content is fully accessible. One vendor that is avoiding this trap is JSTOR. Using optical character recognition (OCR) soft- ware, JSTOR delivers image-based PDFs with embedded text to make their content available to screen readers.19 Librarians must insist that database packages be acces- sible and compatible with the forms of assistive technol- ogy most frequently used by their patrons, both in-house and online. One tool used to evaluate database (or other prod- uct) accessibility is the Voluntary Product Accessibility Template (VPAT). Created in partnership between the Information Technology Industry (ITI) Council and the U.S. General Services Administration (GSA) in 2001, it provides “a simple, Internet-based tool to assist Federal contracting and procurement officials in fulfilling the new market research requirements contained in the Section 508 implementing regulations.”20 VPAT is a voluntary disclosure form arranged in a series of tables listing the criteria of relevant subsections of Section 508 discussed previously. Blank cells are provided to allow company representatives to describe how their product’s support- ing features meet the criteria and to provide additional detailed information. Library personnel can request that vendors complete this form to document which sub- sections of Section 508 their products meet, and how. To be most useful, the form needs to be completed by company representatives with both a clear understand- ing of Section 508 and its technical details and thorough knowledge of their product. Knowledgeable library staff are encouraged to verify the quality and accuracy of the information provided before purchasing. Like databases, a library’s website needs to be acces- sible to patrons with a variety of needs. According to Muncaster, accessible sites are 35 percent easier for every- one to use and are more likely to be found by Internet search engines.21 Fully accessible websites are simpler to maintain and are on average 50 percent smaller than inaccessible ones, which means they download faster, making them easier to use.22 In creating a basic site, cur- rent best practice has been to render the content in HTML or XHTML and design the layout using CSS. This way, if it is discovered the site’s pages are not fully accessible, a simple change to the CSS updates all pages, saving the site manager time and effort. Finally, creating an acces- sible site from the beginning is substantially easier than retrofitting an old one. A complete rebuild of a library website is an opportu- nity to improve accessibility. Reynolds’ article on creating a user-centered website for the Johnson County (Kans.) Library offers an example of how libraries can apply basic information architecture design principles on a budget. Johnson County focused on simple, low-budget 28 iNFORMAtiON tecHNOlOGY AND liBRARies | MARcH 2010 usability studies involving patrons in the selection of site navigation categories, designing the layout, and testing the resulting user interface. By involving average users in this process, this library was able to achieve substantial improvements in the site’s usability. Prior to the redesign, usability testing determined that 42 percent of users were not successful in finding information on the library’s old site. 
After the redesign, “only 4% of patrons were unsuccessful in finding core-task information on the first attempt.”23 Even so, a quick test of the site with the online accessibility evaluation tool CynthiaSays indicates that it still does not fully meet the requirements of Section 508. Had the library’s staff included PWD in their process, the demonstrated degree of improvement might have allowed them to meet and possibly exceed this standard. An understanding of how a person with disabilities experiences the online environment can help point the way toward improved accessibility. A recent study in the United Kingdom tracked the eye movements of able- bodied computer users in an effort to answer these ques- tions. Researchers asked eighteen people with normal or corrected vision to search for answers on two versions of a BBC website—the standard graphical page and the text- only version. Subjects’ eyes tended to dart around the standard page “as they attempt to locate what appears visually to be the next most likely location”24 for the answer. But in searching the text-only page, subjects went line-by-line, making smaller jumps across each page. Researchers determined that the webpage and its layout serve as a form of external memory, providing visual cues to the structure of its content and how to navigate it. If the Internet is an information superhighway, then the layout of a standard webpage serves as the borders and directional signs for browsing. The visual cues and navigation aids inherent in cur- rent webpages’ layouts provide no auditory equivalent for presentation to people with visual impairments. Information seeking on the Web is a complex process requiring “the ability to switch and coordinate among multiple information-seeking strategies” such as brows- ing, scanning, query-based searching, and so on.25 If Web browsers could translate formatting and presentation into audio tailored to the needs of the visually impaired, the use of the Internet would be a far more satisfying experi- ence for those users. However, such Web programming would require years of additional research and develop- ment. In the meantime, Web librarians must strive to build sites that are clean, hierarchical, and usable by all persons by following to the standards and guidelines currently available. One way to enhance the accessibility of sites is to fol- low a database-driven Web development model. In addi- tion to using XHTML and CSS, Dunlap recommends that content be stored in a relational database such as MySQL and that a coding language such as PHP be used to create pages dynamically. This approach has two advantages. First, it allows for the creation of “a flexible website design style that lives in a single, easily modified file that controls the presentation of every Web page of the site.”26 Second, it requires far less time for site maintenance, freeing staff to devote time to assuring accessibility while accommodating changes in Web technology. Such a model can be used by database vendors to ensure that their services can seamlessly integrate with the library’s online content. Use of mobile phones and similar devices to browse the Web is at an all-time high, and content providers are eager to make their sites mobile-friendly. Many of these end users experience similar barriers to accessing this content as PWD do. For example, persons with some motor disabilities as well as mobile phones with only a numeric keypad cannot access sites with links requiring the use of a mouse. 
Sites that follow either the W3C's Mobile Web Best Practices (MWBP) or WCAG are well on their way to meeting both standards.27 By properly associating labels with their controls, Internet content can be made fully accessible to both groups of end users. Understanding the similarities between MWBP and WCAG can lead to website design that is truly perceivable, operable, understandable, and robust.

Summary

Librarians with responsibility for Web design and technology management operate in an evolving environment. Legal requirements make clear the expectation to serve the wide variety of needs of patrons with disabilities. Yet the guidelines and standards available to assist in this venture range from complex to vague and insufficient. Assistive technologies continue to improve, with many traditional vendors confident that their products are accessible. In actual use, however, substantial challenges and shortcomings remain. The challenge for technology librarians is to be proactive in keeping abreast of technological advances, to experiment and learn from their efforts, and to continually update and adapt to provide Web or hypermedia information and services to patrons of all kinds.

References

1. United Nations, Convention on the Rights of Persons with Disabilities, 2008, http://www.un.org/disabilities/default.asp?navid=12&pid=150 (accessed Aug. 10, 2009).
2. Ibid.
3. Erika Steinmetz, Americans with Disabilities (Washington, D.C.: U.S. Census Bureau, 2002).
4. U.S. Department of Justice, Civil Rights Division, Disability Rights Section, "Title II Highlights," Aug. 29, 2002, http://www.ada.gov/t2hlt95.htm (accessed July 26, 2008).
5. Marilyn Irwin, Resources and Services for People with Disabilities: Lesson 1b Transcript (Indianapolis: Indiana University at Indianapolis School of Library and Information Science, 2008): 10.
6. Ibid., 10.
7. 1998 Amendment to Section 508 of the Rehabilitation Act, Subpart B—Technical Standards, §1194.22, http://www.section508.gov/index.cfm?FuseAction=content&ID=12#Application (accessed Dec. 2, 2009).
8. Access IT, "How Long Can an 'Alt' Attribute Be?" University of Washington, 2008, http://www.washington.edu/accessit/articles?257 (accessed Dec. 12, 2008).
9. Matt May, "On 'Separate but Equal' Design," online posting, June 24, 2004, bestkungfu weblog, http://www.bestkungfu.com/archive/date/2004/06/on-separate-but-equal-design/ (accessed Dec. 18, 2008).
10. Web Accessibility Initiative, "WAI Mission and Organization," 2008, http://www.w3.org/WAI/about.html (accessed July 22, 2008).
11. Shawn Lawton Henry and Pasquale Popolizio, "WAI, Quick Tips to Make Accessible Web Sites," World Wide Web Consortium, Feb. 5, 2008, http://www.w3.org/WAI/quicktips/Overview.php (accessed Mar. 30, 2008).
12. Ben Caldwell et al., "Web Content Accessibility Guidelines (WCAG) 2.0," World Wide Web Consortium, Dec. 11, 2008, http://www.w3.org/TR/WCAG20/ (accessed July 27, 2008).
13. Joe Clark, "To Hell with WCAG 2," A List Apart no. 217 (May 26, 2006), http://www.alistapart.com/articles/tohellwithwcag2 (accessed July 25, 2008).
14. U.S. Department of Justice, "Title II Highlights."
15. Joseph A. Humbert and Mary Stores, Questions about New Software and Accessibility (Richmond, Ind., July 28, 2008).
16. Ibid.
17. S. L. Byerley, M. B. Chambers, and M. Thohira, "Accessibility of Web-Based Library Databases: The Vendors' Perspectives in 2007," Library Hi Tech 25, no. 4 (2007): 509–27.
18. Ibid.
19. P. Muncaster, "Poor Accessibility Has a Price," VNU Net, Feb. 9, 2006, http://www.vnunet.com/articles/send/2150099 (accessed July 27, 2008).
20. Information Technology Industry Council, "FAQ: Voluntary Product Accessibility Template (VPAT)," http://www.itic.org/archives/articles/20040506/faq_voluntary_product_accessibility_template_vpat.php (accessed July 29, 2008).
21. Muncaster, "Poor Accessibility Has a Price."
22. Isaac Hunter Dunlap, "How Database-Driven Web Sites Enhance Accessibility," Library Hi Tech 23, no. 8 (2008): 34–38.
23. Erica Reynolds, "The Secret to Patron-Centered Web Design: Cheap, Easy, and Powerful Usability Techniques," Computers in Libraries 28, no. 6 (2008): 6–47.
24. Caroline Jay et al., "How People Use Presentation to Search for a Link: Expanding the Understanding of Accessibility on the Web," Universal Access in the Information Society 6, no. 3 (2006): 307–20.
25. C. Kouroupetroglou, M. Salampasis, and A. Manitsaris, "Browsing Shortcuts as a Means to Improve Information Seeking of Blind People in the WWW," Universal Access in the Information Society 6, no. 3 (2007): 11.
26. Dunlap, "How Database-Driven Web Sites Enhance Accessibility."
27. Web Accessibility Initiative, "Mobile Web Best Practices 1.0," July 29, 2008, http://www.w3.org/TR/mobile-bp (accessed Aug. 10, 2009).

The Path toward Global Interoperability in Cataloging
Ilana Tolkoff

Ilana Tolkoff (ilana.tolkoff@gmail.com) holds a BA in music and Italian from Vassar College, an MA in musicology from Brandeis University, and an MLS from the University at Buffalo. She is currently seeking employment as a music librarian.

Libraries began in complete isolation with no uniformity of standards and have grown over time to be ever more interoperable. This paper examines the current steps toward the goal of universal interoperability. These projects aim to reconcile linguistic and organizational obstacles, with a particular focus on subject headings, name authorities, and titles.

In classical and medieval times, library catalogs were completely isolated from each other and idiosyncratic. Since then, there has been a trend to move toward greater interoperability. We have not yet attained this international standardization in cataloging, and there are currently many challenges that stand in the way of this goal. This paper will examine the teleological evolution of cataloging and analyze the obstacles that stand in the way of complete interoperability, how they may be overcome, and which may remain. This paper will not provide a comprehensive list of all issues pertaining to interoperability; rather, it will attempt to shed light on those issues most salient to the discussion.

Unlike the libraries we are familiar with today, medieval libraries worked in near total isolation. Most were maintained by monks in monasteries, and any regulations in cataloging practice were established by each religious order. One reason for their lack of regulations was that their collections were small by our standards; a monastic library had at most a few hundred volumes (a couple thousand in some very rare cases). The "armarius," or librarian, kept more of an inventory than an actual catalog, along with the inventories of all other valuable possessions of the monastery. There were no standard rules for this inventory-keeping, although the armarius usually wrote down the author and title, or incipit if there was no author or title. Some of these inventories also contained bibliographic descriptions, which most often described the physical book rather than its contents.
The inventories were usually taken according to the shelf organization, which was occasionally based on subject, like most librar- ies are today. These trends in medieval cataloging varied widely from library to library, and their inventories were entirely different from our modern OPACs. The inventory did not provide users access to the materials. Instead, the user consulted the armarius, who usually knew the col- lection by heart. This was a reasonable request given the small size of the collections.1 This type of nonstandardized cataloging remained relatively unchanged until the nineteenth century, when Charles C. Jewett introduced the idea of a union catalog. Jewett also proposed having stereotype plates for each bibliographic record, rather than a book catalog, because this could reduce costs, create uniformity, and organize records alphabetically. This was the precursor to the twentieth-century card catalog. While many of Jewett’s ideas were not actually practiced during his lifetime, they laid the foundation for later cataloging practices.2 The twentieth century brought a great revolution in cataloging standards, particularly in the United States. In 1914, the Library of Congress Subject Headings (LCSH) were first published and introduced a controlled vocabu- lary to American cataloging. The 1960s saw a wide array of advancements in standardization. The Library of Congress (LC) developed MARC, which became a national standard in 1973. It also was the time of the cre- ation of Anglo-American Cataloguing Rules (AACR), the Paris Principles, and International Standard Bibliographic Description (ISBD). While many of these standardization projects were uniquely American or British phenomena, they quickly spread to other parts of the world, often in translated versions.3 While the technology did not yet exist in the 1970s to provide widespread local online catalogs, technology did allow for union catalogs containing the records of many libraries in a single database. These union catalogs included the Research Libraries Information Network (RLIN), the OCLC Online Computer Library Center (OCLC), and the Western Library Network (WLN). In the 1980s the local online public access catalog (OPAC) emerged, and in the 1990s OPACs migrated to the Web (WebPACs).4 Currently, most libraries have OPACs and are members of OCLC, the largest union catalog, used by more than 71,000 libraries in 112 countries and ter- ritories.5 Now that most of the world’s libraries are on OCLC, librarians face the challenge and inconvenience of dis- crepancies in cataloging practice due to the differing stan- dards of diverse countries, languages, and alphabets. The fields of language engineering and linguistics are work- ing on various language translation and analysis tools. Some of these include machine translation; ontology, or the hierarchical organization of concepts; information extraction, which deciphers conceptual information from unorganized information, such as that on the Web; text summarization, in which computers create a short sum- mary from a long piece of text; and speech processing, which is the computer analysis of human speech.6 While these are all exciting advances in information technol- ogy, as of yet they are not intelligent enough to help us establish cataloging interoperability. 
It will be interesting to see whether language engineering tools will be capable of helping catalogers in the future, but for now they are best at making sense of unstructured information, such as the Web. The interoperability of library catalogs, which consist of highly structured information, must be tackled through software that innovative librarians of the future will produce.

In an ideal world, OCLC would be smoothly interoperable at a global level. A single thesaurus of subject headings would have translations in every language. There would be just one set of authority files. All manifestations of a single work would be grouped under the same title, translatable to all languages. There would be a single bibliographic record for a single work, rather than multiple bibliographic records in different languages for the same work. This single bibliographic record could be translatable into any language, so that when searching in WorldCat, one could change the settings to any language to retrieve records that would display in that chosen language. When catalogers contribute to OCLC, they would create the records in their respective languages, and once in the database the records would be translatable to any other language. Because records would be so fluidly translatable, an OPAC could be searched in any language. For example, the default settings for the University at Buffalo's OPAC could be English, but patrons could change those settings to accommodate the great variety of international students doing research. This vision is utopian to say the least, and it is doubtful that we will ever reach this point. But it is valuable to establish an ideal scenario to aim our innovation in the right direction.

One major obstacle in the way of global interoperability is the existence of different alphabets and the inherently imperfect nature of transliteration. There are essentially two types of transliteration schemes: those based on phonetic structure and those based on morphemic structure. The danger of phonetic transliteration, which mimics pronunciation, is that semantics often get lost. It fails to differentiate between homographs (words that are spelled and pronounced the same way but have different meanings). Complications also arise when there are differences between careful and casual styles of speech. Park asserts, "When catalogers transcribe words according to pronunciation, they can create inconsistent and arbitrary records."7 Morphemic transliteration, on the other hand, is based on the meanings of morphemes, and sometimes ends up being very different from the pronunciation in the source language. One advantage to this, however, is that it requires fewer diacritics than phonetic transliteration. Park, whose primary focus is on Korean-Roman transliteration, argues that the McCune-Reischauer phonetic transliteration that libraries use loses too much of the original meaning. In other alphabets, however, phonetic transliteration may be more beneficial, as in the LC's recent switch to Pinyin transliteration in Chinese. The LC found Pinyin to be more easily searchable than Wade-Giles or monosyllabic Pinyin, which are both morphemic.
However, another problem with translit- eration that neither phonetic nor morphemic schemes can solve is word segmentation—how a transliterated word is divided. This becomes problematic when there are no contextual clues, such as in a bibliographic record.8 Other obstacles that stand in the way of interoperabil- ity are the diverse systems of subject headings, author- ity headings, and titles found internationally. Resource Description and Access (RDA) will not deal with subject headings because it is such a hefty task, so it is unlikely that subject headings will become globally interoperable in the near future.9 Fortunately, twenty-four national libraries of English speaking countries use LCSH, and twelve non-English-speaking countries use a translated or modified version of LCSH. This still leaves many more countries that use their own systems of subject headings, which ultimately need to be made interoperable. Even within a single language, subject headings can be compli- cated and inconsistent because they can be expressed as a single noun, compound noun, noun phrase, or inverted phrase; the problem becomes even greater when trying to translate these to other languages. Bennett, Lavoie, and O’Neill note that catalogers often assign different subject headings (and classifications) to different manifestations of the same work.10 That is, the record for the novel Gone with the Wind might have different subject headings than the record for the movie. This problem could poten- tially be resolved by the Functional Requirements for Bibliographic Records (FRBR), which will be discussed below. Translation is a difficult task, particularly in the con- text of strict cataloging rules. It is especially complicated to translate among unrelated languages, where one might be syntactic and the other inflectional. This means that there are discrepancies in the use of prepositions, con- junctions, articles, and inflections. The ability to add or remove terms in translation creates endless variations. A single concept can be expressed in a morpheme, a word, a phrase, or a clause, depending on the language. There also are cultural differences that are reflected in differ- ent languages. Park gives the example of how Anglo- American culture often names buildings and brand names after people, reflecting our culture’s values of individualism, while in Korea this phenomenon does not exist at all. On the other hand, Korean’s use of formal and informal inflections reflects their collectivist hierarchical culture. Another concept that does not cross cultural lines is the Korean pumasi system in which family and friends help someone in a time of need with the understanding that the favor will be returned when they need it. This cannot be translated into a single English word, phrase, or subject heading. One way of resolving ambiguity in translations is through modifiers or scope notes, but this is only a partial solution.11 Because translation and transliteration are so difficult, 32 iNFORMAtiON tecHNOlOGY AND liBRARies | MARcH 2010 as well as labor-intensive, the current trend is to link already existing systems. Multilingual Access to Subjects (MACS) is one such linking project that aims to link subject headings in English, French, and German. It is a joint project under the Conference of European National Librarians among the Swiss National Library, the Bibliothèque nationale de France (BnF), the British Library (BL), and Die Deutsche Bibliothek (DDB). 
It aims to link the English LCSH, the French Répertoire d’autorité matière encyclopédique et alphabétique unifié (RAMEAU), and the German Schlagwortnormdatei/ Regeln für den Schlagwortkatalog (SWD/RSWK). This requires manually analyzing and matching the concepts in each heading. If there is no conceptual equivalent, then it simply stands alone. MACS can link between headings and strings or even create new headings for linking pur- poses. This is not as fruitful as it sounds, however, as there are fewer correspondences than one might expect. The MACS team experimented with finding correspondences by choosing two topics: sports, which was expected to have a particularly high number of correspondences, and theater, which was expected to have a particularly low number of correspondences. Of the 278 sports head- ings, 86 percent matched in all three languages, 8 percent matched in two, and 6 percent was unmatched. Of the 261 theater headings, 60 percent matched in three lan- guages, 18 percent matched in two, and 22 percent was unmatched.12 Even in the most cross-cultural subject of sports, 14 percent of terms did not correspond fully, mak- ing one wonder whether linking will work well enough to prevail. A similar project—the Virtual International Authority File (VIAF)—is being undertaken for authority headings, a joint project of the LC, the BnF, and DDB, and now including several other national libraries. VIAF aims to link (not consolidate) existing authority files, and its beta version (available at http://viaf.org) allows one to search by name, preferred name, or title. OCLC’s software mines these authority files and the titles associated with them for language, LC control number, LC classifica- tion, usage, title, publisher, place of publication, date of publication, material type, and authors. It then derives a new enhanced authority record, which facilitates map- ping among authority records in all of VIAF’s languages. These derived authority records are stored on OAI serv- ers, where they are maintained and can be accessed by users. Users can search VIAF by a single national library or broaden their possibilities by searching all participat- ing national libraries. As of 2006, between the LC’s and DDB’s authority files, there were 558,618 matches, includ- ing 70,797 complex matches (one-to-many), and 487,821 unique matches (one-to-one) out of 4,187,973 LC names and 2,659,276 DDB names. Ultimately, VIAF could be used for still more languages, including non-Roman alphabets.13 Recently the National Library of Israel has joined, and VIAF can link to the Hebrew alphabet. A similar project to VIAF that also aimed to link authority files was Linking and Exploring Authority Files (LEAF), which was under the auspices of the Information Society Technologies Programme of the Fifth Framework of the European Commission. The three-year project began in 2001 with dozens of libraries and organizations (many of which are national libraries), representing eight languages. Its website describes the project as follows: Information which is retrieved as a result of a query will be stored in a pan-European “Central Name Authority File.” This file will grow with each query and at the same time will reflect what data records are rel- evant to the LEAF users. Libraries and archives want- ing to improve authority information will thus be able to prioritise their editing work. 
Registered users will be able to post annotations to particular data records in the LEAF system, to search for annotations, and to download records in various formats.14 Park identifies two main problems with linking authority files. One is that name authorities still contain some language-specific features. The other is that disam- biguation can vary among name authority systems (e.g., birth/death dates, corporate qualifiers, and profession/ activity). These are the challenges that projects like LEAF and VIAF must overcome. While the linking of subject headings and name authorities is still experimental and imperfect, the FRBR model for linking titles is much more promising and will be incorporated in the soon-to-be-released RDA. According to Bennett, Lavoie, and O’Neill, there are three important benefits to FRBR: (1) it allows for different views of a bibliographic database, (2) it creates a hierarchy of bibliographic entities in the catalog such that all versions of the same work fall into a single collapsible entry point, (3) and the confluence of the first two benefits makes the cata- log more efficient. In the FRBR model, the bibliographic record consists of four entities: (1) the work, (2) the expres- sion, (3) the manifestation, and (4) the item. All manifesta- tions of a single work are grouped together, allowing for a more economical use of information because the title needs to be entered only once.15 That is, a “title authority file” will exist much like a name authority file. This means that all editions in all languages and in all formats would be grouped under the same title. For example, the Lord of the Rings title would include all novels, films, translations, and editions in one grouping. This would reduce the number of bibliographic records, and as Danskin notes, “The idea of creating more records at a time when publishing output threatens to outstrip the cataloguing capacity of national bibliographic agencies is alarming.”16 The FRBR model is particularly beneficial for com- plex canonical works like the Bible. There are a small number of complex canonical works, but they take up a tHe PAtH tOwARD GlOBAl iNteROPeRABilitY iN cAtAlOGiNG | tOlKOFF 33 disproportionate number of holdings in OCLC.17 Because this only applies to a small number of works, it would not be difficult to implement, and there would be a disproportionate benefit in the long run. There is some uncertainty, however, in what constitutes a complex work and whether certain items should be grouped under the same title.18 For instance, should Prokofiev’s Romeo and Juliet be grouped with Shakespeare’s? The advantage of the FRBR model for titles over subject headings or name authorities is that no such thing as a title authority file exists (as conceptualized by FRBR). We would be able to start from scratch, creating such title authority files at the international level. Subject headings and name authori- ties, on the other hand, already exist in many different forms and languages so that cross-linking projects like VIAF might be our only option. It is encouraging to see the strides being made to make subject headings, name authority headings, and titles globally interoperable, but what about other access points within a record’s bibliographic description? These are usually in only one language, or two if cataloged in a bilingual country. Should these elements (format, contents, and so on) be cross-linked as well, and is this even possible? What should reasonably be considered an access point? 
Most people search by subject, author, or title, so perhaps it is not worth making other types of access points interoperable for the few occasions when they are useful. Yet if 100 percent universal interoperabil- ity is our ultimate utopian goal, perhaps we should not settle for anything less than true international access to all fields in a record. Because translation and transliteration are such com- plex undertakings, linking of extant files is the future of the field. There are advantages and disadvantages to this. On the one hand, linking these files is certainly bet- ter than having them exist only for their own countries. They are easily executed projects that would not require a total overhaul of the way things currently stand. The disadvantages are not to be ignored, however. The fact that files do not correspond perfectly from language to language means that many files will remain in isolation in the national library that created them. Another problem is that cross-linking is potentially more confusing to the user; the search results on http://www.viaf.org are not always simple and straightforward. If cross-linking is where we are headed, then we need to focus on a more user-friendly interface. If the ultimate goal of interoper- ability is simplification, then we need to actually simplify the way query results are organized rather than make them more confusing. Very soon RDA will be released and will bring us to a new level of interoperability. AACR2 arrived in 1978, and though it has been revised several times, it is in many ways outdated and mainly applies to books. RDA will bring something completely new to the table. It will be flexible enough to be used in other metadata schemes besides MARC, and it can even be used by different industries such as publishers, museums, and archives.19 Its incorporation of the FRBR model is exciting as well. Still, there are some practical problems in implementing RDA and FRBR, one of which is that reeducating librar- ians about the new rules will be costly and take time. Also, FRBR in its ideal form would require a major over- haul of the way OCLC and integrated library systems currently operate, so it will be interesting to see to what extent RDA will actually incorporate FRBR and how it will be practically implemented. Danskin asks, “Will the benefits of international co-operation outweigh the costs of effecting changes? Is the USA prepared to change its own practices, if necessary, to conform to European or wider IFLA standards?”20 It seems that the United States is in fact ready and willing to adopt FRBR, but to what extent is yet to be determined. What I have discussed in this paper are some of the more prominent international standardization projects, although there are countless others, such as EuroWordNet, the Open Language Archives Community (OLAC), and International Cataloguing Code (ICC), to name but a few.21 In general, the current major projects consist of linking subject headings, name authority files, and titles in multiple languages. Linking may not have the best cor- respondence rates, we have still not begun to tackle the cross-linking of other bibliographic elements, and at this point search results may be more confusing than help- ful. But the existence of these linking projects means we are at least headed in the right direction. The emergent universality of OCLC was our most recent step toward interoperability, and it looks as if cross-linking is our next step. Only time will tell what steps will follow. References 1. 
Lawrence S. Guthrie II, "An Overview of Medieval Library Cataloging," Cataloging & Classification Quarterly 15, no. 3 (1992): 93–100.
2. Lois Mai Chan and Theodora Hodges, Cataloging and Classification: An Introduction, 3rd ed. (Lanham, Md.: Scarecrow, 2007): 48.
3. Ibid., 6–8.
4. Ibid., 7–9.
5. OCLC, "About OCLC," http://www.oclc.org/us/en/about/default.htm (accessed Dec. 9, 2009).
6. Jung-Ran Park, "Cross-Lingual Name and Subject Access: Mechanisms and Challenges," Library Resources & Technical Services 51, no. 3 (2007): 181.
7. Ibid., 185.
8. Ibid.
9. Julie Renee Moore, "RDA: New Cataloging Rules, Coming Soon to a Library Near You!" Library Hi Tech News 23, no. 9 (2006): 12.
10. Rick Bennett, Brian F. Lavoie, and Edward T. O'Neill, "The Concept of a Work in WorldCat: An Application of FRBR," Library Collections, Acquisitions, & Technical Services 27, no. 1 (2003): 56.
11. Park, "Cross-Lingual Name and Subject Access."
12. Ibid.
13. Thomas B. Hickey, "Virtual International Authority File" (Microsoft PowerPoint presentation, ALA Annual Conference, New Orleans, June 2006), http://www.oclc.org/research/projects/viaf/ala2006c.ppt (accessed Dec. 9, 2009).
14. LEAF, "LEAF Project Consortium," http://www.crxnet.com/leaf/index.html (accessed Dec. 9, 2009).
15. Bennett, Lavoie, and O'Neill, "The Concept of a Work in WorldCat."
16. Alan Danskin, "Mature Consideration: Developing Bibliographic Standards and Maintaining Values," New Library World 105, no. 3/4 (2004): 114.
17. Ibid.
18. Bennett, Lavoie, and O'Neill, "The Concept of a Work in WorldCat."
19. Moore, "RDA."
20. Danskin, "Mature Consideration," 116.
21. Ibid.; Park, "Cross-Lingual Name and Subject Access."

Dublin Core, DSpace, and a Brief Analysis of Three University Repositories
Mary Kurtz

Mary Kurtz (mhkurtz@gmail.com) is a June 2009 graduate of Drexel University's School of Information Technology. She also holds a BS in Secondary Education from the University of Scranton and an MA in English from the University of Illinois at Urbana–Champaign. Currently, Kurtz volunteers her time in technical services/cataloging at Simms Library at Albuquerque Academy and in corporate archives at Lovelace Respiratory Research Institute (www.lrri.org), where she is using DSpace to manage a diverse collection of historical photographs and scientific publications.

This paper provides an overview of Dublin Core (DC) and DSpace together with an examination of the institutional repositories of three public research universities. The universities all use DC and DSpace to create and manage their repositories. I drew a sampling of records from each repository and examined them for metadata quality using the criteria of completeness, accuracy, and consistency. I also examined the quality of records with reference to the methods of educating repository users. One repository used librarians to oversee the archiving process, while the other two employed two different strategies as part of the self-archiving process. The librarian-overseen archive had the most complete and accurate records for DSpace entries.

The last quarter of the twentieth century has seen the birth, evolution, and explosive proliferation of a bewildering variety of new data types and formats. Digital text and images, audio and video files, spreadsheets, websites, interactive databases, RSS feeds, streaming live video, computer programs, and macros are merely a few examples of the kinds of data that can now be found on the Web and elsewhere. These new data forms do not always conform to conventional cataloging formats. In an attempt to bring some sort of order from chaos, the concept of metadata (literally "data about data") arose.
Metadata is, according to ALA, "structured, encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities."1

Metadata is an attempt to capture the contextual information surrounding a datum. The enriching contextual information assists the data user in understanding how to use the original datum. Metadata also attempts to bridge the semantic gap between machine users of data and human users of the same data.

Dublin Core

Dublin Core (DC) is a metadata schema that arose from an invitational workshop sponsored by the Online Computer Library Center (OCLC) in 1995. "Dublin" refers to the location of this original meeting in Dublin, Ohio, and "Core" refers to the fact that DC is a set of metadata elements that are basic, but expandable. DC draws upon concepts from many disciplines, including librarianship, computer science, and archival preservation.

The standards and definitions of the DC element sets have been developed and refined by the Dublin Core Metadata Initiative (DCMI) with an eye to interoperability. DCMI maintains a website (http://dublincore.org/documents/dces/) that hosts the current definitions of all the DC elements and their properties.

DC is a set of fifteen basic elements plus three additional elements. All elements are both optional and repeatable. The basic DC elements are:
1. Title
2. Creator
3. Subject
4. Description
5. Publisher
6. Contributor
7. Date
8. Type
9. Format
10. Identifier
11. Source
12. Language
13. Relation
14. Coverage
15. Rights

The additional DC elements are:
16. Audience
17. Provenance
18. Rights Holder

DC allows for element refinements (or subfields) that narrow the meaning of an element, making it more specific. The use of these refinements is not required. DC also allows for the addition of nonstandard elements for local use (a brief example record is sketched below).

DSpace

DSpace is an open-source software package that provides management tools for digital assets. It is frequently used to create and manage institutional repositories. First released in 2002, DSpace is a joint development effort of Hewlett Packard (HP) Labs and the Massachusetts Institute of Technology (MIT). Today, DSpace's future is guided by a loose grouping of interested developers called the DSpace Committers Group, whose members currently include HP Labs, MIT, OCLC, the University of Cambridge, the University of Edinburgh, the Australian National University, and Texas A&M University. DSpace version 1.3 was released in 2005, and the newest version, DSpace 1.5, was released in March 2008. More than one thousand institutions around the world use DSpace, including public and private colleges and universities and a variety of not-for-profit corporations.

DC is at the heart of DSpace.
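Before turning to how DSpace handles these elements, it may help to see what a bare DC description looks like. The sketch below is serialized in XML using the DCMI element namespace and describes a single invented item with a handful of the fifteen basic elements; the record wrapper and all of the values are illustrative assumptions, not output from any particular system.

```xml
<!-- A minimal, hypothetical Dublin Core description of one item.
     Only some of the fifteen basic elements are shown; all DC
     elements are optional and repeatable. -->
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Annual Report of the University Libraries, 2009</dc:title>
  <dc:creator>University Libraries</dc:creator>
  <dc:subject>Academic libraries</dc:subject>
  <dc:subject>Annual reports</dc:subject>
  <dc:description>Summary of collections, services, and budgets for the fiscal year.</dc:description>
  <dc:date>2009</dc:date>
  <dc:type>Text</dc:type>
  <dc:format>application/pdf</dc:format>
  <dc:language>en</dc:language>
  <dc:rights>Copyright held by the University Libraries.</dc:rights>
</record>
```

Any of these elements could be repeated, refined with a qualifier, or omitted entirely, which is precisely the flexibility that DSpace builds upon.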
Although DSpace can be customized to a limited extent, the basic and quali- fied elements of DC and their refinements form DSpace’s backbone.2 n How DSpace works: a contributor’s perspective DSpace is designed for use by “metadata naive” contribu- tors. This is a conscious design choice made by its devel- opers and in keeping with the philosophy of inclusion for institutional repositories. DSpace was developed for use by a wide variety of contributors with a wide range of metadata and bibliographic skills. DSpace simplifies the metadata markup process by using terminology that is different from DC standards and by automating the production of element fields and XML/HTML code. DSpace has four hierarchical levels of users: users, contributors, community administrators, and network/ systems administrators. The user is a member of the general public who will retrieve information from the repository via browsing the database or conducting structured searches for specific information. The contributor is an individual who wishes to add their own work to the database. To become a contributor, one must be approved by a DSpace community adminis- trator and receive a password. A contributor may create, upload, and (depending upon the privileges bestowed upon him by his community administrator), edit or remove informational records. Their editing and removal privileges are restricted to their own records. A community administrator has oversight within their specialized area of DSpace and accordingly has more privileges within the system than a contributor. A community administrator may create, upload, edit, and remove records, but also can edit and remove all records available within the community’s area of the database. Additionally, the community administrator has access to some metadata about the repository’s records that is not available to users and contributors and has the power to approve requests to become contributors and grant upload access to the database. Lastly, the commu- nity administrator sets the rights policy for all materials included in the database and writes the statement of rights that every contributor must agree to with every record upload. The network/systems administrator is not involved with database content, focusing rather on software main- tenance and code customization. When a DSpace contributor wishes to create a new record, the software walks them through the process. DSpace presents seven screens in sequence that ask for specific information to be entered via check buttons, fill- in textboxes, and sliders. At the end of this process, the contributor must electronically sign an acceptance of the statement of rights. Because DSpace’s software attempts to simplify the metadata-creation process for contributors, its terminol- ogy is different from DC’s. DSpace uses more common terms that are familiar to a wider variety of individu- als. For example, DSpace asks the contributor to list an “author” for the work, not a “creator” or a “contribu- tor.” In fact, those terms appear nowhere in any DSpace. Instead, DSpace takes the text entered in the author textbox and maps it to a DC element—something that has profound implications if the mapping does not follow expected DC definitions. Likewise, DSpace does not use “subject” when asking the contributor to describe their material. Instead, DSpace asks the contributor to list keywords. Text entered into the keyword field is then mapped into the subject ele- ment. 
While this seems like a reasonable path, it does have some interesting implications for how the subject element is interpreted and used by contributors. DC’s metadata elements are all optional. This is not true in DSpace. DSpace has both mandatory and auto- matic elements in its records. Because of this, data records created in DSpace look different than data records created in DC. These mandatory, automatic, and default fields affect the fill frequency of certain DC elements—with all of these elements having 100 percent participation. In DSpace, the title element is mandatory; that is, it is a required element. The software will not allow the contributor to proceed if the title text box is left empty. As a consequence, all DSpace records will have 100 percent participation in the title element. DSpace has seven automatic elements, that is, ele- ment fields that are created by the software without any need for contributor input. Three are date elements, two are format elements, one is an identifier, and one is provenance. DSpace automatically records the time of the each record’s creation in machine-readable form. When the record is uploaded into the database, this time- stamp is entered into three element fields: dc.date.avail- able, dc.date.accessioned, and dc.date.issued. Therefore DSpace records have 100 percent participation in the date element. For previously published materials, a separate screen asks for the original publication date, which is then 42 iNFORMAtiON tecHNOlOGY AND liBRARies | MARcH 2010 placed in the dc.date.issued element. Like title, the origi- nal date of publication is a mandatory field, and failure to enter a meaningful numerical date into the textbox will halt the creation of a record. In a similar manner, DSpace “reads” the kind of file the contributor is uploading to the database. DSpace automatically records the size and type (.doc, .jpg, .pdf, etc.) of the file or files. This data is automatically entered into dc.format.mimetype and dc.format.extent. Like date, all DSpace records will have 100 percent participation in the format element. Likewise, DSpace automatically assigns a location identifier when a record is uploaded to the database. This information is recorded as an URI and placed in the identifier element. All DSpace records have a dc.identifier.uri field. The final automatic element is provenance. At the time of record creation, DSpace records the identity of the contributor (derived from the sign-in identity and pass- word) and places this information into a dc.provenance element field. This information becomes a permanent part of the DSpace record; however, this field is a hidden to users. Typically only community and network/sys- tems administrators may view provenance information. Still, like date, format, and identifier elements, DSpace records have automatic 100 percent participation in prov- enance. Because of the design of DSpace’s software, all DSpace-created records will have a combination of both contributor-created and DSpace-created metadata. All DSpace records can be edited. During record cre- ation, the contributor may at any time move backward through his record to alter information. Once the record has been finished and the statement of rights signed, the completed record moves into the community administra- tor’s workflow. Once the record has entered the workflow, the community administrator is able to view the record with all the metadata tags attached and make changes using DSpace’s editing tools. 
However, depending on the local practices and the volume of records passing through the administrator’s workflow, the administrator may simply upload records without first reviewing them. A record may also be edited after it has been uploaded, with any changes being uploaded into the database at the end of editing process. In editing a record after it has been uploaded, the contributor, providing he has been granted the appropriate privileges, is able to see all the metadata elements that have attached to the record. Calling up the editing tools at this point allows the contributor or admin- istrator to make significant changes to the elements and their qualifiers, something that is not possible during the record’s creation. When using the editing tools, the simpli- fied contributor interface disappears, and the metadata elements fields are labeled with their DC names. The con- tributor or administrator may remove metadata tags and the information they contain and add new ones selecting the appropriate metadata element and qualifier from a slider. For example, during the editing process, the contrib- utor or administrator may choose to create dc.contributor. editor or dc.subject.lcsh options—something not possible during the record-creation process. In the examination of the DSpace records from our three repositories, DSpace’s shaping influence on element participation and metadata quality will be clearly seen. n The repositories DSpace is principally used by academic and corporate nonprofit agencies to create and manage their insti- tutional repositories. For this study, I selected three academic institutions that shared similar characteristics (large, public, research-based universities) but which had differing approaches to how they managed their metadata-quality issues. The University of New Mexico (UNM) DSpace reposi- tory (DSpaceUNM) holds a wide-ranging set of records, including materials from the university’s faculty and administration, the Law School, the Anderson School of Business Administration, and the Medical School, as well as materials from a number of tangentially related university entities like the Western Water Policy Review Advisory Commission, New Mexico Water Trust Board, and Governor Richardson’s Task Force on Ethic Reform. At the time of the initial research for this paper (spring 2008), DSpaceUNM provided little easily acces- sible on-site education for contributors about the DSpace record-creation process. What was offered—a set of eight general information files—was buried deep inside the library community. A contributor would have to know the files existed to find them. By summer 2009, this had changed. DSpaceUNM had a new homepage layout. There is now a link to “help sheets and promotional materials” at the top center of the homepage. This link leads to the previously difficult-to- find help files. The content of the help files, however, remains largely unchanged. They discuss community creation, copy- rights, administrative workflow for community creation, a list of supported formats, a statement of DSpaceUNM’s privacy policy, and a list of required, encouraged, and not required elements for each new record created. For the most part, DSpaceUNM help sheets do not attempt to educate the contributor in issues of metadata quality. There is no discussion of DC terminology, no attempts to refer the contributor to a thesaurus or controlled vocabu- lary list, nor any explanation of the record-creation or editing process. 
This lack of contributor education may be explained in part because DSpaceUNM requires all new records to be reviewed by a subject area librarian as part of the DSpace community workflow. Thus any contributor errors, in theory, ought to be caught and corrected before being uploaded to the database.

The University of Washington (UW) DSpace repository (ResearchWorks at the University of Washington) hosts a narrower set of records than DSpaceUNM, with the materials limited to those contributed by the university's faculty, students, and staff, plus materials from the UW's archives and UW's School of Public and Community Health. In 2008, ResearchWorks was self-archiving: most contributors were expected to use DSpace to create and upload their records. There was no indication in the publicly available information about the record-creation workflow as to whether record reviews were conducted before record upload. The help link on the ResearchWorks homepage brought contributors to a set of screen-by-screen instructions on how to use DSpace's software to create and upload a record. The step-through did not include instructions on how to edit a record once it had been created. No explanation of the meanings or definitions of the various DC elements was included in the help files. There also were no suggestions about the use of a controlled vocabulary or a thesaurus for subject headings. By 2009, this link had disappeared, and the associated contributor education materials with it.

The Knowledge Bank at Ohio State University (OSU) is the third repository examined for this paper. OSU's repository hosts more than thirty communities, all of which are associated with various academic departments or special university programs. Like ResearchWorks at UW, OSU's repository appears to be self-archiving, with no clear policy statement as to whether a record is reviewed before it is uploaded to the repository's database. OSU makes a strong effort to educate its contributors. On the upper left of the Knowledge Bank homepage is a slider link that brings the contributor (or any user) to several important and useful sources of repository information: About Knowledge Bank, FAQs, Policies, Video Upload Procedures, Community Set-Up Form, Describing Your Resources, and Knowledge Bank Licensing Agreement. The existence and use of metadata in Knowledge Bank are explicitly mentioned in the FAQ and Policies areas, together with an explanation of what metadata is and how metadata is used (FAQ) and a list of supported metadata elements (Policies). The Describe Your Resources section gives extended definitions of each DSpace-available DC metadata element and provides examples of appropriate metadata-element use. Knowledge Bank provides the most comprehensive contributor education information of any of the three repositories examined. It does not use a controlled vocabulary list for subject headings, and it does not offer a thesaurus.

Data and analysis

I chose twenty randomly selected full records from each repository. No more than one record was taken from any one collection, in order to gather a broad sampling from each repository. I examined each record for the quality of its metadata. Metadata quality is a semantically slippery term.
Park, in the spring 2009 special metadata issue of Cataloging and Classification Quarterly, suggested that the most commonly accepted criteria for metadata quality are completeness, accuracy, and consistency.3 Those criteria will be applied in this analysis.

For the purpose of this paper, I define completeness as the fill rate for key metadata elements. Because the purpose of metadata is to identify the record and to assist in the user's search process, the key elements are title, contributor/creator, subject, and description.abstract—all contributor-generated fields. I chose these elements because these are the fields that the DSpace software uses when someone conducts an unrestricted search.

Table 1 shows that the fill rate for the title element is 100 percent for all three repositories. This is to be expected because, as noted above, title is a mandatory field. The fill rate for contributor/creator is likewise high: 16 of 20 (80 percent) for UNM, 19 of 20 (95 percent) for UW, and 19 of 20 (95 percent) for OSU. (OSU's fill rates for creator and contributor were summed because OSU uses different definitions for the creator and contributor element fields than do UNM or UW. This discrepancy will be discussed in greater depth in the discussion of consistency of metadata terminology below.) The fill rate for subject was more variable. UNM's subject fill rate was 100 percent, while UW's was 55 percent and OSU's was 40 percent. The fill rate for the description.abstract subfield was 12 of 20 (60 percent) at UNM, 15 of 20 (75 percent) at UW, and 8 of 20 (40 percent) at OSU. (See appendix A for a complete list of metadata elements and subfields used by each of the three repositories.) The relatively low fill rate (below 50 percent) at the OSU Knowledge Bank in both subject and description.abstract suggests a lack of completeness in that repository's records.

Accuracy in metadata quality is the essential "correctness" of a record. Correctness issues in a record range from data-entry problems (typos, misspellings, and inconsistent date formats) to problems with the correct application of metadata definitions and with data overlaps.4 Accuracy is perhaps the most difficult of the metadata quality criteria to judge. Local practices vary widely, and DC allows for the creation of custom metadata tags for local use. Additionally, there is long-standing debate and confusion about the definitions of metadata elements even among librarians and information professionals.5 Because of this, only the most egregious of accuracy errors were considered for this paper.

All three repositories had at least one record that contained one or more inaccurate metadata fields; two of them had four or more inaccurate records. Inaccurate records exhibited a wide variety of accuracy errors, including poor subject information (no matter how loosely one defines a subject heading, "the" is not an accurate descriptor); mutually contradictory metadata (one record contained two different language tags, although only one applied to the content); and one record in which the abstract was significantly longer than, and only tangentially related to, the file it described. Additionally, records showed confusion over the contributor versus creator elements. In a few records, contributors entered duplicate information into both element fields.
This observation supports Park and Childress's findings that there is widespread confusion over these elements.6 Among the most problematic records in terms of accuracy were those contained in UW's Early Buddhist Manuscripts Project. This collection, which has been removed from public access since the original data was drawn for this paper, contained numerous ambiguous, contradictory, and inaccurate metadata elements.7

While contributor-generated subject headings were specifically not examined for this paper, it must be noted that there was a wide variation in the level of detail and vocabulary used to describe records. No community within any of the repositories had specific rules for the generation of keyword descriptors for records, and the lack of guidance shows.

Table 1. Metadata Fields and their Frequencies

Element        Univ. of N.M.   Univ. of Wash.   Ohio State Univ.
Title               20               20               20
Creator              0                0               16
Subject             20               11                8
Description         12               16               17
Publisher            4                4                8
Contributor         16               19                3
Date                20               20               20
Type                20               20               20
Identifier          20               20               20
Source               0                0                0
Language            20               20               20
Relation             3                1                6
Coverage             2                0                0
Rights               2                0                0
Provenance          **               **               **

** Provenance tags are not visible to public users.

Consistency can be defined as the homogeneity of formats, definitions, and use of DC elements within the records. This consistency, or uniformity, of data is important because it promotes basic semantic interoperability. Consistency both inside the repository itself and with other repositories makes the repository easier to use and provides the user with higher-quality information.

All three repositories showed 100 percent consistency in DSpace-generated elements. DSpace's automated creation of date and format fields provided reliably consistent records in those element fields. DSpace's automatic formatting of personal names in the dc.contributor.author and dc.creator fields also provided excellent internal consistency. However, the metadata elements were much less consistent for contributor-generated information.

Inconsistency within the subject element is where most problems occurred. Personal names used as subject headings and capitalization within subject headings both proved to be particular issues. DSpace alphabetizes subject headings according to the first letter of the free text entered in the keyword box. Thus the same name entered in different formats (first name first or last name first) generates different subject-heading listings. The same is true for capitalization. Any difference in capitalization of any word within the free-text entry generates a separate subject heading.

Another field where consistency was an issue was dc.description.sponsorship. Sponsorship is a problem because different communities, even different collections within the same community, use the field to hold different information. Some collections used the sponsorship field to hold the name of a thesis or dissertation advisor. Some collections used sponsorship to list the funding agency or underwriter for a project being documented inside the record. Some collections used sponsorship to acknowledge the donation of the physical materials documented by the record. While all of these are valid uses of the field, they are not the same thing and do not hold the same meaning for the user.

The largest consistency issue, however, came from a comparison of repository policies regarding element use and definition.
Unaltered DSpace software maps contributor-generated information entered into the author textbox during the record-creation process into the dc.contributor.author field. However, OSU's DSpace software has been altered so that the dc.contributor.author field does not exist. Instead, text entered into the author textbox during the record-creation process maps to dc.creator. Although both uses are correct, this choice does create a significant difference in element definitions. OSU's DSpace author fields are no longer congruent with other DSpace author fields.

Conclusions

DSpace was created as a repository management tool. By streamlining the record-creation workflow and partially automating the creation of metadata, DSpace's developers hoped to make institutional repositories more useful and functional while at the same time providing an improved experience for both users and contributors. In this, DSpace has been partially successful. DSpace has made it easier for the "metadata naive" contributor to create records. And, in some ways, DSpace has improved the quality of repository metadata. Its automatically generated fields ensure better consistency in those elements and subfields. Its mandatory fields guarantee 100 percent fill rates in some elements, and this contributes to an increase in metadata completeness.

However, DSpace still relies heavily on contributor-generated data to fill most of the DC elements, and it is in these contributor-generated fields that most of the metadata quality issues arise. Nonmandatory fields are skipped, leading to incomplete records. Data-entry errors, a lack of authority control over subject headings, and confusion over element definitions can lead to poor metadata accuracy. A lack of enforced, uniform naming and capitalization conventions leads to metadata inconsistency, as do localized and individual differences in the application of metadata element definitions.

While most of the records examined in this small survey could be characterized as "acceptable" to "good," some are abysmal. To address the inconsistency of the DSpace records, the three universities have tried differing approaches. Only UNM's required record review by a subject area librarian before upload seems to have made any significant impact on metadata quality. UNM has a 100 percent fill rate for subject elements in its records, while UW and OSU do not. This is not to say that UNM's process is perfect and that poor records do not get into the system—they do (see appendix B for an example). But it appears that, for now, the intermediary intervention of a librarian during the record-creation process is an improvement over self-archiving by contributors, even with education.

References and notes

1. Association of Library Collections & Technical Services, Committee on Cataloging: Description & Access, Task Force on Metadata, "Final Report," June 16, 2000, http://www.libraries.psu.edu/tas/jca/ccda/tf-meta6.html (accessed Mar. 10, 2007).

2. A voluntary (and therefore less-than-complete) list of current DSpace users can be found at http://www.dspace.org/index.php?option=com_content&task=view&id=596&Itemid=180. Further specific information about DSpace, including technical specifications, training materials, licensing, and a user wiki, can be found at http://www.dspace.org/index.php?option=com_content&task=blogcategory&id=44&Itemid=125.
3. Jung-Ran Park, "Metadata Quality in Digital Repositories: A Survey of the Current State of the Art," Cataloging & Classification Quarterly 47, no. 3 (2009): 213–28.

4. Sarah Currier et al., "Quality Assurance for Digital Learning Object Repositories: Issues for the Metadata Creation Process," ALT-J: Research in Learning Technology 12, no. 1 (2004): 5–20.

5. Jung-Ran Park and Eric Childress, "DC Metadata Semantics: An Analysis of the Perspectives of Information Professionals," Journal of Information Science 20, no. 10 (2009): 1–13.

6. Ibid.

7. For a fuller discussion of the collection's problems and challenges in using both DSpace and DC, see Kathleen Forsythe et al., University of Washington Early Buddhist Manuscripts Project in DSpace (paper presented at DC-2003, Seattle, Wash., Sept. 28–Oct. 2, 2003), http://dc2003.ischool.washington.edu/Archive-03/03forsythe.pdf (accessed Mar. 10, 2007).

Appendix A. A list of the most commonly used qualifiers in each repository

University of New Mexico
dc.date.issued (20)
dc.date.accessioned (20)
dc.date.available (20)
dc.format.mimetype (20)
dc.format.extent (20)
dc.identifier.uri (20)
dc.contributor.author (15)
dc.description.abstract (12)
dc.identifier.citation (6)
dc.description.sponsorship (4)
dc.subject.mesh (2)
dc.contributor.other (2)
dc.description.sponsor (1)
dc.date.created (1)
dc.relation.isbasedon (1)
dc.relation.ispartof (1)
dc.coverage.temporal (1)
dc.coverage.spatial (1)
dc.contributor.other (1)

University of Washington
dc.date.accessioned (20)
dc.date.available (20)
dc.date.issued (20)
dc.format.mimetype (20)
dc.format.extent (20)
dc.identifier.uri (20)
dc.contributor.author (18)
dc.description.abstract (15)
dc.identifier.citation (4)
dc.identifier.issn (4)
dc.description.sponsorship (1)
dc.contributor.corporateauthor (1)
dc.contributor.illustrator (1)
dc.relation.ispartof (1)

Ohio State University
dc.date.issued (20)
dc.date.available (20)
dc.date.accessioned (20)
dc.format.mimetype (20)
dc.format.extent (20)
dc.identifier.uri (20)
dc.description.abstract (8)
dc.identifier.citation (4)
dc.subject.lcsh (4)
dc.relation.ispartof (4)
dc.description.sponsorship (3)
dc.identifier.other (2)
dc.contributor.editor (2)
dc.contributor.advisor (1)
dc.identifier.issn (1)
dc.description.duration (1)
dc.relation.isformatof (1)
dc.description.statementofresponsibility (1)
dc.description.tableofcontents (1)

Appendix B. Sample Record

dc.identifier.uri         http://hdl.handle.net/1928/3571
dc.description.abstract   President Schmidly's charge for the creation of a North Golf Course Community Advisory Board.
dc.format.extent          17301 bytes
dc.format.mimetype        application/pdf
dc.language.iso           en_US
dc.subject                President
dc.subject                Schmidly
dc.subject                North
dc.subject                Golf
dc.subject                Course
dc.subject                Community
dc.subject                Advisory
dc.subject                Board
dc.subject                Charge
dc.title                  Community_Advisory_Board_Charge
dc.type                   Other

3158 ----

Geographic Information Systems: Tools for Displaying In-Library Use Data

Lauren H. Mandel
Lauren H. Mandel (lmandel@fsu.edu) is a doctoral candidate at the Florida State University College of Communication & Information, School of Library & Information Studies, and is Research Coordinator at the Information Use Management & Policy Institute.

In-library use data is crucial for modern libraries to understand the full spectrum of patron use, including patron self-service activities, circulation, and reference statistics. Rather than using tables and charts to display use data, a geographic information system (GIS) facilitates a more visually appealing graphical display of the data in the form of a map. GISs have been used by library and information science (LIS) researchers and practitioners to create maps that display analyses of service area populations and demographics, facilities space management issues, spatial distribution of in-library use of materials, planned branch consolidations, and so on. The "seating sweeps" method allows researchers and librarians to collect in-library use data regarding where patrons are locating themselves within the library and what they are doing at those locations, such as sitting and reading, studying in a group, or socializing. This paper proposes a GIS as a tool to visually display in-library use data collected via "seating sweeps" of a library. By using a GIS to store, manage, and display the data, researchers and librarians can create visually appealing maps that show areas of heavy use and evidence of the use and value of the library for a community. Example maps are included to facilitate the reader's understanding of the possibilities afforded by using GISs in LIS research.

The modern public library operates in a context of limited (and often continually reduced) funding where the librarians must justify the continued value of the library to funding and supervisory authorities. This is especially the case as more and more patrons access the library virtually, calling into question the relevance of the physical library. In this context, there is a great need for librarians and researchers to evaluate the use of library facility space to demonstrate that the physical library is still being used for important social and educational functions. Despite this need, no model of public library facility evaluation emphasizes the ways patrons use library facilities. The systematic collection of in-library use data must go beyond traditional circulation and reference transactions to include self-service activities, group study and collaboration, socializing, and more.

Geographic information systems (GISs) are beginning to be deployed in library and information science (LIS) research as a tool for graphically displaying data. An initial review of the literature has yielded studies where a GIS has been used in analyzing service area populations through U.S. Census data;1 siting facility locations;2 managing facilities, including spatial distribution of in-library book use and occupancy of library study space;3 and planning branch consolidations.4 These uses of GIS are not mutually exclusive; studies have combined multiple uses of GISs.5 Also, GISs have been proposed as viable tools for producing visual representations of measurements of library facility use.6 These studies show the capabilities of a GIS for storing, managing, analyzing, and displaying in-library use data and the value of GIS-produced maps for library facility evaluations, in-library use research, and library justification.

Research purpose

Observing and measuring the use of a library facility is a crucial step in the facility evaluation process.
The library needs to understand how the facility is currently being used in order to justify the continued financial support necessary to maintain and operate it. Understanding how the facility is used can also help librarians identify high-traffic areas of the library that are ideal locations to market library services and materials. This understanding cannot be reached by analyzing circulation and reference transaction data alone; it must include in-library use measures that account for all ways patrons are using the facility. The purpose of this paper is to suggest a method by which to observe and record all uses of a library facility during a sampling period, the so-called "seating sweep" performed by Given and Leckie, and then to use a GIS to store, manage, and display the collected data on a map or series of maps that graphically depict library use.7

Significance of facility evaluation

Facility evaluation is a topic of vital importance in all fields, but this is especially true of a field such as public librarianship, where funding is often a source of concern.8 In times of economic instability, libraries can benefit from the ability to identify uses of existing facilities and employ this information to justify the continued operation of the library facility. Also, knowing which areas of the library are more frequently used than others can help librarians determine where to place displays of library materials and advertisements of library services.

For a library to begin to evaluate patron use and how well the facility meets users' needs, there must be an understanding of what users need from the library facility.9 To determine those needs, it is vital that library staff observe the facility while it is being used. This observation can be applied to the facility evaluation plan to justify the continued operation of the facility to meet the needs of the library service population.

Understanding how people use the public library facility beyond traditional measures of circulation statistics and reference transactions can lead to new theories of library use, an area of significant research interest for LIS. Additionally, the importance of this work transcends LIS because it applies to other government-funded community service agencies as well. For example, recreation facilities and community centers could also benefit from a customer-use model that incorporates measures of the true use of those facilities.

Literature review

Although much has been written on the use of library facilities, little of the research includes studies of how patrons actually use existing public library facilities and whether facilities are designed to accommodate this use.10 Rather, much of the research in public library facility evaluation has focused on collection and equipment space needs,11 despite the user-oriented focus of public library accountability models.12 Recent research in library facility design is beginning to reflect this focus,13 but additional study would be useful to the field.

Use of GIS is on the rise in the modern technological world.
A GIS is a computer-based tool for compiling, storing, analyzing, and displaying data graphically.14 Usually this data is geospatial in nature, but a GIS also can incorporate descriptive or statistic data to provide a richer picture than figures and tables can. Although GIS has been around for half a century, it has become increas- ingly more affordable, allowing libraries and similar institutions to consider using a GIS as a measurement and analysis tool. GISs have started being used in LIS research as a tool for graphically displaying library data. One fruitful area has been the mapping of user demographics for facil- ity planning purposes,15 including studies that mapped library closures.16 Mapping also can include in-library use data,17 in which case a GIS is used to overlay collected in-library use data on library floor plans. This can offer a richer picture of how a facility is being used than tradi- tional charts and tables can provide. using a Gis to display library service area population data Adkins and Sturges suggest libraries use a GIS-based library service area assessment as a method to evaluate their service areas and plan library services to meet the unique demographic demands of their communities.18 They discuss the methods of using GIS, including down- loading U.S. Census TIGER (Topologically Integrated Geographic Encoding and Referencing) files, geocoding library locations, delineating service areas by multiple methods, and analyzing demographics. A key tenet of this approach is the concept that public libraries need to understand the needs of their patrons. This is a prevailing concept in the literature.19 Prieser and Wang, in reporting a method used to create a facilities master plan for the Public Library of Cincinnati and Hamilton County, Ohio, offer a convincing argument for combining GIS and building performance evaluation (BPE) methods to examine branch facility needs and offer individualized facilities recommendations.20 Like other LIS researchers,21 Preiser and Wang suggest a relation- ship between libraries and retail stores, noting the similar modern trends of destination libraries and destination bookstores. They also acknowledge the difficulty in com- pleting an accurate library performance assessment due to the multitude of activities and functions of a library. Their method is a combination of a GIS-based service area and population analysis with a BPE that includes staff and user interviews and surveys, direct observation, and photography. The described multimethod approach offers a more complete picture of a library facility’s per- formance than traditional circulation-based evaluations. Further use of GISs in library facility planning can be seen from a study comparing proposed branches by demographic data that has been analyzed and presented through a GIS. Hertel and Sprague describe research that used a GIS to conduct geospatial analysis of U.S. Census data to depict the demographics of populations that would be served by two proposed branch libraries for a public library system in Idaho.22 A primary purpose of this research is to demonstrate the possible ways public libraries can use GIS to present visual and quantitative demographic analyses of service area populations. 
Hertel and Sprague identify that public libraries are challenged to determine which public they are serving and the needs of that population, writing that “libraries are beginning to add customer-based satisfaction as a critical compo- nent of resource allocation decisions” and need the help of a GIS to provide hard-data evidence in support of staff observations.23 This evidence could take the form of demographic data, as discussed by Hertel and Sprague, and also could incorporate in-library use data to present a fuller picture of a facility’s use. GeOGRAPHic iNFORMAtiON sYsteMs: tOOls FOR DisPlAYiNG iN-liBRARY use DAtA | MANDel 49 using Gis to display in-library use data Xia conducted several studies in which he collected library- use data and mapped that data via a GIS. In one study designed to identify the importance of space management in academic libraries, Xia suggests applications of GISs in library space management, particularly his tool integrating library floor plans with feature data in a GIS.24 He explains that a GIS can overcome the constraints of drafting and computer automated design tools, such as those in use at Chico Meriam Library at California State University and at the Michigan State University Main Library. For example, GISs are not limited to space visualization manipulation, but can incorporate user perceptions, behavior, and daily activities, all of which are important data to library space management considerations and in-library use research. Xia also reviews the use of GIS tools that incorporate hos- pital and casino floor plans, noting that library facilities are as equally complex as hospitals and casinos; this is a com- pelling argument that academic libraries should consider the use of a GIS as a space management tool. In another study, Xia uses a GIS to visualize the spatial distribution of books in the library in an attempt to establish the relationship between the height of book- shelves and the in-library use of books.25 This study seeks to answer the question of how the location of books on shelves of different heights could influence user behav- ior (i.e., patrons may prefer to browse shelves at eye level rather than the top and bottom shelves). What is of interest here is Xia’s use of a GIS to spatially represent the collected data. Xia remarks that a GIS “is suitable for assisting in the research of in-library book use where library floor layouts can be drawn into maps on multiple- dimensional views.”26 In fact, Xia’s graphics depict the use of books by bookshelf height in a visual manner that could not be achieved without the use of a GIS. Similarly, a GIS can be used to spatially represent the collected data in an in-library use study by overlaying the data onto a representation of the library floor plan. In a third project, Xia measures study space use in academic libraries as a metric of user satisfaction with library services.27 He says that libraries need to evaluate space needs on case-by-case basis because every library is unique and serves a unique population. Therefore, to observe the occupancy of study areas in an academic library, Xia drew the library’s study facilities (including furniture) in a GIS. He then observed patrons’ use of the facilities and entered the observation data into the GIS to overlay on maps of the study areas. 
There are several advantages of using GIS in this way: spatial databases can store continuing data sets, the system is powerful and flexible for manipulating and analyzing the spatial dataset, there are enhanced data visualization capabilities, and maps and data become interactive.

Conclusions drawn from the literature

A GIS is a tool gaining momentum in the world of LIS research. GISs have been used to conduct and display service area population assessments,28 propose facility locations,29 and plan for and measure branch consolidation impacts and benefits.30 GISs also have been used to graphically represent in-library use for managing facility space allocation, mapping in-library book use, and visualizing the occupancy of library study space.31 Additionally, GISs have been used in combination studies that examine library service areas and facility location proposals.32 These uses of GISs are only the beginning; a GIS can be used to map any type of data a library can collect, including all measures of in-library use. Additionally, GIS-based data analysis and display complements the focus in library-use research on gathering data to show a richer picture of a facility's use and the focus in library facility design literature on building libraries on the basis of community needs.33

In-library use research that would benefit from spatial data displays

Unobtrusive observational research offers a rich method for identifying and recording the use of a public library facility. A researcher could obtain a copy of a library's floor plan, predetermine sampling times during which to "sweep" the library, and conduct the sweeps by marking all patrons observed on the floor plan.34 This data then could be entered into a GIS database for spatial analysis and display. Specific questions that could be addressed via such a method include the following:

• What are all the ways in which people are using the library facility?
• How many people are using traditional library resources, such as books and computers?
• How many people are using the facility for other reasons, such as relaxation, meeting friends, and so on?
• Do the ways in which patrons use the library vary by location within the facility (e.g., are the people using traditional library resources and the people using the library for other reasons using the same areas of the library or different areas)?
• Which area(s) of the library facility receive the highest level of use?

It is hoped that answers to these questions, in whole or in part, could begin to offer a picture of how a library facility is currently being used by library patrons. To better view this picture, the data recorded from the observational research could be entered into a GIS to overlay onto the library floor plan in a manner similar to Xia's use of a GIS to display occupancy of library study space.35 This spatial representation of the data should facilitate greater understanding of the actual use of the library facility. Instead of a library presenting tables and graphs of library use, it would be able to produce illustrative maps that would help explain patterns of use to funding and supervising authorities. These maps would not require expensive proprietary GIS packages; the examples provided in this paper were created using the free, open-source MapWindow GIS package.

Example using GIS to display in-library use data

For this paper, I produced example maps on the basis of fictional in-library use data.
These maps were created using MapWindow GIS software along with Microsoft Excel, Publisher, and Paint (see figure 1 for a diagram of this process). MapWindow is an open-source GIS package that is easy to learn and use, but its layout and graphic design features are limited compared to the more expensive and sophisticated proprietary GIS packages.36 MapWindow files are compatible with the proprietary packages, so they could be imported into other GIS packages for finishing. For this paper, however, the goal was to create simple maps that a novice could replicate. Therefore Publisher and Paint were used for finalizing the maps instead of a sophisticated GIS package.

Figure 1. Process diagram for creating the sample maps

It was relatively easy to create the maps. First, I drew a sample floor plan of a fictional library computer lab in Excel and imported it into MapWindow as a JPEG file. I then overlaid polygons (shapes that represent area units such as chairs and tables) onto the floor plan and saved two shapefiles, one for tables and one for computers. A shapefile is a basic storage file used in most GIS packages. For each of those shapefiles I created an attribute table (basically, a linked spreadsheet) using fictitious data representing use of the tables and computers at 9 and 11 a.m. and 1, 3, 5, and 7 p.m. on a sample day. The field calculator generated a final column summing the total use of each table and computer for the fictitious sample day. I then created maps depicting the use of both tables and computers at each of the sample time periods (see figure 2) and for the total use (see figure 3).

Figure 2. Example maps depicting use of tables and computers in a fictional library computer lab, by hour

Figure 3. Example map depicting total use of tables and computers in a fictional library computer lab for a sample day

Benefits of GIS-created displays for library managers

The maps presented here are not based on actual data, but are meant to demonstrate the capabilities of GISs for spatially representing the use of a library facility. This could be done on a grander scale using an entire library floor plan and data collected during a longer sample period (e.g., a full week). These maps can serve several purposes for library managers, specifically regarding the marketing of library services and the justification of library funding.

Mapping data obtained from library "sweeps" can help identify the popularity of different areas of the library at different times of the day, different days of the week, or different times of the year. Once the library has identified the most popular areas, this information can be used to market library materials and services. For example, a highly populated area would be an ideal location over which to install ceiling-mounted signs that the library could use for marketing services and programs. Or the library could purchase a book display table similar to those used in bookstores and install it in the middle of a frequently populated area. The library could stock the table with seasonally relevant books and other materials (e.g., tax guidebooks in March and April) and track the circulation of these materials to determine the degree to which placement on the display table resulted in increased borrowing of those materials.

In addition to helping the library market its materials and services, mapping in-library use can provide visual evidence of the library's value.
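For readers who would rather script this workflow than point and click, the map-creation steps described earlier in this section (polygon features for tables and computers, an attribute table of hourly counts, a summed total column, and a map shaded by use) can be approximated in code. The sketch below is only an illustration and was not part of the process used to produce the figures: it relies on the open-source GeoPandas, Shapely, and Matplotlib libraries rather than MapWindow, and the coordinates and counts are invented.

    import geopandas as gpd
    import matplotlib.pyplot as plt
    from shapely.geometry import box

    # Invented floor-plan footprints (in arbitrary floor-plan units) for two tables
    # and two computers, with invented hourly use counts.
    features = gpd.GeoDataFrame(
        {
            "label": ["table_1", "table_2", "computer_1", "computer_2"],
            "use_0900": [2, 0, 1, 1],
            "use_1100": [3, 1, 1, 0],
            "use_1300": [4, 2, 0, 1],
        },
        geometry=[box(0, 0, 2, 1), box(3, 0, 5, 1), box(0, 3, 1, 4), box(2, 3, 3, 4)],
    )

    # The equivalent of MapWindow's field-calculator step: sum hourly counts into a total.
    hourly_columns = [c for c in features.columns if c.startswith("use_")]
    features["use_total"] = features[hourly_columns].sum(axis=1)

    # Shade each footprint by total use, roughly as in the total-use example map.
    ax = features.plot(column="use_total", cmap="Reds", edgecolor="black", legend=True)
    ax.set_title("Total use of tables and computers (fictional data)")
    plt.show()

Either route ends in the same kind of shaded floor-plan map, which is what gives the "sweeps" data its visual force.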
Public libraries often rely on reference and circulation transaction data, gate counts, and programming attendance statistics to justify their existence. These measures, although valuable and important, do not include many other ways that patrons use libraries, such as sitting and reading, studying, group work, and socializing. During "seating sweeps," the observers can record any and all uses they observe, including any that may not have been anticipated. All of these uses could then be mapped, providing a richer picture of how a public library is used and stronger justification of the library's value. These maps may be easier for funding and supervising authorities to understand than textual explanations or graphs and charts of statistical analyses.

Conclusion

From a review of the literature, it is clear that GISs are increasingly being used in LIS research as data-analysis and display tools. GISs are being used to analyze patron and materials data, as well as in studies that combine multiple uses of GISs. Patron analysis has included service-area-population analysis and branch-consolidation planning. Analysis of library materials has been used for space management, visualizing the spatial distribution of in-library book use, and visual representation of facility-use measurements.

This paper has proposed collecting in-library use data according to Given and Leckie's "seating sweeps" method and visually displaying that data via a GIS. Examples of such visual displays were provided to facilitate the reader's understanding of the possibilities afforded by using a GIS in LIS research, as well as the scalable nature of the method. Librarians and library staff can produce maps similar to the examples in this paper with minimal GIS training and background. The literature review and example figures offered in this paper show the capabilities of GISs for analyzing and graphically presenting library-use data. GISs are tools that can facilitate library facility evaluations, in-library use research, and library valuation and justification.

References

1. Denice Adkins and Denyse K. Sturges, "Library Service Planning with GIS and Census Data," Public Libraries 43, no. 3 (2004): 165–70; Karen Hertel and Nancy Sprague, "GIS and Census Data: Tools for Library Planning," Library Hi Tech 25, no. 2 (2007): 246–59; Wolfgang F. E. Preiser and Xinhao Wang, "Assessing Library Performance with GIS and Building Evaluation Methods," New Library World 107, no. 1224–25 (2006): 193–217.

2. Hertel and Sprague, "GIS and Census Data"; Preiser and Wang, "Assessing Library Performance."

3. Jingfeng Xia, "Library Space Management: A GIS Proposal," Library Hi Tech 22, no. 4 (2004): 375–82; Xia, "Using GIS to Measure In-Library Book-Use Behavior," Information Technology & Libraries 23, no. 4 (2004): 184–91; Xia, "Visualizing Occupancy of Library Study Space with GIS Maps," New Library World 106, no. 1212–13 (2005): 219–33.

4. Preiser and Wang, "Assessing Library Performance."

5. Hertel and Sprague, "GIS and Census Data"; Preiser and Wang, "Assessing Library Performance."

6. Preiser and Wang, "Assessing Library Performance"; Xia, "Library Space Management"; Xia, "Using GIS to Measure"; Xia, "Visualizing Occupancy."

7. Lisa M. Given and Gloria J.
Leckie, “‘Sweeping’ the Library: Mapping the Social Activity Space of the Public Library,” Library & Information Science Research 25, no. 4 (2003): 365–85. 8. “Jackson Rejects Levy to Reopen Libraries,” American Libraries 38, no. 7 (2007): 24–25; “May Levy Set for Jackson County Libraries Closing in April,” American Libraries 38, no. 3 (2007): 14; “Tax Reform Has Florida Bracing for Major Budget Cuts,” American Libraries 38, no. 8 (2007): 21. 9. Anne Morris and Elizabeth Barron, “User Consultation in Public Library Services,” Library Management 19, no. 7 (1998): 404–15; Susan L. Silver and Lisa T. Nickel, Surveying User Activ- ity as a Tool for Space Planning in an Academic Library (Tampa: Univ. of South Florida Library, 2002); James Simon and Kurt Schlichting, “The College Connection: Using Academic Support to Conduct Public Library Services,” Public Libraries 42, no. 6 (2003): 375–78. 10. Given and Leckie, “‘Sweeping’ the Library”; Christie M. Koontz, Dean K. Jue, and Keith Curry Lance, “Collecting Detailed In-Library Usage Data in the U.S. Public Libraries: The Methodology, the Results and the Impact,” in Proceedings of the Third Northumbria International Conference on Performance Measurement in Libraries and Information Services (Newcastle, UK: University of Northumbria, 2001): 175–79; Koontz, Jue, and Lance, “Neighborhood-Based In-Library Use Performance Measures for Public Libraries: A Nationwide Study of Majority- Minority and Majority White/Low Income Markets Using Personal Digital Data Collectors,” Library & Information Science Research 27, no. 1 (2005): 28–50. 11. Cheryl Bryan, Managing Facilities for Results: Optimizing Space for Services (Chicago: ALA, 2007); Anders C. Dahlgren, Public Library Space Needs: A Planning Outline (Madison, Wis.: Department of Public Instruction, 1988); William W. Sannwald and Robert S. Smith, eds., Checklist of Library Building Design Considerations (Chicago: ALA, 1988). 12. Brenda Dervin, “Useful Theory for Librarianship: Com- munication, Not Information,” Drexel Library Quarterly 13, no. 3 (1977): 16–32; Morris and Barron, “User Consultation”; Pre- iser and Wang, “Assessing Library Performance”; Simon and Schlichting, “The College Connection”; Norman Walzer, Karen Stott, and Lori Sutton, “Changes in Public Library Services,” Illinois Libraries 83, no. 1 (2001): 47–52. 13. Bradley Wade Bishop, “Use of Geographic Information Systems in Marketing and Facility Site Location: A Case Study of Douglas County (Colo.) Libraries,” Public Libraries 47, no. 5: 65–69; David Jones, “People Places: Public Library Build- ings for the New Millennium,” Australasian Public Libraries & Information Services 14, no. 3 (2001): 81–89; Nolan Lushington, Libraries Designed for Users: A 21st Century Guide (New York: Neal-Schuman, 2001); Shannon Mattern, “Form for Function: the Architecture for New Libraries,” in The New Downtown Library: Designing with Communities (Minneapolis: Univ. of Minnesota Pr., 2007), 55–83. 14. United Nations, Department of Economic and Social Affairs, Statistics Division, Handbook on Geographical Information Systems and Mapping (New York: United Nations, 2000). 15. Adkins and Sturges, “Library Service Planning”; Bishop, “Use of Geographic Information Systems”; Hertel and Sprague, “GIS and Census Data”; Christie Koontz, “Using Geographic Information Systems for Estimating and Profiling Geographic Library Market Areas,” in Geographic Information Systems and Libraries: Patrons, Maps, and Spatial Information, ed. Linda C. 
Smith and Mike Gluck (Urbana–Champaign: Univ. of Illinois Pr., 1996): 181–93; Preiser and Wang, “Assessing Library Perfor- mance.” 16. Christie M. Koontz, Dean K. Jue, and Bradley Wade Bishop, “Public Library Facility Closure: An Investigation of Reasons for Closure and Effects on Geographic Market Areas,” Library & Information Science Research 31, no. 2 (2009): 84–91. 17. Xia, “Library Space Management”; Xia, “Using GIS to Measure”; Xia, “Visualizing Occupancy.” 18. Adkins and Sturges, “Library Service Planning.” 19. Bishop, “Use of Geographic Information Systems”; Jones, “People Places”; Koontz, Jue, and Lance, “Collecting Detailed In- Library Usage Data”; Koontz, Jue, and Lance, “Neighborhood- Based In-Library Use”; Morris and Barron, “User Consultation”; Simon and Schlichting, “The College Connection”; Walzer, Stott, and Sutton, “Changes in Public Library Services.” 20. Preiser and Wang, “Assessing Library Performance.” 21. Given and Leckie, “‘Sweeping’ the Library;” Christie M. Koontz, “Retail Interior Layout for Libraries,” Marketing Library Services 19, no. 1 (2005): 3–5. 22. Hertel and Sprague, “GIS and Census Data.” 23. Ibid., 247. 24. Xia, “Library Space Management.” 25. Xia, “Using GIS to Measure.” 26. Ibid., 186. 27. Xia, “Visualizing Occupancy.” 28. Adkins and Sturges, “Library Service Planning”; Her- tel and Sprague, “GIS and Census Data”; Preiser and Wang, “Assessing Library Performance.” 29. Hertel and Sprague, “GIS and Census Data”; Preiser and Wang, “Assessing Library Performance.” 30. Koontz, Jue, and Bishop, “Public Library Facility Clo- sure”; Preiser and Wang, “Assessing Library Performance.” 31. Xia, “Library Space Management”; Xia, “Using GIS to Measure”; Xia, “Visualizing Occupancy.” 32. Hertel and Sprague, “GIS and Census Data”; Preiser and Wang, “Assessing Library Performance.” 33. Given and Leckie, “‘Sweeping’ the Library”; Koontz, Jue, and Lance, “Collecting Detailed In-Library Usage Data”; Koontz, Jue, and Lance, “Neighborhood-Based In-Library Use”; Silver and Nickel, Surveying User Activity; Jones, “People Places”; Lushington, Libraries Designed for Users. 34. Given and Leckie, “‘Sweeping’ the Library.” 35. Xia, “Visualizing Occupancy.” 36. For more information or to download MapWindow GIS, see http://www.mapwindow.org/ 3166 ---- EDITORIAL | TRuITT 3 Marc TruittEditorial W elcome to 2009! It has been unseasonably cold in Edmonton, with daytime “highs”—I use the term loosely— averaging around -25°C (that’s -13°F, for those of you ITAL readers living in the States) for much of the last three weeks. Factor in wind chill (a given on the Canadian Prairies), and you can easily subtract another 10°C. As a result, we’ve had more than a few days and nights where the adjusted temperature has been much closer to -40°, which is the same in either Celsius or Fahrenheit. While my boss and chief librarian is fond of saying that “real Canadians don’t even button their shirts until it gets to minus forty,” I’ve yet to observe such a feat of derring-do by anyone at much less than twenty below <grin>. Even your editor’s two Labrador retrievers—who love cooler weather—are reluctant to go out in such cold, with the result that both humans and pets have all been coping with bouts of cabin fever since before Christmas. n So, when is it “too cold” for a server room? Why, you may reasonably ask, am I belaboring ITAL readers with the details of our weather? 
Over the week- end we experienced near-simultaneous failures of both cooling systems in our primary server room (SR1), which meant that nearly all of our library IT services, including our OPAC (which we host for a consortium of twenty area libraries), a separate OPAC for Edmonton Public Library, our website, and access to licensed e-resources, e-mail, files, and print servers had to be shut down. Temperature readings in the room soared from an average of 20–22°C (68–71.5°F) to as much as 37°C (98.6°F) before settling out at around 30°C (86°F). We spent much of the weekend and beginning of this week relocating servers to all man- ner of places while the cooling system gets fixed. I imag- ine that next we may move one into each staff person’s under-heated office, where they’ll be able to perform double duty as high-tech foot warmers! All of this happened, of course, while the temperature outside the building hovered between -20° and -25°C. This is not the first time we’ve experienced a failure of our cooling systems during extremely cold weather. Last winter we suffered a series of problems with both the systems in SR1 and in our secondary room a few feet away. The issues we had then were not the same as those we’re living through now, but they occurred, as now, at the coldest time of the year. This seeming dichotomy of an overheated server environment in the depths of winter is not a matter of accident or coincidence; indeed, while it may seem counterintuitive, the fact is that many, if not all, of our cooling woes can be traced to the cold outside. The simple explanation is that extreme cold weather stresses and breaks things, including HVAC systems. As we’ve tried to analyze this incident, it appears likely that our troubles began when the older of our two systems in SR1 developed a coolant leak at some point after its last preventive maintenance servicing in August. Fall was mild here, and we didn’t see the onset of really severe cold weather until early to mid-December. Since the older system is mainly intended for failover of the newer one, and since both systems last received routine service recently, it is possible that the leak could have developed at any time since, although my supposition is that it may be itself a result of the cold. In any case, all seemed well because the newer cool- ing system in SR1 was adequate to mask the failure of the older unit, until it suffered a controller board failure that took it offline last weekend. But, with the failure of the new system on Saturday, all IT services provided from this room had to be brought down. After a night spent try- ing to cool the room with fans and a portable cooling unit, we succeeded in bringing the two OPACs and other core services back online by Sunday, but the coolant leak in the old system was not repaired until midday Monday. Today is Friday, and we’ve limped along all week on about 60 percent of the cooling normally required in SR1. We hope to have the parts to repair the newer cooling system early next week (fingers crossed!). Some interesting lessons have emerged from this incident, and while probably not many of you regularly deal with -30°C winters, I think them worth sharing in the hope that they are more generally applicable than our winter extremes are: 1. Document your servers and the services that reside on them. We spent entirely too much time in the early hours of this event trying to relate servers and ser- vices. 
We in information technology (IT) may think of shutting down or powering up servers "Fred," "Wilma," "Betty," and "Barney," but, in a crisis, what we generally should be thinking of is whether or not we can shut down e-mail, file-and-print services, or the integrated library system (ILS) (and, if the latter, whether we shut down just the underlying database server or also the related staff and public services). Perhaps your servers have more obvious names than ours, in which case, count yourself fortunate. But ours are not so intuitively named—there is a perfectly good reason for this, by the way—and with distributed applications where the database may reside here, the application there, and the Web front end yet somewhere else, I'd be surprised if your situation isn't as complex as ours. And bear in mind that documentation of dependencies goes two ways: Not only do you want to know that "Barney" is hosting the ILS's Oracle database, but you also want to know all of the servers that should be brought up for you to offer ILS-related services.

2. Prioritize your services. If your cooling system (or other critical server-room utility) were suddenly only operating at 50 percent of your normal required capacity, how would you quickly decide which services to shut down and which to leave up? I wrote in this space recently that we've been thinking about prioritized services in the context of disaster recovery and business continuity, but this week's incident tells me that we're not really there yet. Optimally, I think that any senior member of my on-call staff should be empowered in a given critical situation to bring down services on the basis of a predefined set of service priorities.

3. Virtualize, virtualize, virtualize. If we are at all typical of large libraries in the Association of Research Libraries (and I think we are), then it will come as no surprise that we seem to add new services with alarming frequency. I suspect that, as with most places, we tend to try and keep things simple at the server end by hosting new services on separate, dedicated servers. The resulting proliferation of new servers has led to ever-greater strains on power, cooling, and network infrastructures in a facility that was significantly renovated less than two years ago. And I don't see any near-term likelihood that this will change. We are, consequently, in the very early days of investigating virtualization technology as a means of reducing the number of physical boxes and making much better use of the resources—especially processor and RAM—available to current-generation hardware. I'm hoping that someone among our readership is farther along this path than we are and will consider submitting to ITAL a "how we done it" on virtualization in the library server room very soon!

4. Sometimes low-tech solutions work . . . No one here has failed to observe the irony of an overheated server room when the temperature just steps away is 30° below. Our first thought was how simple and elegant a solution it would be to install ducting, an intake fan, and a damper to the outside of the building. Then, the next time our cooling failed in the depths of winter, voila!, we could solve the problem with a mere turn of the damper control.

5. . . . and sometimes they don't.
Not quite, it seems. When asked, our university facilities experts told us that an even greater irony than the one we currently have would be the requirement for Can$100,000 in equipment to heat that -30°C outside air to around freezing so that we wouldn't freeze pipes and other indoor essentials if we were to adopt the "low-tech" approach and rely on Mother Nature. Oh, well . . .

In memoriam

Most of the snail mail I receive as editor consists of advertisements and press releases from various firms providing IT and other services to libraries. But a few months ago a thin, hand-addressed envelope, postmarked Pittsburgh with no return address, landed on my desk. Inside were two slips of paper clipped from a recent issue of ITAL and taped together. On one was my name and address; the other was a mailing label for Jean A. Guasco of Pittsburgh, an ALA Life Member and ITAL subscriber. Beside her name, in red felt-tip pen, someone had written simply "deceased."

I wondered about this for some time. Who was Ms. Guasco? Where had she worked, and when? Had she published or otherwise been active professionally? If she was a Life Member of ALA, surely it would be easy to find out more. It turns out that such is not the case, the wonders of the Internet notwithstanding. My obvious first stop, Google, yielded little other than a brief notice of her death in a Pittsburgh-area newspaper and an entry from a digitized September 1967 issue of Special Libraries that identified her committee assignment in the Special Libraries Association and the fact that she was at the time the chief librarian at McGraw-Hill, then located in New York. As a result of checking WorldCat, where I found a listing for her master's thesis, I learned that she graduated from the now-closed School of Library Service at Columbia University in 1953. If she published further, there was no mention of it on Google. My subsequent searches under her name in the standard online LIS indexes drew blanks.

From there, the trail got even colder. McGraw-Hill long ago forsook New York for the wilds of Ohio, and it seems that we as a profession have not been very good at retaining for posterity our directories of those in the field. A friend managed to find listings in both the 1982–83 and 1984–85 volumes of Who's Who in Special Libraries, but all these did was confirm what I already knew: Ms. Guasco was an ALA Life Member who by then lived in Pittsburgh. I'm guessing that she was then retired, since her death notice gave her age as eighty-six years. Of her professional career before that, I'm sad to say I was able to learn no more.

Marc Truitt (marc.truitt@ualberta.ca) is Associate Director, Bibliographic and Information Technology Services, University of Alberta Libraries, Edmonton, Alberta, Canada, and Editor of ITAL.

3167 ----

Mathew J. Miles and Scott J. Bergstrom

Classification of Library Resources by Subject on the Library Website: Is There an Optimal Number of Subject Labels?

The number of labels used to organize resources by subject varies greatly among library websites. Some librarians choose very short lists of labels while others choose much longer lists. We conducted a study with 120 students and staff to try to answer the following question: What is the effect of the number of labels in a list on response time to research questions? What we found is that response time increases gradually as the number of items in the list grows until the list size reaches approximately fifty items. At that point, response time increases significantly. No association between response time and relevance was found.
I t is clear that academic librarians face a daunting task drawing users to their library’s Web presence. “Nearly three-quarters (73%) of college students say they use the Internet more than the library, while only 9% said they use the library more than the Internet for informa- tion searching.”1 Improving the usability of the library websites therefore should be a primary concern for librar- ians. One feature common to most library websites is a list of resources organized by subject. Libraries seem to use similar subject labels in their categorization of resources. However, the number of subject labels varies greatly. Some use as few as five subject labels while others use more than one hundred. In this study we address the following ques- tion: What is the effect of the number of subject labels in a list on response times to research questions? n Literature review McGillis and Toms conducted a performance test in which users were asked to find a database by navigating through a library website. They found that participants “had difficulties in choosing from the categories on the home page and, subsequently, in figuring out which data- base to select.”2 A review of relevant research literature yielded a number of theses and dissertations in which the authors compared the usability of different library websites. Jeng in particular analyzed a great deal of the usability testing published concerning the digital library. The following are some of the points she summarized that were highly relevant to our study: n User “lostness”: Users did not understand the structure of the digital library. n Ambiguity of terminology: Problems with wording accounted for 36 percent of usability problems. n Finding periodical articles and subject-specific databases was a challenge for users.3 A significant body of research not specific to libraries provides a useful context for the present research. Miller’s landmark study regarding the capacity of human short- term memory showed as a rule that the span of immedi- ate memory is about 7 ± 2 items.4 Sometimes this finding is misapplied to suggest that menus with more than nine subject labels should never be used on a webpage. Subsequent research has shown that “chunking,” which is the process of organizing items into “a collection of ele- ments having strong associations with one another, but weak associations with elements within other chunks,”5 allows human short-term memory to handle a far larger set of items at a time. Larson and Czerwinski provide important insights into menuing structures. For example, increasing the depth (the number of levels) of a menu harms search performance on the Web. They also state that “as you increase breadth and/or depth, reaction time, error rates, and perceived complexity will all increase.”6 However, they concluded that a “medium condition of breadth and depth outperformed the broadest, shallow web structure overall.”7 This finding is somewhat contrary to a previous study by Snowberry, Parkinson, and Sisson, who found that when testing structures of 26, 43, 82, 641 (26 means two menu items per level, six levels deep), the 641 structure grouped into categories proved to be advantageous in both speed and accuracy.8 Larson and Czerwinksi rec- ommended that “as a general principle, the depth of a tree structure should be minimized by providing broad menus of up to eight or nine items each.”9 Zaphiris also corroborated that previous research con- cerning depth and breadth of the tree structure was true for the Web. 
The deeper the tree structure, the slower the user performance.10 He also found that response times for expandable menus are on average 50 percent longer than sequential menus.11 Both the research and current practices are clear concerning the efficacy of hierarchical menu structures. Thus it was not a focus of our research. The focus instead was on a single-level menu and how the number and characteristics of subject labels would affect search response times. Mathew J. Miles (milesm@byui.edu) is Systems Librarian and Scott J. Bergstrom (bergstroms@byui.edu) is Director of Institutional Research at Brigham Young University–Idaho in Rexburg. n Background In preparation for this study, library subject lists were collected from a set of thirty library websites in the United States, Canada, and the United Kingdom. We selected twelve lists from these websites that were representative of the entire group and that varied in size from small to large. To render some of these lists more usable, we made slight modifications. There were many similarities between label names. n Research design Participants were randomly assigned to one of twelve experimental groups. Each experimental group would be shown one of the twelve lists selected for use in this study. Roughly 90 percent of the participants were students. The remaining 10 percent were full-time employees who worked in these same departments. The twelve lists ranged in number of labels from five to seventy-two:
n Group A: 5 subject labels
n Group B: 9 subject labels
n Group C: 9 subject labels
n Group D: 23 subject labels
n Group E: 6 subject labels
n Group F: 7 subject labels
n Group G: 12 subject labels
n Group H: 9 subject labels
n Group I: 35 subject labels
n Group J: 28 subject labels
n Group K: 49 subject labels
n Group L: 72 subject labels
Each participant was asked to select a subject label from a list in response to eleven different research questions. The questions are listed below:
1. Which category would most likely have information about modern graphical design?
2. Which category would most likely have information about the Aztec Empire of ancient Mexico?
3. Which category would most likely have information about the effects of standardized testing on high school classroom teaching?
4. Which category would most likely have information on skateboarding?
5. Which category would most likely have information on repetitive stress injuries?
6. Which category would most likely have information about the French Revolution?
7. Which category would most likely have information concerning Walmart's marketing strategy?
8. Which category would most likely have information on the reintroduction of wolves into Yellowstone Park?
9. Which category would most likely have information about the effects of increased use of nuclear power on the price of natural gas?
10. Which category would most likely have information on the Electoral College?
11. Which category would most likely have information on the philosopher Immanuel Kant?
The questions were designed to represent a variety of subject areas that library patrons might pursue. Each subject list was printed on a white sheet of paper in alphabetical order in a single column, or double columns when needed. We did not attempt to test the subject lists in the context of any Web design.
We were more interested in observing the effect of the number of labels in a list on response time inde- pendent of any Web design. Each participant was asked the same eleven questions in the same order. The order of ques- tions was fixed because we were not interested in testing for the effect of order and wanted a uniform treatment, thereby not introducing extraneous variance into the results. For each question, the participant was asked to select a label from the subject list under which they would expect to find a resource that would best provide information to answer the question. Participants were also instructed to select only a single label, even if they could think of more than one label as a possible answer. Participants were encour- aged to ask for clarification if they did not fully understand the question being asked. Recording of response times did not begin until clarification of the question had been given. Response times were recorded unbeknownst to the partici- pant. If the participant was simply unable to make a selec- tion, that was also recorded. Two people administered the exercise. One recorded response times; the other asked the questions and recorded label selections. Relevance rankings were calculated for each possible combination of labels within a subject list for each ques- tion. For example, if a subject list consisted of five labels, for each question there were five possible answers. Two library professionals—one with humanities expertise, the other with sciences expertise—assigned a relevance rank- ing to every possible combination of question and labels within a subject list. The rankings were then averaged for each question–label combination. n Results The analysis of the data was undertaken to determine whether the average response times of participants, adjusted by the different levels of relevance in the subject list labels that prevailed for a given question, were signifi- cantly different across the different lists. In other words, would the response times of participants using a particu- lar list, for whom the labels in the list were highly relevant 18 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2009 to the question, be different from students using the other lists for whom the labels in the list were also highly relevant to the question? A separate univariate general linear model analysis was conducted for each of the eleven questions. The analyses were conducted separately because each ques- tion represented a unique search domain. The univariate general linear model pro- vided a technique for testing whether the average response times associated with the different lists were significantly dif- ferent from each other. This technique also allowed for the inclusion of a cova- riate—relevance of the subject list labels to the question—to determine whether response times at an equivalent level of relevance was different across lists. In the analysis model, the depen- dent variable was response time, defined as the time needed to select a subject list label. The covariate was relevance, defined as the perceived match between a label and the question. For example, a label of “Economics” would be assessed as highly relevant to the question, what is the current unemployment rate? The same label would be assessed as not relevant for the question, what are the names of four moons of Saturn? The main factor in the model was the actual list being presented to the participant. There were twelve lists used in this study. 
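Written out formally (the notation here is ours, not the authors'), the model fitted separately for each question is an analysis of covariance on log-transformed response times:

\log_{10}(T_{ik}) = \mu + \alpha_i + \beta\, r_{ik} + \gamma_i\, r_{ik} + \varepsilon_{ik}

where T_{ik} is the response time of participant k using list i, r_{ik} is (presumably) the averaged expert relevance ranking of the label that participant selected, \alpha_i is the list effect, \beta and \gamma_i capture the relevance effect and the list-by-relevance interaction, and \varepsilon_{ik} is the error term. The common-logarithm transformation is the one the authors describe next.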
The statistical model can be summarized as follows: response time = list + relevance + (list × relevance) + error. The general linear model required that the following conditions be met: First, data must come from a random sample from a normal population. Second, all variances within each of the groupings are the same (i.e., they have homoscedasticity). An examination of whether these assumptions were met revealed problems both with normality and with homoscedasticity. A common technique—logarithmic transformation—was employed to resolve these problems. Accordingly, response-time data were all converted to common logarithms. An examination of assumptions with the transformed data showed that all questions but three met the required conditions. The three questions (5, 6, and 7) were excluded from subsequent analysis. n Conclusions The series of graphs in the appendix show the average response times, adjusted for relevance, for eight of the eleven questions for all twelve lists (i.e., experimental groups). Three of the eleven questions were excluded from the analysis because of heteroscedasticity. An inspection of these graphs shows no consistent pattern in response time as the number of items in the lists increases. Essentially, this means that, for any given level of relevance, the number of items in the list does not affect response time significantly. It seems that for a single question, characteristics of the categories themselves are more important than the quantity of categories in the list. The response times using a subject list with twenty-eight labels are similar to the response times using a list of six labels. A statistical comparison of the mean response time for each group with that of each of the other groups for each of the questions largely confirms this. Very few of these comparisons were statistically significant. The spikes and valleys of the graphs in the appendix are generally not significantly different. However, when the average response time associated with all lists is combined into an overall average from all eight questions, a somewhat clearer picture emerges (see figure 1). Response times increase gradually as the number of items in the list increases until the list size reaches approximately fifty items. At that point, response time increases significantly. No association was found between response time and relevance. A fast response time did not necessarily yield a relevant response, nor did a slow response time yield an irrelevant response. Figure 1. The overall average of average search times for the eight questions for all experimental groups (i.e., lists). [Chart: average log response time ("Avg Log Performance") for each list, with a trend line.] n Observations We observed that there were two basic patterns exhibited when participants made selections. The first pattern was the quick selection—participants easily made a selection after performing an initial scan of the available labels. Nevertheless, a quick selection did not always mean a relevant selection. The second pattern was the delayed selection. If participants were unable to make a selection after the initial scan of items, they would hesitate as they struggled to determine how the question might be reclassified to make one of the labels fit.
We did not have access to a high-tech lab, so we were unable to track eye move- ment, but it appeared that the participants began scan- ning up and down the list of available items in an attempt to make a selection. The delayed selection seemed to be a combination of two problems: First, none of the avail- able labels seemed to fit. Second, the delay in scanning increased as the list grew larger. It’s possible that once the list becomes large enough, scanning begins to slow the selection process. A delayed selection did not necessarily yield an irrelevant selection. The label names themselves did not seem to be a significant factor affecting user performance. We did test three lists, each with nine items and each having differ- ent labels, and response times were similar for the three lists. A future study might compare a more extensive number of lists with the same number of items with different labels to see if label names have an effect on response time. This is a particular challenge to librarians in classifying the digital library, since they must come up with a few labels to classify all possible subjects. Creating eleven questions to span a broad range of subjects is also a possible weakness of the study. We had to throw out three questions that violated the assump- tions of the statistical model. We tried our best to select questions that would represent the broad subject areas of science, arts, and general interest. We also attempted to vary the difficulty of the questions. A different set of questions may yield different results. References 1. Steve Jones, The Internet Goes to College, ed. Mary Madden (Washington, D.C.: Pew Internet and American Life Project, 2002): 3, www.pewinternet.org/pdfs/PIP_College_Report.pdf (accessed Mar. 20, 2007). 2. Louise McGillis and Elaine G. Toms, “Usability of the Academic Library Web Site: Implications for Design,” College & Research Libraries 62, no. 4 (2001): 361. 3. Judy H. Jeng, “Usability of the Digital Library: An Evalu- ation Model” (PhD diss., Rutgers University, New Brunswick, New Jersey): 38–42. 4. George A. Miller, “The Magical Number Seven Plus or Minus Two: Some Limits on Our Capacity for Processing Infor- mation,” Psychological Review 63, no. 2 (1956): 81–97. 5. Fernand Gobet et al., “Chunking Mechanisms in Human Learning,” Trends in Cognitive Sciences 5, no. 6 (2001): 236–43. 6. Kevin Larson and Mary Czerwinski, “Web Page Design: Implications of Memory, Structure and Scent for Informa- tion Retrieval” (Los Angeles: ACM/Addison-Wesley, 1998): 25, http://doi.acm.org/10.1145/274644.274649 (accessed Nov. 1, 2007). 7. Ibid. 8. Kathleen Snowberry, Mary Parkinson, and Norwood Sis- son, “Computer Display Menus,” Ergonomics 26, no 7 (1983): 705. 9. Larson and Czerwinski, “Web Page Design,” 26. 10. Panayiotis G. Zaphiris, “Depth vs. Breath in the Arrange- ment of Web Links,” www.soi.city.ac.uk/~zaphiri/Papers/hfes .pdf (accessed Nov. 1, 2007). 11. Panayiotis G. Zaphiris, Ben Shneiderman, and Kent L. Norman, “Expandable Indexes Versus Sequential Menus for Searching Hierarchies on the World Wide Web,” http:// citeseer.ist.psu.edu/rd/0%2C443461%2C1%2C0.25%2CDow nload/http://coblitz.codeen.org:3125/citeseer.ist.psu.edu/ cache/papers/cs/22119/http:zSzzSzagrino.orgzSzpzaphiriz SzPaperszSzexpandableindexes.pdf/zaphiris99expandable.pdf (accessed Nov. 1, 2007). 20 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2009 APPENDIx. 
Response times by question by group. [Eight charts, labeled Questions 1-4 and 8-11, each plotting average response time for the twelve experimental groups, from GRP A (5 items) to GRP L (72 items).] 3165 ---- Andrew K. Pace President's Message: LITA Now Andrew K. Pace (pacea@oclc.org) is LITA President 2008/2009 and Executive Director, Networked Library Services at OCLC Inc. in Dublin, Ohio. At the time of this writing, my term as LITA president is half over; by the time of publication, I will be in the home stretch—a phrase that, to me, always connotes relief and satisfaction that is never truly realized. I hope that this time between ALA conferences is a time of reflection for the LITA board, committees, interest groups, and the membership at large. Various strategic planning sessions are, I hope, leading us down a path of renewal and regeneration of the division. Of course, the world around us will have its effect—in particular, a political and economic effect. First, the politics. I was asked recently to give my opinion about where the new administration should focus its attention regarding library technology.
I had very little time to think of a pithy answer to this ques- tion, so I answered with my gut that the United States needs to continue its investment in IT infrastructure so that we are on par with other industrialized nations while also lending its aid to countries that are lagging behind. Furthermore, I thought it an apt time to redress issues of data privacy and retention. The latter is often far from our minds in a world more connected, increasingly through wireless technology, and with a user base that, as one privacy expert put it, would happily trade a DNA sample for an Extra Value Meal. I will resist the urge to write at greater length a treatise on the Bill of Rights and its status in 2008. I will hope, however, that LITA’s Technology and Access and Legislation and Regulation committees will feel reinvigorated post–election and post–inauguration to look carefully at the issues of IT policy. Our penchant for new tools should always be guided and tempered by the implementation and support of policies that rational- ize their use. As for the economy, it is our new backdrop. One anecdotal view of this is the number of e-mails I’ve received from committee appointees apologizing that they will not be able to attend ALA conferences as planned because of the economic downturn and local cuts to library budgets. Libraries themselves are in a paradoxical situation—increasing demand for the free services that libraries offer while simultaneously facing massive budget cuts that support the very collections and programs people are demanding. What can we do? Well, I would suggest that we look at library technology through a lens of efficiency and cost savings, not just from a perspective of what is cool or trendy. When it comes to running systems, we need to keep our focus on end-user satisfaction while consider- ing total cost of ownership. And if I may be selfish for a moment, I hope that we will not abandon our profes- sional networks and volunteer activities. While we all make sacrifices of time, money, and talent to support our profession, it is often tempting when economic times are hard to isolate ourselves from the professional networks that sustain us in times of plenty. Politics and economics? Though I often enjoy being cynical, I also try to make lemonade from lemons when- ever I can. I think there are opportunities for libraries to get their own economic bailout in supporting public works and emphasizing our role in contributing to the public good. We should turn our “woe-are-we” tenden- cies that decry budget cuts and low salaries into champi- oned stories of “what libraries have done for you lately.” And we should go back to the roots of IT, no matter how mythical or anachronistic, and think about what we can do technically to improve systemwide efficiencies. I encourage the membership to stay involved and reengage, whether through direct participation in LITA activities or through a closer following of the activities in the ALA Office of Information Technology Policy (OITP, www.ala.org/ala/aboutala/offices/oitp) and the ALA Washington Office itself. There is much to follow in the world that affects our profession, and so many are doing the heavy lifting for us. All we need to do sometimes is pay attention. Make fun of me if you want for stealing a campaign phrase from Richard Nixon, but I kept coming back to it in my head. In short, Library Information Technology— now more than ever. 3170 ---- LANECONNEx | KETCHELL ET AL. 
31 LaneConnex: An Integrated Biomedical Digital Library Interface Debra S. Ketchell, Ryan Max Steinberg, Charles Yates, and Heidi A. Heilemann This paper describes one approach to creating a search application that unlocks heterogeneous content stores and incorporates integrative functionality of Web search engines. LaneConnex is a search interface that identifies journals, books, databases, calculators, bioinformatics tools, help information, and search hits from more than three hundred full-text heterogeneous clinical and biore- search sources. The user interface is a simple query box. Results are ranked by relevance with options for filtering by content type or expanding to the next most likely set. The system is built using component-oriented program- ming design. The underlying architecture is built on Apache Cocoon, Java Servlets, XML/XSLT, SQL, and JavaScript. The system has proven reliable in production, reduced user time spent finding information on the site, and maximized the institutional investment in licensed resources. M ost biomedical libraries separate searching for resources held locally from external database searching, requiring clinicians and researchers to know which interface to use to find a specific type of information. Google, Amazon, and other Web search engines have shaped user behavior and expectations.1 Users expect a simple query box with results returned from a broad array of content ranked or categorized appropriately with direct links to content, whether it is an HTML page, a PDF document, a streaming video, or an image. Biomedical libraries have transitioned to digital journals and reference sources, adopted OpenURL link resolvers, and created institutional repositories. However, students, clinicians, and researchers are hindered from maximizing this content because of proprietary and het- erogeneous systems. A strategic challenge for biomedical libraries is to create a unified search for a broad spectrum of licensed, open-access, and institutional content. n Background Studies show that students and researchers will use the search path of least cognitive resistance.2 Ease and speed are the most important factors for using a particular search engine. A University of California report found that academic users want one search tool to cover a wide information universe, multiple formats, full-text avail- ability to move seamlessly to the item itself, intelligent assistance and spelling correction, results sorted in order of relevance, help navigating large retrievals by logical subsetting and customization, and seamless access any- time, anywhere.3 Studies of clinicians in the patient-care environment have documented that effort is the most important factor in whether a patient-care question is pursued.4 For researchers, finding and using the best bio- informatics tool is an elusive problem.5 In 2005, the Lane Medical Library and Knowledge Management Center (Lane) at the Stanford University Medical Center provided access to an expansive array of licensed, institutional, and open-access digital content in support of research, patient care, and education. Like most of its peers, Lane users were required to use scores of different interfaces to search external databases and find digital resources. We created a local metasearch application for clinical reference content, but it did not integrate result sets from disparate resources. 
A review of federated-search software in the marketplace found that products were either slow or they limited retrieval when faced with a broad spectrum of biomedical content. We decided to build on our existing application architecture to create a fast and unified interface. A detailed analysis of Lane website-usage logs was conducted before embarking on the creation of the new search application. Key points of user failure in the existing search options were spelling errors that could easily be corrected to avoid zero results; lack of sufficient intuitive options to move forward from a zero-results search or change topics without backtracking; lack of use of existing genre or role searches; confusion about when to use the resource, OpenURL resolver, or PubMed search to find a known item; and results that were cognitively difficult to navigate. Studies of the Web search engine and the PubMed search log concurred with our usage-log analysis: A single term search is the most common, with three words maximum entered by typical users.6 A PubMed study found that 22 percent of user queries were for known items rather than for a general subject, confirming our own log analysis findings that the majority of searches were for a particular source item.7 Search-term analysis revealed that many of our users were entering partial article citations (e.g., author, date) in any query box expecting that article databases would be searched concurrently with the resource database. Our displayed results were sorted alphabetically, and each version of an item was displayed separately. For the user, this meant a cluttered list with redundant title information that increased their cognitive effort to find meaningful items. Overall, users were confronted with too many choices upfront and too few options after retrieving results. Focus groups of faculty and students were conducted in 2005. Attendees wanted local information integrated into the proposed single search. Local information included content such as how-to information, expertise, seminars, grand rounds, core lab resources, drug formulary, patient handouts, and clinical calculators. Most of this content is restricted to the Stanford user population. Users consistently described their need for a simple search interface that was fast and customized to the Stanford environment. In late 2005, we embarked on a project to design a search application that would address both existing points of failure in the current system and meet the expressed need for a comprehensive discovery-and-finding tool as described in focus groups. The result is an application called LaneConnex. Debra S. Ketchell (debra.ketchell@gmail.com) is the former Associate Dean for Knowledge Management and Library Director; Ryan Max Steinberg (ryan.max.steinberg@stanford.edu) is the Knowledge Integration Programmer/Architect; Charles Yates (charles.yates@stanford.edu) is the Systems Software Developer; and Heidi A. Heilemann (heidi.heilemann@stanford.edu) is the former Director for Research & Instruction and current Associate Dean for Knowledge Management and Library Director at the Lane Medical Library & Knowledge Management Center, Information Resources & Technology, Stanford University School of Medicine, Stanford, California.
n Design objectives The overall goal of LaneConnex is to create a simple, fast search across multiple licensed, open-access, and special-object local knowledge sources that depackages and reaggregates information on the basis of Stanford institutional roles. The content of Lane’s digital collec- tion includes forty-five hundred journal titles and forty- two thousand other digital resources, including video lectures, executable software, patient handouts, bioin- formatics tools, and a significant store of digitized his- torical materials as a result of the Google Books program. Media types include HTML pages, PDF documents, JPEG images, MP3 audio files, MPEG4 videos, and executable applications. More than three hundred reference titles have been licensed specifically for clinicians at the point of care (e.g., UpToDate, eMedicine, STAT-Ref, and Micromedex Clinical Evidence). Clinicians wanted their results to reflect subcomponents of a package (e.g., results from the Micromedex patient handouts). Other clinical content is institutionally managed (e.g., institutional formulary, lab test database, or patient handouts). More than 175 bio- medical research tools have been licensed or selected from open-access content. The needs of biomedical researchers include molecular biology tools and software, biomedi- cal literature databases, citation analysis, chemical and engineering databases, expertise-finding tools, laboratory tools and supplies, institutional-research resources, and upcoming seminars. The specific objectives of the search application are the following: n The user interface should be fast, simple, and intui- tive, with embedded suggestions for improving search results (e.g., Did you mean? Didn’t find it? Have you tried?). n Search results from disparate local and external systems should be integrated into a single display based on popular search-engine models familiar to the target population. n The query-retrieval and results display should be separated and reusable to allow customization by role or domain and future expansion into other institutional tools. n Resource results should be ranked by relevance and filtered by genre. n Metasearch results should be hit counts and fil- tered by category for speed and breadth. Results should be reusable for specific views by role. n Finding a known article or journal should be streamlined and directly link to the item or “get item” option. n The most popular search options (PubMed, Google, and Lane journals) should be ubiquitous. n Alternative pathways should be dynamic and interactive at the point of need to avoid backtrack- ing and dead ends. n User behavior should be tracked by search term, resource used, and user location to help the library make informed decisions about licensing, meta- data, and missing content. n Off-the-shelf software should be used when avail- able or appropriate with development focused on search integration. n The application should be built upon existing metadata-creation systems and trusted Web- development technologies. Based on these objectives, we designed an application that is an extension of existing systems and technolo- gies. Resources are acquired and metadata are provided using the Voyager integrated library system (ILS). The SFX OpenURL link resolver provides full-text article access and expands the title search beyond biomedicine to all online journals at Stanford. EZproxy provides seamless off-campus access. WebTrends provides usage tracking. Movable Type is used to create FAQ and help information. 
A locally developed metasearch application provides a cross search with hit results from more than three hundred external and internal full-text sources. The technologies used to build LaneConnex and integrate all of these systems include Extensible Stylesheet Language LANECONNEx | KETCHELL ET AL. 33 Transformations (XSLT), Java, JavaScript, the Apache Cocoon project, and Oracle. n Systems Description Architecture LaneConnex is built on a principle of separation of concerns. The Lane content owner can directly change the inclusion of search results, how they are displayed, and additional path-finding information. Application programmers use Java, JavaScript, XSLT, and Structured Query Language (SQL) to create components that generate and modify the search results. The merger of content design and search results occurs “just in time” in the user’s browser. We use component-oriented programming design whereby services provided within the application are defined by simple contracts. In LaneConnex, these com- ponents (called “transformers”) consume XML informa- tion and, after transforming it in some way, pass it on to some other component. A particular contract can be fulfilled in different ways for different purposes. This component architecture allows for easy extension of the underlying Apache Cocoon application. If LaneConnex needs to transform some XML data that is not possible with built-in Cocoon transformers, it is a simple matter to create a software component that does what is needed and fulfills the transformer contract. Apache Cocoon is the underlying architecture for LaneConnex, as illustrated in figure 1. This Java Servlet is an XML–publishing engine that is built upon a compo- nent framework and uses a pipeline-processing model. A declarative language uses pattern matching to associate sets of processing components with particular request URLs. Content can come from a variety of sources. We use content from the local file system, network file sys- tem, HTTP, and a relational database. The XSLT language is used extensively in the pipelines and gives fine control of individual parts of the documents being processed. The end of processing is usually an XHTML document but can be any common MIME type. We use Cocoon to separate areas of concern so things like content, look and feel, and processing can all be managed as separate entities by different groups of people with little effect on another area. This separation of concerns is manifested by template documents that contain most of the HTML content common to all pages and are then combined with content documents within a processing pipeline. The declarative nature of the sitemap language and XSLT facilitate rapid development with no need to redeploy the entire application to make changes in its behavior. The LaneConnex search is composed of several com- ponents integrated into a query-and-results interface: Oracle resource metadata, full-text metasearch application, Movable Type blogging software, “Did you mean?” spell checker, EZproxy remote access, and WebTrends tracking. n Full-text Metasearch Integration of results from Lane’s metasearch applica- tion illustrates Cocoon’s many strengths. When a user searches LaneConnex, Cocoon sends his or her query to the metasearch application, which then dispatches the request to multiple external, full-text search engines and content stores. Some examples of these external resources are UpToDate, Access Medicine, Micromedex, PubMed, and MD Consult. 
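The fan-out pattern described in the next paragraphs can be sketched in a few lines of Java. This is a minimal illustration under our own naming, not LaneConnex code: the HitCountSource and MetaSearchEngine types are invented for the sketch, whereas the production system does this work with Jakarta Commons HTTP clients, XPath, and Cocoon pipelines. The sketch only shows how a single query is dispatched to many sources at once and how partial hit counts can be returned without waiting for slow responders.

import java.util.List;
import java.util.Map;
import java.util.concurrent.*;

// Invented for illustration: one external full-text source that can report a hit count.
interface HitCountSource {
    String name();
    int countHits(String query) throws Exception; // e.g., fetch the source's result page and read its total
}

class MetaSearchEngine {
    private final List<HitCountSource> sources;
    private final ExecutorService pool = Executors.newFixedThreadPool(16);

    MetaSearchEngine(List<HitCountSource> sources) {
        this.sources = sources;
    }

    // Dispatch the query to every source; return whatever counts arrive within the deadline.
    Map<String, Integer> search(String query, long timeoutMillis) throws InterruptedException {
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        CountDownLatch pending = new CountDownLatch(sources.size());
        for (HitCountSource source : sources) {
            pool.submit(() -> {
                try {
                    counts.put(source.name(), source.countHits(query));
                } catch (Exception e) {
                    // a slow or failing source simply contributes no count
                } finally {
                    pending.countDown();
                }
            });
        }
        pending.await(timeoutMillis, TimeUnit.MILLISECONDS);
        return counts; // stragglers keep filling the shared map and can be picked up by later polling
    }
}

In the real application, the equivalent of countHits parses each source's response into a DOM and evaluates an XPath expression to extract the hit total, and the browser polls for counts that arrive after the page is first rendered, as the following paragraphs explain.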
The metasearch application interacts with these external resources through Jakarta Commons HTTP clients. Responses from external resources are turned into W3C Document Object Model (DOM) objects, and XPath expressions are used to resolve hit counts from the DOM objects. As result counts are returned, they are added to an XML–based result list and returned to Cocoon. The power of Cocoon becomes evident as the XML–based metasearch result list is combined with a separate display template. This template-based approach affords content curators the ability to directly add, group, and describe metasearch resources using the language and look that is most meaningful to their specific user communities. For example, there are currently eight metasearch templates curated by an informationist in partnership with a target community. Curating these templates requires little to no assistance from programmers. In Lane's 2005 interface, a user's request was sent to the metasearch application, and the application waited five seconds before responding to give external resources a chance to return a result. Hit counts in the user interface included a link to refresh and retrieve more results from external resources that had not yet responded. Usability studies showed this to be a significant user barrier, since the refresh link was rarely clicked. The initial five second delay also gave users the impression that the site was slow. The LaneConnex application makes heavy use of JavaScript to solve this problem. After a user makes her initial request, JavaScript is used to poll the metasearch application (through Cocoon) on the user's behalf, popping in result counts as external resources respond. This adds a level of interactivity previously unavailable and makes the metasearch piece of LaneConnex much more successful than its previous version. Figure 1. LaneConnex Architecture. Resource metadata LaneConnex replaces the catalog as the primary discovery interface. Metadata describing locally owned and licensed resources (journals, databases, books, videos, images, calculators, and software applications) are stored in the library's current system of record, an instance of the Voyager ILS. LaneConnex makes no attempt to replace Voyager's strengths as an application for the selection, acquisition, description, and management of access to library resources. It does, however, replace Voyager's discovery interface. To this end, metadata for about eight thousand digital resources is extracted from Voyager's Oracle database, converted into MARCXML, processed with XSLT, and stored in a simple relational database (six tables and twenty-nine attributes) to support fast retrieval speed and tight control over search syntax. This extraction process occurs nightly, with incremental updates every five minutes. The Oracle Text search engine provides functionality anticipated by our Internet-minded users. Key features are speed and relevance-ranked results. A highly refined results ranking insures that the logical title appears in the first few results. A user's query is parsed for wildcard, Boolean, proximity, and phrase operators, and then translated into an SQL query. Results are then transformed into a display version. Related services LaneConnex compares a user's query terms against a dictionary. Each query is sent to a Cocoon spell-checking component that returns suggestions where appropriate. This component currently uses the Simple Object
LANECONNEx | KETCHELL ET AL. 35 Access Protocol (SOAP)–based spell- ing service from Google. Google was chosen over the National Center for Biotechnology Information (NCBI) spelling service because of the breadth of terms entered by users; however, Cocoon’s component-oriented archi- tecture would make it trivial to change spell checkers in the future. Each query is also compared against Stanford’s OpenURL link resolver (FindIt@Stanford). Client-side JavaScript makes a Cocoon-mediated query of FindIt@Stanford. Using XSLT, FindIt@Stanford responses are turned into JavaScript Object Notation (JSON) objects and popped into the interface as appropriate. Although the vast majority of LaneConnex searches result in zero FindIt@Stanford results, the convenience of searching all of Lane’s systems in a single, unified interface far outweighs the effort of implementation. A commercial analytics tool called WebTrends is used to collect Web statis- tics for making data-centric decisions about interface changes. WebTrends uses client-side JavaScript to track specific user click events. Libraries need to track both on-site clicks (e.g., the user clicked on “Clinical Portal” from the home page) and off-site clicks (e.g., the user clicked on “Yamada’s Gastroenterology” after doing a search for “IBS”). To facilitate off-site click capture, WebTrends requires every external link to include a snippet of JavaScript. Requiring content creators to input this code by hand would be error prone and tedious. LaneConnex automatically supplies this code for every class of link (search or static). This specialized WebTrends method provides Lane with data to inform both interface design and licensing decisions. n Results LaneConnex version 1.0 was released to the Stanford biomedical community in July 2006. The current applica- tion can be experienced at http://lane.stanford.edu. The Figure 2. LaneConnex Resource Search Results. Resource results are ranked by rel- evance. Single word titles are given a higher weight in the ranking algorithm to insure they are displayed in the first five results. Uniform titles are used to co-locate versions (e.g., the three instances of Science from different producers). Journals titles are linked to their respective impact factor page in the ISI Web of Knowledge. Digital formats that require spe- cial players or restrictions are indicated. The metadata searched for eJournals, Databases, eBooks, Biotools, Video, and medCalcs are Lane’s digital resources extracted from the inte- grated library system into a searchable Oracle database. The first “All” tab is the combined results of these genres and the Lane Site help and information. Figure 3. LaneConnex Related Services Search Enhancements. LaneConnex includes a spell checker to avoid a common failure in user searches. AJAx services allow the inclusion of search results from other sources for common zero results failures. For example, the Stanford link resolver database is simultaneously searched to insure online journals outside the scope of biomedicine are presented as a linked result for the user. production version has proven reliable over two years. Incremental user focus groups have been employed to improve the interface as issues arose. A series of vignettes will be used to illustrate how the current version of 36 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2009 the “SUNetID login” is required. n User query: “new yokrer.” A faculty member is looking for an arti- cle in the New Yorker for a class reading assignment. 
He makes a typing error, which invokes the “Did you mean?” function (see figure 3). He clicks on the correct spelling. No results are found in the resource search, but a simul- taneous search of the link-resolver database finds an instance of this title licensed for the campus and displays a clickable link for the user. n User query: “pathway analy- sis.” A post–doc is looking for infor- mation on how to share an Ingenuity pathway. Figure 4 illustrates the inte- gration of the locally created Lane FAQs. FAQs comprise a broad spec- trum of help and how-to information as described by our focus groups. Help text is created in the Movable Type blog software, and made searchable through the LaneConnex application. The Movable Type interface lowers the barrier to HTML content creation by any staff member. More complex answers include embedded images and videos to enable the user to see exactly how to do a particular proce- dure. Cocoon allows for the syndica- tion of subsets of this FAQ content back into static HTML pages where it can be displayed as both category-specific lists or as the text for scroll-over help for a link. Having a single store of help information insures the content is updated once for all instances. n User query: “uterine cancer kapp.” A resident is looking for a known article. LaneConnex simultaneously searches PubMed to increase the likelihood of user success (see figure 5). Clicking on the PubMed tab retrieves the results in the native interface; however, the user sees the PubMed@Stanford ver- sion, which includes embedded links to the article based on our OpenURL link resolver. The ability to retrieve results from bibliographic databases that includes article resolution insures that our biomedical community is always using the correct URL to insure maximum full-text article access. User testing in 2007 found that adding the three most frequently used sources (PubMed, Google, and Lane Catalog) into our one-box LaneConnex search was a significant time saver. It addresses LaneConnex meets the design objectives from the user’s perspective. n User query: “science.” A graduate student is look- ing for the journal Science. The LaneConnex results are listed in relevance order (see figure 2). Single- word titles are given a higher weight in the rank- ing algorithm to insure they are displayed in the first five results. Results from local metadata are displayed by uniform title. For example, Lane has three instances of the journal Science, and each version is linked to the appropriate external store. Brief notes provide critical information for particu- lar resources. For example, restricted local patient education documents and video seminars note that Figure 4. Example of Integration of Local Content Stores. help information is managed in Moveable Type and integrated into LaneConnex search results. LANECONNEx | KETCHELL ET AL. 37 the expectation on the part of our users that they could search for an article or a journal title in a single search box without first selecting a database. n User query: “serotonin pul- monary hypertension.” A medical student is looking for the correlation of two topics. Clicking on the “Clinical” tab, the student sees the results of the clinical metasearch in fig- ure 6. Metasearch results are deep searches of sources within licensed packages (e.g., text- books in MD Consult or a spe- cific database in Micromedex), local content (e.g., Stanford’s lab-test database), and open- access content (e.g., NCBI databases). 
PubMed results are tailored strategies tiered by evidence. For example, the evidence-summaries strategy retrieves results from twelve clinical-evidence resources (e.g., BUJ, Clinical Evidence, and Cochrane Systematic Reviews) that link to the full-text licensed by Stanford. An example of the bioresearch metasearch is shown in figure 7. Content selected for this audience includes literature databases, funding sources, patents, structures, clinical trials, protocols, and Stanford expertise integrated with gene, protein, and phe- notype tools. User testing revealed that many users did not click on the “Clinical” tab. The clinical metasearch was originally developed for the Clinical portal page and focused on clinicians in practice; however, the results needed to be exposed more directly as part of the LaneConnex search. Figure 8 illustrates the “Have you tried?” feature that displays a few relevant clinical-content sources without requiring the user to select the “Clinical” tab. This fea- ture is managed by the SmartSearch component of the LaneConnex system. SmartSearch sends the user’s query terms to PubMed, extracts a subset of articles associated with those terms, extracts the MeSH headings for those articles, and computes the frequency of headings in the articles to determine the most likely MeSH terms associ- ated with the user’s query terms. These MeSH terms are mapped to MeSH terms associated with each metasearch resource. Preliminary evaluation indicates that the clini- cal content is now being discovered by more users. Figure 5. Example of Integration of Popular Search Engines into LaneConnex Results. Three of the most popular searches based on usage analysis are included at the top level. PubMed and google are mapped to Lane’s link resolver to retrieve the full article. Creating or editing metasearch templates is a curator- driven task. Programming is only required to add new sources to the metasearch engine. A curator may choose from more than three hundred sources to create a dis- cipline-based layout using general templates. Names, categories, and other description information are all at the curator ’s discretion. While developing new sub- specialty templates, we discovered that clinicians were confused by the difference in layout of their specialty portal and their metasearch results (e.g., the Cardiology portal used the generic clinical metasearch). To address this issue, we devised an approach that merges a portal and metasearch into a single entity as illustrated in figure 9. A combination of the component-oriented architecture of LaneConnex and JavaScript makes the integration of metasearch results into a new template patterned after a portal easy to implement. This strategy will enable the creation of templates contextually appropriate to knowl- edge requests originating from electronic medical-record systems in the future. Direct user feedback and usage statistics confirm that search is now the dominant mode of navigation. The amount of time each user spends on the website has dropped since the release of version 1.0. We speculate that the integrated search helps our users find relevant 38 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2009 information more efficiently. Focus groups with students are uniformly positive. Graduate students like the ability to find digital articles using a single search box. Medical students like the clinical metasearch as an easy way to look up new topics in texts and customized PubMed searches. 
Bioengineering students like the ability to easily look up patient care–related topics. Pediatrics residents and attend- ings have championed the develop- ment of their portal and metasearch focused on their patient population. Medical educators have commented on their ability to focus on the best information sources. n Discussion A review of websites in 2007 found that most biomedical libraries had sep- arate search interfaces for their digital resources, library catalog, and exter- nal databases. Biomedical libraries are implementing metasearch software to cross search proprietary data- bases. The University of California, Davis is using the MetaLib software to federate searching multiple bib- liographic databases.8 The University of South California and Florida State University are using WebFeat soft- ware to search clinical textbooks.9 The Health Sciences Library System at the University of Pittsburgh is using Vivisimo to search clinical textbooks and bioresearch tools.10 Academic libraries are introducing new “resource shopping” applications, such as the Endeca project at North Carolina State University, the Summa project at the University of Aarhus, and the VuFind project at Villanova University.11 These systems offer a single query box, faceted results, spell checking, recom- mendations based on user input, and Asynchronous JavaScript and XML (AJAX) for live status information. We believe our approach is a practi- cal integration for our biomedical com- munity that bridges finding a resource and finding a specific item through Figure 6. Integration of metasearch results into LaneConnex. Results from two general, role-based metasearches (Bioresearch and Clinical) are included in the LaneConnex interface. The first image shows a clinician searching LaneConnex for serotonin pulmonary hypertension. Selecting the Clinical tab presents the clinical content metasearch display (second image), and is placed deep inside the source by selecting a title (third image). LANECONNEx | KETCHELL ET AL. 39 a metasearch of multiple databases. The LaneConnex application searches across digital resources and external data stores simultaneously and pres- ents results in a unified display. The limitation to our approach is that the metasearch returns only hit counts rather than previews of the specific content. Standardization of results from external systems, particularly receipt of XML results, remains a chal- lenge. Federated search engines do integrate at this level, but are usually slow or limit the number of results. True integration awaits Health Level Seven (HL7) Clinical Decision Support standards and National Information Standards Organization (NISO) MetaSearch initiative for query and retrieval of specific content.12 One of the primary objectives of LaneConnex is speed and ease of use. Ranking and categorization of results has been very successful in the eyes of the user community. The integration of metasearch results has been par- ticularly successful with our pediatric specialty portal and search. However, general user understanding of how the clinical and biomedical tabs related to the genre tabs in LaneConnex has been problematic. We reviewed Web engines and found a similar challenge in presenting disparate format results (e.g., video or image search results) or lists of hits from different systems (e.g., NCBI’s Entrez search results).13 We are continuing to develop our new specialty portal-and-search model and our SmartSearch term-mapping com- ponent to further integrate results. 
n Conclusion LaneConnex is an effective and open- ended search infrastructure for inte- grating local resource metadata and full-text content used by clinicians and biomedical researchers. Its effective- ness comes from the recognition that users prefer a single query box with relevance or categorically organized results that lead them to the most likely Figure 7. Example of a Bioresearch Metasearch. Figure 8. The SmartSearch component embeds a set of the metasearch results into the LaneConnex interface as “have you tried?” clickable links. These links are the equivalent of selecting the title from a clinical metasearch result. The example search for atypical malig- nant rhabdoid tumor (a rare childhood cancer) invokes oncology and pediatric textbook results. These texts and PubMed provide quick access for a medical student or resident on the pediatric ward. Figure 9. Example of a Clinical Specialty Portal with Integrated Metasearch. Clinical portal pages are organized so metasearch hit counts can display next to content links if a user executes a search. This approach removes the dissonance clinicians felt existed between separate portal page and metasearch results in version 1.0. 40 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2009 answer to a question or prospects in their exploration. The application is based on separation of concerns and is easily extensible. New resources are constantly emerg- ing, and it is important that libraries take full advantage of existing and forthcoming content that is tailored to their user population regardless of the source. The next major step in the ongoing development of LaneConnex is becoming an invisible backend application to bring content directly into the user’s workflow. n Acknowledgements The authors would like to acknowledge the contribu- tions of the entire LaneConnex technical team, in par- ticular Pam Murnane, Olya Gary, Dick Miller, Rick Zwies, and Rikke Ogawa for their design contributions, Philip Constantinou for his architecture contribution, and Alain Boussard for his systems development contributions. References 1. Denise T. Covey, “The Need to Improve Remote Access to Online Library Resources: Filling the Gap between Com- mercial Vendor and Academic User Practice,” Portal Libraries and the Academy 3 no.4 (2003): 577–99; Nobert Lossau, “Search Engine Technology and Digital Libraries,” D-Lib Magazine 10 no. 6 (2004), www.dlib.org/dlib/june04/lossau/06lossau.html (accessed Mar. 1, 2008); OCLC, “College Students’ Perception of Libraries and Information Resource,” www.oclc.org/reports/ perceptionscollege.htm (accessed Mar 1, 2008); and Jim Hender- son, “Google Scholar: A Source for Clinicians,” Canadian Medical Association Journal 12 no. 172 (2005). 2. Covey, “The Need to Improve Remote Access to Online Library Resources”; Lossau, “Search Engine Technology and Digital Libraries”; OCLC, “College Students’ Perception of Libraries and Information Resource.” 3. Jane Lee, “UC Health Sciences Metasearch Exploration. Part 1: Graduate Student Gocus Group Findings,” UC Health Sciences Metasearch Team, www.cdlib.org/inside/assess/ evaluation_activities/docs/2006/draft_gradReport_march2006. pdf (accessed Mar. 1, 2008). 4. Karen K. Grandage, David C. Slawson, and Allen F. Shaughnessy, “When Less is More: a Practical Approach to Searching for Evidence-Based Answers,” Journal of the Medical Library Association 90 no. 3 (2002): 298–304. 5. Nicola Cannata, Emanuela Merelli, and Russ B. 
Altman, “Time to Organize the Bioinformatics Resourceome,” PLoS Computational Biology 1, no. 7 (2005): e76.

6. Craig Silverstein et al., “Analysis of a Very Large Web Search Engine Query Log,” www.cs.ucsb.edu/~almeroth/classes/tech-soc/2005-Winter/papers/analysis.pdf (accessed Mar. 1, 2008); Anne Aula, “Query Formulation in Web Information Search,” www.cs.uta.fi/~aula/questionnaire.pdf (accessed Mar. 1, 2008); Jorge R. Herskovic, Len Y. Tanaka, William Hersh, and Elmer V. Bernstam, “A Day in the Life of PubMed: Analysis of a Typical Day’s Query Log,” Journal of the American Medical Informatics Association 14, no. 2 (2007): 212–20.

7. Herskovic, “A Day in the Life of PubMed.”

8. University of California, Davis Libraries, “QuickSearch,” http://mysearchspace.lib.ucdavis.edu/ (accessed Mar. 1, 2008).

9. Eileen Eandi, “Health Sciences Multi-eBook Search,” Norris Medical Library Newsletter (Spring 2006), Norris Medical Library, University of Southern California, www.usc.edu/hsc/nml/lib-information/newsletters.html (accessed Mar. 1, 2008); Maguire Medical Library, Florida State University, “WebFeat Clinical Book Search,” http://med.fsu.edu/library/tutorials/webfeat2_viewlet_swf.html (accessed Mar. 1, 2008).

10. Jill E. Foust, Philip Bergen, Gretchen L. Maxeiner, and Peter N. Pawlowski, “Improving E-Book Access via a Library-Developed Full-Text Search Tool,” Journal of the Medical Library Association 95, no. 1 (2007): 40–45.

11. North Carolina State University Libraries, “Endeca at the NCSU Libraries,” www.lib.ncsu.edu/endeca (accessed Mar. 1, 2008); Hans Lund, Hans Lauridsen, and Jens Hofman Hansen, “Summa—Integrated Search,” www.statsbiblioteket.dk/publ/summaenglish.pdf (accessed Mar. 1, 2008); Falvey Memorial Library, Villanova University, “VuFind,” www.vufind.org (accessed Mar. 1, 2008).

12. See the Health Level Seven (HL7) Clinical Decision Support working committee activities, in particular the Infobutton Standard Proposal at www.hl7.org/Special/committees/dss/index.cfm and the NISO Metasearch Initiative documentation at www.niso.org/workrooms/mi (accessed Mar. 1, 2008).

13. National Center for Biotechnology Information (NCBI) Entrez cross-database search, www.ncbi.nlm.nih.gov/Entrez (accessed Mar. 1, 2008).

3168 ----

6 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2009

Paul T. Jaeger and Zheng Yan

One Law with Two Outcomes: Comparing the Implementation of CIPA in Public Libraries and Schools

Though the Children’s Internet Protection Act (CIPA) established requirements for both public libraries and public schools to adopt filters on all of their computers when they receive certain federal funding, it has not attracted a great amount of research into the effects on libraries and schools and the users of these social institutions. This paper explores the implications of CIPA in terms of its effects on public libraries and public schools, individually and in tandem. Drawing from both library and education research, the paper examines the legal background and basis of CIPA, the current state of Internet access and levels of filtering in public libraries and public schools, the perceived value of CIPA, the perceived consequences of CIPA, the differences in levels of implementation of CIPA in public libraries and public schools, and the reasons for those dramatic differences.
After an analysis of these issues within the greater policy context, the paper suggests research questions to help provide more data about the challenges and questions revealed in this analysis. T he Children’s Internet Protection Act (CIPA) estab- lished requirements for both public libraries and public schools to—as a condition for receiving cer- tain federal funds—adopt filters on all of their computers to protect children from online content that was deemed potentially harmful.1 Passed in 2000, CIPA was initially implemented by public schools after its passage, but it was not widely implemented in public libraries until the 2003 Supreme Court decision (United States v. American Library Association) upholding the law’s constitutional- ity.2 Now that CIPA has been extensively implemented for five years in libraries and eight years in schools, it has had time to have significant effects on access to online information and services. While the goal of filter- ing requirements is to protect children from potentially inappropriate content, filtering also creates major edu- cational and social implications because filters also limit access to other kinds of information and create different perceptions about schools and libraries as social institu- tions. Curiously, CIPA and its requirements have not attracted a great amount of research into the effects on schools, libraries, and the users of these social institu- tions. Much of the literature about CIPA has focused on practical issues—either recommendations on implement- ing filters or stories of practical experiences with filtering. While those types of writing are valuable to practitioners who must deal with the consequences of filtering, there are major educational and societal issues raised by filter- ing that merit much greater exploration. While relatively small bodies of research have been generated about CIPA’s effects in public libraries and public schools,3 thus far these two strands of research have remained separate. But it is the contention of this paper that these two strands of research, when viewed together, have much more value for creating a broader understanding of the educational and societal implications. It would be impossible to see the real consequences of CIPA without the development of an integrative picture of its effects on both public schools and public libraries. In this paper, the implications of CIPA will be explored in terms of effects on public libraries and public schools, individually and in tandem. Public libraries and public schools are generally considered separate but related public sphere entities because both serve core educa- tional and information-provision functions in society. Furthermore, the fact that public schools also contain school library media centers highlights some very inter- esting points of intersection between public libraries and school libraries in terms of the consequences of CIPA: While CIPA requires filtering of computers throughout public libraries and public schools, the presence of school library media centers makes the connection between libraries and schools stronger, as do the teaching roles of public libraries (e.g., training classes, workshops, and evening classes). 
Paul T. Jaeger (pjaeger@umd.edu) is Assistant Professor at the College of Information Studies and Director of the Center for Information Policy and Electronic Government of the University of Maryland in College Park. Zheng Yan (zyan@uamail.albany.edu) is Associate Professor at the Department of Educational and Counseling Psychology in the School of Education of the State University of New York at Albany.

The legal road to CIPA

History

Under CIPA, public libraries and public schools receiving certain kinds of federal funds are required to use filtering programs to protect children under the age of seventeen from harmful visual depictions on the Internet and to provide public notices and hearings to increase public awareness of Internet safety. Senator John McCain (R-AZ) sponsored CIPA, and it was signed into law by President Bill Clinton on December 21, 2000. CIPA requires that filters at public libraries and public schools block three specific types of content: (1) obscene material (that which appeals to prurient interests only and is “offensive to community standards”); (2) child pornography (depictions of sexual conduct and/or lewd exhibitionism involving minors); and (3) material that is harmful to minors (depictions of nudity and sexual activity that lack artistic, literary, or scientific value). CIPA focused on “the recipients of Internet transmission,” rather than the senders, in an attempt to avoid the constitutional issues that undermined the previous attempts to regulate Internet content.4

Using congressional authority under the spending clause of Article I, section 8 of the U.S. Constitution, CIPA ties the direct or indirect receipt of certain types of federal funds to the installation of filters on library and school computers. Therefore, each public library and school that receives the applicable types of federal funding must implement filters on all computers in the library and school buildings, including computers that are exclusively for staff use. Libraries and schools had to address these issues very quickly because the Federal Communications Commission (FCC) mandated certification of compliance with CIPA by funding year 2004, which began in summer 2004.5

CIPA requires that filters on computers block three specific types of content, and each of the three categories of materials has a specific legal meaning. The first type—obscene materials—is statutorily defined as depicting sexual conduct that appeals only to prurient interests, is offensive to community standards, and lacks serious literary, artistic, political, or scientific value.6 Historically, obscene speech has been viewed as being bereft of any meaningful ideas or educational, social, or professional value to society.7 Statutes regulating speech as obscene have to do so very carefully and specifically, and speech can only be labeled obscene if the entire work is without merit.8 If speech has any educational, social, or professional importance, even for embodying controversial or unorthodox ideas, it is supposed to receive First Amendment protection.9 The second type of content—child pornography—is statutorily defined as depicting any form of sexual conduct or lewd exhibitionism involving minors.10 Both of these types of speech have a long history of being regulated and being considered as having no constitutional protections in the United States.
The third type of content that must be filtered— material that is harmful to minors—encompasses a range of otherwise protected forms of speech. CIPA defines “harmful to minors” as including any depiction of nudity, sexual activity, or simulated sexual activity that has no serious literary, artistic, political, or scientific value to minors.11 The material that falls into this third category is constitutionally protected speech that encompasses any depiction of nudity, sexual activity, or simulated sexual activity that has serious literary, artistic, political, or scientific value to adults. Along with possibly includ- ing a range of materials related to literature, art, science, and policy, this third category may involve materials on issues vital to personal well-being such as safe sexual practices, sexual identity issues, and even general health care issues such as breast cancer. In addition to the filtering requirements, section 1731 also prescribes an Internet awareness strategy that public libraries and schools must adopt to address five major Internet safety issues related to minors. It requires librar- ies and schools to provide reasonable public notice and to hold at least one public hearing or meeting to address these Internet safety issues. Requirements for schools and libraries CIPA includes sections specifying two major strategies for protecting children online (mainly in sections 1711, 1712, 1721, and 1732) as well as sections describing vari- ous definitions and procedural issues for implementing the strategies (mainly in sections 1701, 1703, 1731, 1732, 1733, and 1741). Section 1711 specifies the primary Internet protec- tion strategy—filtering—in public schools. Specifically, it amends the Elementary and Secondary Education Act of 1965 by limiting funding availability for schools under section 254 of the Communication Act of 1934. Through a compliance certification process within a school under supervision by the local educational agency, it requires schools to include the operation of a technology protec- tion measure that protects students against access to visual depictions that are obscene, are child pornography, or are harmful to minors under the age of seventeen. Likewise, section 1712 specifies the same filtering strategy in public libraries. Specifically, it amends section 224 of the Museum and Library Service Act of 1996/2003 by limiting funding availability for libraries under sec- tion 254 of the Communication Act of 1934. Through a compliance certification process within a library under supervision by the Institute of Museum and Library Services (IMLS), it requires libraries to include the opera- tion of a technology protection measure that protects stu- dents against access to visual depictions that are obscene, child pornography, or harmful to minors under the age of seventeen. Section 1721 is a requirement for both libraries and schools to enforce the Internet safety policy with the Internet safety policy strategy and the filtering technol- ogy strategy as a condition of universal service discounts. Specifically, it amends section 254 of the Communication Act of 1934 and requests both schools and libraries to monitor the online activities of minors, operate a tech- nical protection measure, provide reasonable public notice, and hold at least one public hearing or meeting to address the Internet safety policy. This is through the 8 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2009 certification process regulated by the FCC. 
Section 1732, titled the Neighborhood Children’s Internet Protection Act (NCIPA), amends section 254 of the Communication Act of 1934 and requires schools and libraries to adopt and implement an Internet safety policy. It specifies five types of Internet safety issues: (1) access by minors to inappropriate matter on the Internet; (2) safety and security of minors when using e-mail, chat rooms, and other online communications; (3) unauthor- ized access; (4) unauthorized disclosure, use, and dis- semination of personal information; and (5) measures to restrict access to harmful online materials. From the above summary, it is clear that (1) the two protection strategies of CIPA (the Internet filtering strat- egy and safety policy strategy) were equally enforced in both public schools and public libraries because they are two of the most important social institutions for children’s Internet safety; (2) the nature of the implementation mechanism is exactly the same, using the same federal funding mechanisms as the sole financial incentive (lim- iting funding availability for schools and libraries under section 254 of the Communication Act of 1934) through a compliance certification process to enforce the imple- mentation of CIPA; and (3) the actual implementation procedure differs in libraries and schools, with schools to be certified under the supervision of local educational agencies (such as school districts and state departments of education) and with libraries to be certified within a library under the supervision of the IMLS. Economics of CIPA The Universal Service program (commonly known as E–Rate) was established by the Telecommunications Act of 1996 to provide discounts, ranging from 20 to 90 percent, to libraries and schools for telecommunications services, Internet services, internal systems, and equip- ment.12 The program has been very successful, provid- ing approximately $2.25 billion dollars a year to public schools, public libraries, and public hospitals. The vast majority of E-Rate funding—about 90 percent—goes to public schools each year, with roughly 4 percent being awarded to public libraries and the remainder going to hospitals.13 The emphasis on funding schools results from the large number of public schools and the size- able computing needs of all of these schools. But even 4 percent of the E-Rate funding is quite substantial, with public libraries receiving more than $250 million between 2000 and 2003.14 Schools received about $12 billion in the same time period.15 Along with E-Rate funds, the Library Services and Technology Act (LSTA) program adminis- tered by the IMLS provides money to each state library agency to use on library programs and services in that state, though the amount of these funds is considerably lower than E-Rate funds. The American Library Association (ALA) has noted that the E-Rate program has been particularly significant in its role of expanding online access to students and to library patrons in both rural and underserved com- munities.16 In addition to the effect on libraries, E-Rate and LSTA funds have significantly affected the lives of individuals and communities. These programs have contributed to the increase in the availability of free public Internet access in schools and libraries. 
By 2001, more than 99 percent of public school libraries provided students with Internet access.17 By 2007, 99.7 percent of public library branches were connected to the Internet, and 99.1 percent of public library branches offered public Internet access.18 However, only a small portion of libraries and schools used filters prior to CIPA.19 Since the advent of computers in libraries, librarians typically had used informal monitoring practices for computer users to ensure that nothing age inappropriate or morally offensive was publicly visible.20 Some individual school and library systems, such as in Kansas and Indiana, even developed formal or informal statewide Internet safety strategies and approaches.21

Why were only libraries and schools chosen to protect children’s online safety?

While there are many social institutions that could have been the focus of CIPA, the law places the requirements specifically on public libraries and public schools. If Congress were so interested in protecting children from access to harmful Internet content, it seems that the law would be more expansive and focused on the content itself rather than filtering access to the content. However, earlier laws that attempted to regulate access to Internet content failed legal challenges specifically because they tried to regulate content.

Prior to the enactment of CIPA, there were a number of other proposed laws aimed at preventing minors from accessing inappropriate Internet content. The Communications Decency Act (CDA) of 1996 prohibited the sending or posting of obscene material through the Internet to individuals under the age of eighteen.22 However, the Supreme Court found the CDA to be unconstitutional, stating that the law violated free speech under the First Amendment. In 1998, Congress passed the Child Online Protection Act (COPA), which prohibited commercial websites from displaying material deemed harmful to minors and imposed criminal penalties on Internet violators.23 A three-judge panel of the District Court for the Eastern District of Pennsylvania ruled that COPA’s focus on “contemporary community standards” violated the First Amendment, and the panel subsequently imposed an injunction on COPA’s enforcement.

CIPA’s force comes from Congress’s power under the spending clause; that is, Congress can legally attach requirements to funds that it gives out. Since CIPA is based on economic persuasion—the potential loss of funds for technology—the law can only have an effect on recipients of those funds. While regulating Internet access in other venues like coffee shops, Internet cafés, bookstores, and even individual homes would provide a more comprehensive shield to limit children’s access to certain online content, these institutions could not be reached under the spending clause. As a result, the burdens of CIPA fall squarely on public libraries and public schools.

The current state of filtering

When did CIPA actually come into effect in libraries and schools?

After overcoming a series of legal challenges that were ultimately decided by the Supreme Court, CIPA came into effect in full force in 2003, though 96 percent of public schools were already in compliance with CIPA in 2001. When the Court upheld the constitutionality of CIPA, the legal challenge by public libraries centered on the way the statute was written.24 The Court’s decision states that the wording of the law does not place unconstitutional limitations on free speech in public libraries.
To continue receiving federal dollars directly or indirectly through certain federal programs, public libraries and schools were required to install filtering technologies on all computers. While the case decided by the Supreme Court focused on public libraries, the decision virtually precludes public schools from making the same or related challenges.25 Before that case was decided, however, most schools had already adopted filters to comply with CIPA.

As a result of CIPA, a public library or public school must install technology protection measures, better known as filters, on all of its computers if it receives (1) E-Rate discounts for Internet access costs, (2) E-Rate discounts for internal connections costs, (3) LSTA funding for direct Internet costs,26 or (4) LSTA funding for purchasing technology to access the Internet. The requirements of CIPA extend to public libraries, public schools, and any library institution that receives LSTA and E-Rate funds as part of a system, including state library agencies and library consortia. As a result of the financial incentives to comply, almost 100 percent of public schools in the United States have implemented the requirements of CIPA,27 and approximately half of public libraries have done so.28

How many public schools have implemented CIPA?

According to the latest report by the Department of Education (see table 1), by 2005, 100 percent of public schools had implemented both the Internet filtering strategy and the safety policy strategy. In fact, in 2001 (the first year CIPA was in effect), 96 percent of schools had implemented CIPA, with 99 percent filtering by 2002. When compared to the percentage of all public schools with Internet access from 1994 to 2005, Internet access became nearly universal in schools between 1999 and 2000 (95 to 98 percent), and one can see that the Internet access percentage in 2001 was almost the same as the CIPA implementation percentage. According to the Department of Education, these estimates are based on a survey of 1,205 elementary and secondary schools selected from 63,000 elementary schools and 21,000 secondary and combined schools.29 After reviewing the design and administration of the survey, it can be concluded that these estimates should be considered valid and reliable and that CIPA has been immediately and consistently implemented in the majority of public schools since 2001.30

Table 1. Implementation of CIPA in public schools

Year           1994  1995  1996  1997  1998  1999  2000  2001  2002  2003  2005
Access (%)       35    50    65    78    89    95    98    99    99   100   100
Filtering (%)                                              96    99    97   100

How many public libraries have implemented CIPA?

In 2002, 43.4 percent of public libraries were receiving E-Rate discounts, and 18.9 percent said they would not apply for E-Rate discounts if CIPA was upheld.31 Since the Supreme Court decision upholding CIPA, the number of libraries complying with CIPA has increased, as has the number of libraries not applying for E-Rate funds to avoid complying with CIPA. However, unlike schools, there is no exact count of how many libraries have filtered Internet access. In many cases, the libraries themselves do not filter, but a state library, library consortium, or local or state government system of which they are a part filters access from beyond the walls of the library. In some of these cases, the library staff may not even be aware that such filtering is occurring.
A number of state and local governments have also passed their own laws to encourage or require all libraries in the state to filter Internet access regardless of E-Rate or LSTA funds.32 In 2008, 38.2 percent of public libraries were filtering access within the library as a result of directly receiving E-Rate funding.33 Furthermore, 13.1 percent of libraries were receiving E-Rate funding as a part of another orga- nization, meaning that these libraries also would need to comply with CIPA’s requirements.34 As such, the number of public libraries filtering access is now at least 51.3 percent, but the number will likely be higher as a result of state and local laws requiring libraries to filter as well as other reasons libraries have implemented filters. In contrast, among libraries not receiving E-Rate funds, the number of libraries now not applying for E-Rate inten- tionally to avoid the CIPA requirements is 31.6 percent.35 While it is not possible to identify an exact number of public libraries that filter access, it is clear that libraries overall have far lower levels of filtering than the 100 per- cent of public schools that filter access. E-Rate and other program issues The administration of the E-Rate program has not occurred without controversy. Throughout the course of the program, many applicants for and recipients of the funding have found the program structure to be obtuse, the application process to be complicated and time con- suming, and the administration of the decision-making process to be slow.36 As a result, many schools and librar- ies find it difficult to plan ahead for budgeting purposes, not knowing how much funding they will receive or when they will receive it.37 There also have been larger difficulties for the program. Following revelations about the uses of some E-Rate awards, the FCC suspended the program from August to December 2004 to impose new accounting and spending rules for the funds, delaying the distribution of over $1 billion in funding to libraries and schools.38 News inves- tigations had discovered that certain school systems were using E-Rate funds to purchase more technology than they needed or could afford to maintain, and some school systems failed to ever use technology they had acquired.39 While the administration of the E-Rate program has been comparatively smooth since, the temporary suspension of the program caused serious short-term problems for, and left a sense of distrust of, the program among many recipients.40 Filtering issues During the 1990s, many types of software filtering prod- ucts became available to consumers, including server- side filtering products (using a list of server-selected blocked URLs that may or may not be disclosed to the user), client-side filtering (controlling the blocking of specific content with a user password), text-based content-analysis filtering (removing illicit content of a website using real-time analysis), monitoring and time- limiting technologies (tracking a child’s online activi- ties and limiting the amount of time he or she spends online), and age-verification systems (allowing access to webpages by passwords issued by a third party to an adult).41 But because filtering software companies make the decisions about how the products work, content and collection decisions for electronic resources in schools and public libraries have been taken out of the hands of librarians, teachers, and local communities and placed in the trust of proprietary software products.42 Some filtering programs also have 
specific political agendas, which many organizations that purchase them are not aware of.43 In a study of over one million pages, for every webpage blocked by a filter as advertised by the software vendor, one or more pages were blocked inappropriately, while many of the criteria used by the filtering products go beyond the criteria enumerated in CIPA.44 Filters have significant rates of inappropriately block- ing materials, meaning that filters misidentify harmless materials as suspect and prevent access to harmless items (e.g., one filter blocked access to the Declaration of Independence and the Constitution).45 Furthermore, when libraries install filters to comply with CIPA, in many instances the filters will frequently be blocking text as well as images, and (depending on the type of filter- ing product employed) filters may be blocking access to entire websites or even all the sites from certain Internet service providers. As such, the current state of filtering technology will create the practical effect of CIPA restrict- ing access to far more than just certain types of images in many schools and libraries.46 n Differences in the perceived value of CIPA and filtering Based on the available data, there clearly is a sizeable contrast in the levels of implementation of CIPA between ONE LAw wITH TwO OuTCOMES | JAEGER AND YAN 11 schools and libraries. This difference raises a number of questions: For what reasons has CIPA been much more widely implemented in schools? Is this issue mainly value driven, dollar driven, both, or neither in these two public institutions? Why are these two institutions so dif- ferent regarding CIPA implementation while they share many social and educational similarities? Reasons for nationwide full implementation in schools There are various reasons—from financial, population, social, and management issues to computer and Internet availability—that have driven the rapid and compre- hensive implementation of filters in public schools. First, public schools have to implement CIPA because of societal pressures and the lobbying of parents to ensure students’ Internet safety. Almost all users of computers in schools are minors, the most vulnerable groups for Internet crimes and child pornography. Public schools in America have been the focus of public attention and scru- tiny for years, and the political and social responsibility of public schools for children’s Internet safety is huge. As a result, society has decided these students should be most strongly protected, and CIPA was implemented immediately and most widely at schools. Second, in contrast to public libraries (which average slightly less than eleven computers per library outlet), the typical number of computers in public schools ranges from one hundred to five hundred, which are needed to meet the needs of students and teachers for daily learning and teaching. Since the number of computers is quite large, the financial incentives of E-Rate funding are substantial and critical to the operation of the schools. This situation provides administrators in schools and school districts with the incentive to make decisions to implement CIPA as quickly and extensively as possible. Furthermore, the amount of money that E-Rate provides for schools in terms of technology is astounding. As was noted earlier, schools received over $12 billion from 2000 to 2003 alone. Schools likely would not be able to provide the necessary computers for students and teachers with- out the E-Rate funds. 
Third, the actual implementation procedure differs in schools and libraries: Schools are certified under the supervision of the local educational agencies such as school districts and state departments of education; libraries are certified within a library organization under the supervision of the IMLS. In other words, the cer- tification process at schools is directly and effectively controlled by school districts and state departments of education, following the same fundamental values of protecting children. The resistance to CIPA in schools has been very small in comparison to libraries. The primary concern raised has been the issue of educational equality. Concerns have been raised that filters in schools may create two classes of students—ones with only filtered access at school and ones who also can get unfiltered access at home.47 Reasons for more limited implementation in libraries In public libraries, the reasons for implementing CIPA are similar to those of public schools in many ways. Public libraries provide an average of 10.7 computers in each of the approximately seven thousand public libraries in the United States, which is a lot of technology that needs to be supported. The E-Rate and LSTA funds are vital to many libraries in the provision of computers and the Internet. Furthermore, with limited alternative sources of funding, the E-Rate and LSTA funds are hard to replace if they are not available. Given that the public libraries have become the guarantor of public access to comput- ing and the Internet, libraries have to find ways to ensure that patrons can access the Internet.48 Libraries also have to be concerned about protect- ing and providing a safe environment for younger patrons. While libraries serve patrons of all ages, one of the key social expectations of libraries is the provision of educational materials for children and young adults. Children’s sections of libraries almost always have com- puters in them. Much of the content blocked by filters is of little or no education value. As such, “defending unfil- tered Internet access was quite different from defending Catcher in the Rye.”49 Nevertheless, many libraries have fought against the filtering requirements of CIPA because they believe that it violates the principles of librarianship or for a number of other reasons. In 2008, 31.6 percent of public libraries refused to apply for E-Rate or LSTA funds specifically to avoid CIPA requirements, a substantial increase from the 15.3 percent of libraries that did not apply for E-Rate because of CIPA in 2006.50 As a result of defending patron’s rights to free access, the libraries that are not applying for E-Rate funds because of the requirements of CIPA are being forced to turn down the chance for fund- ing to help pay for Internet access in order to preserve community access to the Internet. Because many librar- ies feel that they cannot apply for E-Rate funds, local and regional discrepancies are occurring in the levels of Internet access that are available to patrons of public libraries in different parts of the country.51 For adult patrons who wish to access material on computers with filters, CIPA states that the library has the option of disabling the filters for “bona fide research or other lawful purposes” when adult patrons request such disabling. The law does not require libraries to 12 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2009 disable the filters for adult patrons, and the criteria for disabling of filters do not have a set definition in the law. 
The potential problems in the process of having the filters disabled are many and significant, including librarians not allowing the filters to be turned off, librarians not knowing how to turn the filters off, the filtering software being too complicated to turn off without injuring the performance of the workstation in other applications, or the filtering software being unable to be turned off in a reasonable amount of time.52

It has been estimated that approximately 11 million low-income individuals rely on public libraries to access online information because they lack Internet access at home or work.53 The E-Rate and LSTA programs have helped to make public libraries a trusted community source of Internet access, with the public library being the only source of free public Internet access available to all community residents in nearly 75 percent of communities in the United States.54 Therefore, usage of computers and the Internet in public libraries has continued to grow at a very fast pace over the past ten years.55 Thus public libraries are torn between the values of providing safe access for younger patrons and broad access for adult patrons who may have no other means of accessing the Internet.

CIPA, public policy, and further research

While the diverse implementations, effects, and levels of acceptance of CIPA across schools and libraries demonstrate the wide range of potential ramifications of the law, surprisingly little consideration is given to major assumptions in the law, including the appropriateness of the requirements to different age groups and the nature of information on the Internet. CIPA treats all users as if they are at the same level of maturity and need the same level of protection as a small child, as evidenced by the requirement that all computers in a library or school have filters regardless of whether children use a particular computer. In reality, children and adults interact in different social, physical, and cognitive ways with computers because of different developmental processes.56

CIPA fails to recognize that children as individual users are active processors of information and that children of different ages are going to be affected in divergent ways by filtering programs.57 Younger children benefit from more restrictive filters, while older children benefit from less restrictive filters. Moreover, filtering can be complemented by encouragement of frequent positive Internet usage and informal instruction to encourage positive use. Finally, children of all ages need a better understanding of the structure of the Internet to encourage appropriate caution in terms of online safety. The Internet represents a new social and cultural environment in which users simultaneously are affected by the social environment and also construct that environment with other users.58

CIPA also is based on fundamental misconceptions about information on the Internet. The Supreme Court’s decision upholding CIPA represents several of these misconceptions, adopting an attitude that “we know what is best for you” in terms of the information that citizens should be allowed to access.59 It assumes that schools and libraries select printed materials out of a desire to protect and censor rather than recognizing the basic reality that only a small number of print materials can be afforded by any school or library. The Internet frees schools and libraries from many of these costs.
Furthermore, the Court assumes that libraries should censor the Internet as well, ultimately upholding the same level of access to information for adult patrons and librarians in public libraries as students in public schools. These two major unexamined assumptions in the law certainly have played a part in the difficulty of implementing CIPA and in the resistance to the law. And this does not even address the problems of assuming that public libraries and public schools can be treated interchangeably in crafting legislation. These problem- atic assumptions point to a significantly larger issue: In trying to deal with the new situations created by the Internet and related technology, the federal government has significantly increased the attention paid to informa- tion policy.60 Over the past few years, government laws and standards related to information have begun to more clearly relate to social aspects of information technolo- gies such as the filtering requirements of CIPA.61 But the social, economic, and political ramifications for decisions about information policy are often woefully underexam- ined in the development of legislation.62 This paper has documented that many of the reasons for and statistics about CIPA implementation are avail- able by bringing together information from different social institutions. The biggest questions about CIPA are about the societal effects of the policy decisions: n Has CIPA changed the education and information- provision roles of libraries and schools? n Has CIPA changed the social expectations for libraries and schools? n Have adult patron information behaviors changed in libraries? n Have minor patron information behaviors changed in libraries? n Have student information behaviors changed in school? n How has CIPA changed the management of librar- ies and schools? n Will Congress view CIPA as successful enough to merit using libraries and schools as the means of enforcing other legislation? ONE LAw wITH TwO OuTCOMES | JAEGER AND YAN 13 But these social and administrative concerns are not the only major research questions raised by the imple- mentation of CIPA. Future research about CIPA not only needs to focus on the individual, institutional, and social effects of the law. It must explore the lessons that CIPA can provide to the process of creating and implementing information policies with significant societal implications. The most significant research issues related to CIPA may be the ones that help illuminate how to improve the legislative process to better account for the potential consequences of regulating information while the legislation is still being developed. Such cross-disciplinary analyses would be of great value as information becomes the center of an increasing amount of legislation, and the effects of this legislation have continually wider consequences for the flow of information through society. It could also be of great benefit to public schools and libraries, which, if CIPA is any indication, may play a large role in future legislation about public Internet access. References 1. Children’s Internet Protection Act (CIPA), Public Law 106- 554. 2. United States v. American Library Association, 539 U.S. 154 (2003). 3. American Library Association, Libraries Connect Communi- ties: Public Library Funding & Technology Access Study 2007–2008 (Chicago: ALA, 2008); Paul T. Jaeger, John Carlo Bertot, and Charles R. 
McClure, “The Effects of the Children’s Internet Protection Act (CIPA) in Public Libraries and its Implications for Research: A Statistical, Policy, and Legal Analysis,” Journal of the American Society for Information Science and Technology 55, no. 13 (2004): 1131–39; Paul T. Jaeger et al., “Public Libraries and Internet Access Across the United States: A Comparison by State from 2004 to 2006,” Information Technology and Librar- ies 26, no. 2 (2007): 4–14; Paul T. Jaeger et al., “CIPA: Decisions, Implementation, and Impacts,” Public Libraries 44, no. 2 (2005): 105–9; Zheng Yan, “Limited Knowledge and Limited Resources: Children’s and Adolescents’ Understanding of the Internet,” Journal of Applied Developmental Psychology (forthcoming); Zheng Yan, “Differences in Basic Knowledge and Perceived Education of Internet Safety between High School and Undergraduate Students: Do High School Students Really Benefit from the Children’s Internet Protection Act?” Journal of Applied Develop- mental Psychology (forthcoming); Zheng Yan, “What Influences Children’s and Adolescents’ Understanding of the Complexity of the Internet?,” Developmental Psychology 42 (2006): 418–28. 4. Martha M. McCarthy, “Filtering the Internet: The Chil- dren’s Internet Protection Act,” Educational Horizons 82, no, 2 (Winter 2004): 108. 5. Federal Communications Commission, In the Matter of Federal–State Joint Board on Universal Service: Children’s Internet Protection Act, FCC order 03-188 (Washington, D.C.: 2003). 6. CIPA. 7. Roth v. United States, 354 U.S. 476 (1957). 8. Miller v. California, 413 U.S. 15 (1973). 9. Roth v. United States. 10. CIPA. 11. CIPA. 12. Telecommunications Act of 1996, Public Law 104-104 (Feb. 8, 1996). 13. Paul T. Jaeger, Charles R. McClure, and John Carlo Ber- tot, “The E-Rate Program and Libraries and Library Consortia, 2000–2004: Trends and Issues,” Information Technology & Libraries 24, no. 2 (2005): 57–67. 14. Ibid. 15. Ibid. 16. American Library Association, “U.S. Supreme Court Arguments on CIPA Expected in Late Winter or Early Spring,” press release, Nov. 13, 2002, www.ala.org/ala/aboutala/hqops/ pio/pressreleasesbucket/ussupremecourt.cfm (accessed May 19, 2008). 17. Kelly Rodden, “The Children’s Internet Protection Act in Public Schools: The Government Stepping on Parents’ Toes?” Fordham Law Review 71 (2003): 2141–75. 18. John Carlo Bertot, Paul T. Jaeger, and Charles R. McClure, “Public Libraries and the Internet 2007: Issues, Implications, and Expectations,” Library & Information Science Research 30 (2008): 175–184; Charles R. McClure, Paul T. Jaeger, and John Carlo Bertot, “The Looming Infrastructure Plateau?: Space, Funding, Connection Speed, and the Ability of Public Libraries to Meet the Demand for Free Internet Access,” First Monday 12, no. 12 (2007), www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/ article/view/2017/1907 (accessed May 19, 2008). 19. McCarthy, “Filtering the Internet.” 20. Leigh S. Estabrook and Edward Lakner, “Managing Inter- net Access: Results of a National Survey,” American Libraries 31, no. 8 (2000): 60–62. 21. Alberta Davis Comer, “Studying Indiana Public Librar- ies’ Usage of Internet Filters,” Computers in Libraries (June 2005): 10–15; Thomas M. Reddick, “Building and Running a Collabora- tive Internet Filter is Akin to a Kansas Barn Raising,” Computers in Libraries 20, no. 4 (2004): 10–14. 22. Communications Decency Act of 1996, Public Law 104-104 (Feb. 8, 1996). 23. Child Online Protection Act (COPA), Public Law 105-277 (Oct. 21, 1998). 24. United States v. 
American Library Association. 25. R. Trevor Hall and Ed Carter, “Examining the Constitu- tionality of Internet Filtering in Public Schools: A U.S. Perspec- tive,” Education & the Law 18, no. 4 (2006): 227–45; McCarthy “Filtering the Internet.” 26. Library Services and Technology Act, Public Law 104-208 (Sept. 30, 1996). 27. John Wells and Laurie Lewis, Internet Access in U.S. Public Schools and Classrooms: 1994–2005, special report prepared at the request of the National Center for Education Statistics, Nov. 2006. 28. American Library Association, Libraries Connect Commu- nities; John Carlo Bertot, Charles R. McClure, and Paul T. Jaeger, “The Impacts of Free Public Internet Access on Public Library Patrons and Communities,” Library Quarterly 78, no. 3 (2008): 285–301; Jaeger et al., “CIPA.” 29. Wells and Lewis, Internet Access in U.S. Public Schools and Classrooms. 14 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2009 30. Ibid. 31. Jaeger, McClure, and Bertot, “The E-Rate Program and Libraries and Library Consortia.” 32. Jaeger et al., “CIPA.” 33. American Library Association, Libraries Connect Commu- nities. 34. Ibid. 35. Ibid. 36. Jaeger, McClure, and Bertot, “The E-Rate Program and Libraries and Library Consortia.” 37. Ibid. 38. Norman Oder, “$40 Million in E-Rate Funds Suspended: Delays Caused as FCC Requires New Accounting Standards,” Library Journal 129, no. 18 (2004): 16; Debra Lau Whelan, “E-Rate Funding Still Up in the Air: Schools, Libraries Left in the Dark about Discounted Funds for Internet Services,” School Library Journal 50, no. 11 (2004): 16. 39. Ken Foskett and Paul Donsky, “Hard Eye on City Schools’ Hardware,” Atlanta Journal-Constitution, May 25, 2004; Ken Fos- kett and Jeff Nesmith, “Wired for Waste: Abuses Tarnish E-rate Program,” Atlanta Journal-Constitution, May 24, 2004. 40. Jaeger, McClure, and Bertot, “The E-Rate Program and Libraries and Library Consortia.” 41. Department of Commerce, National Telecommunication and Information Administration, Children’s Internet Protection Act: Study of Technology Protection Measures in Section 1703, report to Congress (Washington, D.C.: 2003). 42. McCarthy, “Filtering the Internet.” 43. Paul T. Jaeger and Charles R. McClure, “Potential Legal Challenges to the Application of the Children’s Internet Protec- tion Act (CIPA) in Public Libraries: Strategies and Issues,” First Monday 9, no. 2 (2004), www.firstmonday.org/issues/issue9_2/ jaeger/index.html (accessed May 19, 2008). 44. Electronic Frontier Foundation, Internet Blocking in Public Schools (Washington, D.C.: 2004), http://w2.eff.org/Censor ship/Censorware/net_block_report (accessed May 19, 2008). 45. Adam Horowitz, “The Constitutionality of the Children’s Internet Protection Act,” St. Thomas Law Review 13, no. 1 (2000): 425–44. 46. Tanessa Cabe, “Regulation of Speech on the Internet: Fourth Time’s the Charm?” Media Law and Policy 11 (2002): 50–61; Adam Goldstein, “Like a Sieve: The Child Internet Pro- tection Act and Ineffective Filters in Libraries,” Fordham Intel- lectual Property, Media, and Entertainment Law Journal 12 (2002): 1187–1202; Horowitz, “The Constitutionality of the Children’s Internet Protection Act”; Marilyn J. Maloney and Julia Morgan, “Rock and A Hard Place: The Public Library’s Dilemma in Pro- viding Access to Legal Materials on the Internet While Restrict- ing Access to Illegal Materials,” Hamline Law Review 24, no. 2 (2001): 199–222; Mary Minow, “Filters and the Public Library: A Legal and Policy Analysis,” First Monday 2, no. 
12 (1997), www .firstmonday.org/issues/issue2_12/minnow (accessed May 19, 2008); Richard J. Peltz, “Use ‘the Filter You Were Born with’: The Unconstitutionality of Mandatory Internet Filtering for Adult Patrons of Public Libraries,” Washington Law Review 77, no. 2 (2002): 397–479. 47. McCarthy, “Filtering the Internet.” 48. John Carlo Bertot et al., “Public Access Computing and Internet Access in Public Libraries: The Role of Public Libraries in E-Government and Emergency Situations,” First Monday 11, no. 9 (2006), www.firstmonday.org/issues/issue11_9/bertot (accessed May 19, 2008); John Carlo Bertot et al., “Drafted: I want You to Deliver E-Government,” Library Journal 131, no. 13 (2006): 34–39; Paul T. Jaeger and Kenneth R. Fleischmann, “Public Libraries, Values, Trust, and E-Government,” Informa- tion Technology and Libraries 26, no. 4 (2007): 35–43. 49. Doug Johnson, “Maintaining Intellectual Freedom in a Filtered World,” Learning & Leading with Technology 32, no. 8 (May 2005): 39. 50. Bertot, McClure, and Jaeger, “The Impacts of Free Public Internet Access on Public Library Patrons and Communities.” 51. Jaeger et al., “Public Libraries and Internet Access Across the United States.” 52. Paul T. Jaeger et al., “The Policy Implications of Internet Connectivity in Public Libraries,” Government Information Quar- terly 23, no. 1 (2006): 123–41. 53. Goldstein, “Like a Sieve.” 54. Bertot, McClure, and Jaeger, “The Impacts of Free Public Internet Access on Public Library Patrons and Communities”; Jaeger and Fleischmann, “Public Libraries, Values, Trust, and E-Government.“ 55. Bertot, Jaeger, and McClure, “Public Libraries and the Internet 2007”; Charles R. McClure et al., “Funding and Expen- ditures Related to Internet Access in Public Libraries,” Informa- tion Technology & Libraries (forthcoming). 56. Zheng Yan and Kurt W. Fischer, “How Children and Adults Learn to Use Computers: A Developmental Approach,” New Directions for Child and Adolescent Development 105 (2004): 41–61. 57. Zheng Yan, “Age Differences in Children’s Understand- ing of the Complexity of the Internet,” Journal of Applied Devel- opmental Psychology 26 (2005): 385–96; Yan, “Limited Knowledge and Limited Resources”; Yan, “Differences in Basic Knowledge and Perceived Education of Internet Safety”; Yan, “What Influ- ences Children’s and Adolescents’ Understanding of the Com- plexity of the Internet?” 58. Patricia Greenfield and Zheng Yan, “Children, Adoles- cents, and the Internet: A New Field of Inquiry in Developmen- tal Psychology,” Developmental Psychology 42 (2006): 391–93. 59. John N. Gathegi, “The Public Library as a Public Forum: The (De)Evolution of a Legal Doctrine,” Library Quarterly 75 (2005): 12. 60. Sandra Braman, “Where Has Media Policy Gone? Defin- ing the Field in the 21st Century,” Communication Law and Policy 9, no. 2 (2004): 153–82; Sandra Braman, Change of State: Informa- tion, Policy, & Power (Cambridge, Mass.: MIT Pr., 2007); Charles R. McClure and Paul T. Jaeger, “Government Information Policy Research: Importance, Approaches, and Realities,” Library & Information Science Research 30 (2008): 257–64; Milton Mueller, Christiane Page, and Brendan Kuerbis, “Civil Society and the Shaping of Communication-Information Policy: Four Decades of Advocacy,” Information Society 20, no. 3 (2004): 169–85. 61. Paul T. 
Jaeger, “Information Policy, Information Access, and Democratic Participation: The National and International Implications of the Bush Administration’s Information Politics,” Government Information Quarterly 24 (2007): 840–59. 62. McClure and Jaeger, “Government Information Policy Research.” 3169 ---- A SEMANTIC MODEL OF SELECTIvE DISSEMINATION OF INFORMATION | MORALES-DEL-CASTILLO ET AL. 21 A Semantic Model of Selective Dissemination of Information for Digital Libraries J. M. Morales-del-Castillo, R. Pedraza-Jiménez, A. A. Ruíz, E. Peis, and E. Herrera-Viedma In this paper we present the theoretical and methodo- logical foundations for the development of a multi-agent Selective Dissemination of Information (SDI) service model that applies Semantic Web technologies for spe- cialized digital libraries. These technologies make pos- sible achieving more efficient information management, improving agent–user communication processes, and facilitating accurate access to relevant resources. Other tools used are fuzzy linguistic modelling techniques (which make possible easing the interaction between users and system) and natural language processing (NLP) techniques for semiautomatic thesaurus genera- tion. Also, RSS feeds are used as “current awareness bul- letins” to generate personalized bibliographic alerts. N owadays, one of the main challenges faced by information systems at libraries or on the Web is to efficiently manage the large number of docu- ments they hold. Information systems make it easier to give users access to relevant resources that satisfy their information needs, but a problem emerges when the user has a high degree of specialization and requires very specific resources, as in the case of researchers.1 In “tra- ditional” physical libraries, several procedures have been proposed to try to mitigate this issue, including the selec- tive dissemination of information (SDI) service model that make it possible to offer users potentially interesting documents by accessing users’ personal profiles kept by the library. Nevertheless, the progressive incorporation of new information and communication technologies (ICTs) to information services, the widespread use of the Internet, and the diversification of resources that can be accessed through the Web has led libraries through a process of reinvention and transformation to become “digital” libraries.2 This reengineering process requires a deep revision of work techniques and methods so librarians can adapt to the new work environment and improve the services provided. In this paper we present a recommendation and SDI model, implemented as a service of a specialized digital library (in this case, specialized in library and informa- tion science), that can increase the accuracy of accessing information and the satisfaction of users’ information needs on the Web. This model is built on a multi-agent framework, similar to the one proposed by Herrera-Viedma, Peis, and Morales-del-Castillo,3 that applies Semantic Web technologies within the specific domain of special- ized digital libraries in order to achieve more efficient information management (by semantically enriching dif- ferent elements of the system) and improved agent–agent and user–agent communication processes. Furthermore, the model uses fuzzy linguistic model- ling techniques to facilitate the user–system interaction and to allow a higher grade of automation in certain procedures. 
To increase improved automation, some natural language processing (NLP) techniques are used to create a system thesaurus and other auxiliary tools for the definition of formal representations of information resources. In the next section, “Instrumental basis,” we briefly analyze SDI services and several techniques involved in the Semantic Web project, and we describe the prelimi- nary methodological and instrumental bases that we used for developing the model, such as fuzzy linguistic model- ling techniques and tools for NLP. In “Semantic SDI serv- ice model for digital libraries,” the bulk of this work, the application model that we propose is presented. Finally, to sum up, some conclusive data are highlighted. n Instrumental basis Filtering techniques for SDI services Filtering and recommendation services are based on the application of different process-management techniques that are oriented toward providing the user exactly the information that meets his or her needs or can be of his or her interest. In textual domains, these services are usu- ally developed using multi-agent systems, whose main aims are n to evaluate and filter resources normally repre- sented in XML or HTML format; and n to assist people in the process of searching for and retrieving resources.4 J. M. Morales-del-Castillo (josemdc@ugr.es) is Assistant Professor of Information Science, Library and Information Science Department, University of granada, Spain. R. Pedraza- Jiménez (rafael.pedraza@upf.edu) is Assistant Professor of Information Science, Journalism and Audiovisual Communication Department, Pompeu Fabra University, Barcelona, Spain. A. A. Ruíz (aangel@ugr.es) is Full Professor of Information Science, Library and Information Science Department, University of granada. E. Peis (epeis@ugr.es) is Full Professor of Information Science, Library and Information Science Department, University of granada. E. Herrera-viedma (viedma@decsai.ugr.es) is Senior Lecturer in Computer Science, Computer Science and Artificial Intelligence Department, University of granada. 22 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2009 Traditionally, these systems are classified as either content-based recommendation systems or collaborative recommendation systems.5 Content-based recommen- dation systems filter information and generate recom- mendations by comparing a set of keywords defined by the user with the terms used to represent the content of documents, ignoring any information given by other users. By contrast, collaborative filtering systems use the information provided by several users to recommend documents to a given user, ignoring the representation of a document’s content. It is common to group users into different categories or stereotypes that are characterized by a series of rules and preferences, defined by default, that represent the information needs and common behav- ioural habits of a group of related users. The current trend is to develop hybrids that make the most of content-based and collaborative recommendation systems. 
In the field of libraries, these services usually adopt the form of SDI services that, depending on the profiles of subscribed users, periodically (or when required by the user) generate a series of information alerts describing the resources in the library that fit a user's interests.6 SDI services have been studied in different research areas, such as the multi-agent systems development domain7 and, of course, the digital libraries domain.8

Presently, many SDI services are implemented on Web platforms based on a multi-agent architecture in which a set of intermediate agents compares users' profiles with the documents, and input-output agents deal with subscriptions to the service and display the generated alerts to users.9 Usually, the information is structured according to a certain data model, and users' profiles are defined using a series of keywords that are compared to descriptors or to the full text of the documents. Despite their usefulness, these services have some deficiencies:

- The communication processes between agents, and between agents and users, are hindered by the different ways in which information is represented.
- This heterogeneity in the representation of information makes it impossible to reuse such information in other processes or applications.

A possible solution to these deficiencies consists of enriching the information representation using a common vocabulary and data model that are understandable by humans as well as by software agents. The Semantic Web project takes this idea and provides the means to develop a universal platform for the exchange of information.10

Semantic Web technologies

The Semantic Web project tries to extend the model of the present Web by using a series of standard languages that enable enriching the description of Web resources and making them semantically accessible.11 To do that, the project rests on two fundamental ideas: (1) resources should be tagged semantically so that information can be understood both by humans and by computers, and (2) intelligent agents should be developed that are capable of operating at a semantic level with those resources and of inferring new knowledge from them (shifting from searching for keywords in a text to retrieving concepts).12

The semantic backbone of the project is the Resource Description Framework (RDF) vocabulary, which provides a data model to represent, exchange, link, add, and reuse structured metadata from distributed information sources, thereby making them directly understandable by software agents.13 RDF structures information into individual assertions (triples of resource, property, and property value) and uniquely characterizes resources by means of Uniform Resource Identifiers (URIs), allowing agents to make inferences about them using Web ontologies or other, simpler semantic structures, such as conceptual schemes or thesauri.14

Even though the adoption of the Semantic Web and its application to systems like digital libraries is not free from trouble (because of the nature of the technologies involved in the project and because of the project's ambitious objectives,15 among other reasons), the way these technologies represent information significantly improves the quality of the resources retrieved by search engines, and it also preserves platform independence, thus favouring the exchange and reuse of content.16

As we can see, the Semantic Web works with information written in natural
language that is structured in a way that can be interpreted by machines. For this reason, it is usually difficult to deal with problems that require operating with linguistic information that has a certain degree of uncertainty (e.g., when quantifying a user's satisfaction with a product or service). A possible solution is the use of fuzzy linguistic modelling techniques as a tool for improving system–user communication.

Fuzzy linguistic modelling

Fuzzy linguistic modelling supplies a set of approximate techniques appropriate for dealing with the qualitative aspects of problems.17 The ordinal linguistic approach is defined according to a finite set of labels S, completely ordered and with odd cardinality (seven or nine labels):

S = {si | i ∈ H = {0, …, T}}

The central term has a value of approximately 0.5, and the rest of the terms are arranged symmetrically around it. The semantics of each linguistic term is given by the ordered structure of the set of terms, considering that each linguistic term of the pair (si, sT-i) is equally informative. Each label si is assigned a fuzzy value defined in the interval [0,1] that is described by a linear trapezoidal membership function represented by the 4-tuple (ai, bi, αi, βi). (The first two parameters indicate the interval in which the membership value is 1.0; the third and fourth parameters indicate the left and right widths of the distribution.) Additionally, we need to define the following properties:

1. The set is ordered: si ≥ sj if i ≥ j.
2. There is a negation operator: Neg(si) = sj, with j = T - i.
3. There is a maximization operator: MAX(si, sj) = si if si ≥ sj.
4. There is a minimization operator: MIN(si, sj) = si if si ≤ sj.

It is also necessary to define aggregation operators, such as Linguistic Weighted Averaging (LWA),18 capable of operating with and combining linguistic information (a minimal code sketch of such a label set and its operators is given below).

Besides facilitating the interaction between users and the system, the other initial objective is to develop and implement the proposed model in the most automated way possible. To do this, we use a basic auxiliary tool, a thesaurus, that, among other tasks, assists users in the creation of their profiles and enables automating the generation of alerts. That is why it is critical to define the way in which we create this tool, and in this work we propose a specific method for the semiautomatic development of thesauri using NLP techniques.

NLP techniques and other automating tools

NLP consists of a series of linguistic techniques, statistical approaches, and machine-learning algorithms (mainly clustering techniques) that can be used, for example, to summarize texts automatically, to develop automatic translators, and to create voice-recognition software. Another possible application of NLP is the semiautomatic construction of thesauri using different techniques. One of them consists of determining the lexical relations between the terms of a text (mainly synonymy, hyponymy, and hyperonymy)19 and extracting the terms that are most representative of the text's specific domain.20 It is possible to elicit these relations by using linguistic tools, like Princeton's WordNet (http://wordnet.princeton.edu), and clustering techniques.
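As a purely illustrative rendering of the ordinal fuzzy linguistic approach described above, the sketch below defines a seven-label set with linear trapezoidal membership functions and the Neg, MAX, and MIN operators. The label names and trapezoid parameters are assumptions chosen for the example; the model itself does not prescribe these particular values.

# Illustrative ordinal linguistic label set S = {s_0, ..., s_T} with linear
# trapezoidal membership functions and the Neg, MAX, and MIN operators.
from dataclasses import dataclass

@dataclass(frozen=True)
class Label:
    index: int    # position i in the ordered set
    name: str
    a: float      # start of the plateau where membership is 1.0
    b: float      # end of the plateau
    alpha: float  # left width of the trapezoid
    beta: float   # right width of the trapezoid

    def membership(self, x):
        """Linear trapezoidal membership value of x in [0, 1]."""
        if self.a <= x <= self.b:
            return 1.0
        if self.a - self.alpha < x < self.a:
            return (x - (self.a - self.alpha)) / self.alpha
        if self.b < x < self.b + self.beta:
            return ((self.b + self.beta) - x) / self.beta
        return 0.0

# Seven evenly spaced labels over [0, 1]; with a == b the trapezoids reduce
# to triangles, and the central label sits at approximately 0.5.
NAMES = ["never", "almost_never", "rarely", "occasionally", "often", "almost_always", "always"]
T = len(NAMES) - 1
S = [Label(i, name, i / T, i / T, 1 / T, 1 / T) for i, name in enumerate(NAMES)]

def neg(s):             # Neg(s_i) = s_(T - i)
    return S[T - s.index]

def max_label(si, sj):  # MAX(s_i, s_j) = s_i if s_i >= s_j
    return si if si.index >= sj.index else sj

def min_label(si, sj):  # MIN(s_i, s_j) = s_i if s_i <= s_j
    return si if si.index <= sj.index else sj

if __name__ == "__main__":
    print(neg(S[1]).name)        # almost_always
    print(S[3].membership(0.5))  # 1.0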
WordNet is a powerful multilanguage lexical database where each of its entries is defined, among other elements, by its synonyms (synsets), hyponyms, and hyperonyms.21 As a consequence, once given the most important terms of a domain, WordNet can be used to create a thesaurus from them (after leaving out all terms that have not been identified as belonging or related to the domain of interest).22 This tool can also be used with clustering techniques, for example, to group the documents of a collection into a set of nodes or clusters depending on their similarity. Each of these clusters is described by the most representative terms of its documents. These terms make up the most specific level of a thesaurus and are used to search WordNet for their synonyms and more general terms, contributing (through the repetition of this procedure) to the bottom-up development of the thesaurus.23 (A short NLTK-based sketch of these WordNet lookups appears below.) Although there are many others, these are some of the best-known techniques of semiautomatic thesaurus generation (semiautomatic because, needless to say, the supervision of experts is necessary to determine the validity of the final result).

For specialized digital libraries, we propose developing, on a multi-agent platform and using all these tools, SDI services capable of generating alerts and recommendations for users according to their personal profiles. In particular, the model presented here is the result of merging several previous models, and its service is based on the definition of "current-awareness bulletins," where users can find a basic description of the resources recently acquired by the library or of those that might be of interest to them.24

The semantic SDI service model for digital libraries

The SDI service includes two agents (an interface agent and a task agent) distributed in a four-level hierarchical architecture: user level, interface level, task level, and resource level.

Its main components are a repository of full-text documents (which make up the stock of the digital library) and a series of elements described using different RDF-based vocabularies: one or several RSS feeds that play a role similar to that of current-awareness bulletins in traditional libraries; a repository of recommendation log files that store the recommendations made by users about the resources; and a thesaurus that lists and hierarchically relates the most relevant terms of the library's specialization domain.25 Also, the semantics of each element (that is, its characteristics and the relations the element establishes with other elements in the system) are defined in a Web ontology developed in Web Ontology Language (OWL).26

Next, we describe these main elements as well as the different functional modules that the system uses to carry out its activity.

Elements of the model

There are four basic elements that make up the system: the thesaurus, user profiles, RSS feeds, and recommendation log files.

Thesaurus

An essential element of this SDI service is the thesaurus, an extensible tool used in traditional libraries that enables organizing the most relevant concepts in a specific domain and defining the semantic relations established between them, such as equivalence, hierarchical, and associative relations. The functions defined for the thesaurus in our system include helping in the indexing of RSS feed items and in the generation of information alerts and recommendations.
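The WordNet lookups involved in the relation-extraction step described above can be approximated with NLTK's WordNet interface. The sketch below gathers synonyms, hypernyms (more general terms), and hyponyms (more specific terms) for a candidate domain term; it is a simplified stand-in for the semiautomatic procedure (which also relies on clustering and expert validation), and the example term is arbitrary.

# Sketch: retrieve synonyms, hypernyms, and hyponyms for a candidate
# thesaurus term using NLTK's WordNet interface.
# Requires the WordNet corpus: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def wordnet_relations(term):
    """Collect lemma names of synonyms, hypernyms, and hyponyms for a term."""
    synonyms, hypernyms, hyponyms = set(), set(), set()
    for synset in wn.synsets(term):
        synonyms.update(lemma.name() for lemma in synset.lemmas())
        for hyper in synset.hypernyms():
            hypernyms.update(lemma.name() for lemma in hyper.lemmas())
        for hypo in synset.hyponyms():
            hyponyms.update(lemma.name() for lemma in hypo.lemmas())
    return {"synonyms": synonyms, "hypernyms": hypernyms, "hyponyms": hyponyms}

if __name__ == "__main__":
    relations = wordnet_relations("thesaurus")
    print(sorted(relations["hypernyms"]))

In a full workflow, terms unrelated to the library's specialization domain would then be discarded and the surviving relations reviewed by domain experts.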
To create the thesaurus, we followed the method suggested by Pedraza-Jiménez, Valverde-Albacete, and Navia-Vázquez.27 The learning technique used for the creation of a thesaurus includes four phases: preprocessing the documents, parameterizing the selected terms, conceptualizing their lexical stems, and generating a lattice or graph that shows the relations between the identified concepts.

Essentially, the aim of the preprocessing phase is to prepare the documents' parameterization by removing elements regarded as superfluous. We have developed this phase in three stages: eliminating tags (stripping), standardizing, and stemming. In the first stage, all the tags (HTML, XML, etc.) that can appear in the collection of documents are eliminated. The second stage is the standardization of the words in the documents in order to facilitate and improve the parameterization process. At this stage, the acronyms and N-grams (bigrams and trigrams) that appear in the documents are identified using lists created for that purpose. Once the acronyms and N-grams have been detected, the rest of the text is standardized: dates and numerical quantities are replaced with a marker that identifies them, all terms (except acronyms) are converted to lowercase, and punctuation marks are removed. Finally, a list of function words is used to eliminate from the texts articles, determiners, auxiliary verbs, conjunctions, prepositions, pronouns, interjections, contractions, and degree adverbs.

All the terms are stemmed to facilitate the search for final terms and to improve their quantification during parameterization. To carry out this task, we have used Morphy, the stemming algorithm used by WordNet. This algorithm implements a group of functions that check whether a term is an exception that does not need to be stemmed and then convert words that are not exceptions to their basic lexical form. Terms that appear in the documents but are not identified by Morphy are eliminated from our experiment.

The parameterization phase is of minimal complexity. Once identified, the final terms (roots or bases) are quantified by being assigned a weight. Such a weight is obtained by applying the term frequency-inverse document frequency (tf-idf) scheme, a statistical measure that quantifies the importance of a term or N-gram in a document depending on its frequency of appearance in that document and in the collection the document belongs to (a bare-bones sketch of this weighting appears below).

Finally, once the documents have been parameterized, the associated meanings of each term (lemma) are extracted by searching for them in WordNet (specifically, we use WordNet 2.1 for UNIX-like systems). Thus we get the group of synsets associated with each word. The groups of hyperonyms and hyponyms are also extracted from the vocabulary of the analyzed collection of documents.

The generation of our thesaurus, that is, the identification of the descriptors that best represent the content of the documents and of the underlying relations between them, is achieved using formal concept analysis techniques. This categorization technique uses the theory of lattices and ordered sets to find abstraction relations from the groups it generates. Furthermore, this technique enables clustering the documents depending on the terms (and synonyms) they contain.
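The parameterization step can be illustrated with a bare-bones tf-idf computation, using WordNet's morphy function for the stemming stage as described above. This sketch omits the acronym and N-gram lists, date and number normalization, and function-word filtering of the full procedure, so it is a simplified approximation rather than the authors' implementation.

# Simplified tf-idf weighting over already tokenized documents, using
# WordNet's morphy function for the stemming stage (terms that morphy cannot
# resolve are dropped, as in the procedure described above).
# Requires the WordNet corpus: nltk.download("wordnet")
import math
from collections import Counter
from nltk.corpus import wordnet as wn

def stem(tokens):
    """Reduce tokens to their base lexical form with morphy; drop unknown terms."""
    stems = (wn.morphy(token.lower()) for token in tokens)
    return [s for s in stems if s is not None]

def tf_idf(documents):
    """Return one {term: weight} dict per document (raw tf, idf = log(N / df))."""
    stemmed = [stem(doc) for doc in documents]
    n_docs = len(stemmed)
    doc_freq = Counter(term for doc in stemmed for term in set(doc))
    weights = []
    for doc in stemmed:
        tf = Counter(doc)
        weights.append({term: count * math.log(n_docs / doc_freq[term])
                        for term, count in tf.items()})
    return weights

if __name__ == "__main__":
    documents = [["libraries", "manage", "thesauri"],
                 ["thesauri", "organize", "concepts"]]
    print(tf_idf(documents))  # terms shared by every document receive weight 0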
Also, a lattice graph is generated according to the underlying relations between the terms of the collection, taking into account the hyperonyms and hyponyms extracted. In that graph, each node represents a descriptor (namely, a group of synonym terms) and clusters the set of documents that contain it, linking them to those with which it has any relation (of hyponymy or hyperonymy). Once the thesaurus is obtained by identifying its terms and the underlying relations between them, it is automatically represented using the Simple Knowledge Organization System (SKOS) vocabulary (see figure 1).28

User profiles

User profiles can be defined as structured representations that contain the personal data, interests, and preferences of users, with which agents can operate to customize the SDI service. In the model proposed here, these profiles are basically defined with Friend of a Friend (FOAF), a specific RDF/XML vocabulary for describing people (which favours profile interoperability, since it is a widespread vocabulary supported by an OWL ontology), and another nonstandard vocabulary of our own to define fields not included in FOAF (see figure 2).29

Profiles are generated the moment the user registers with the system, and they are structured in two parts: a public profile that includes data related to the user's identity and affiliation, and a private profile that includes the user's interests and preferences about the topics of the alerts he or she wishes to receive. To define their preferences, users must specify keywords and concepts that best define their information needs. Later, the system compares those concepts with the terms in the thesaurus, using the edit tree algorithm as a similarity measure.30 This function matches character strings and returns either the term introduced (if there is an exact match) or the lexically most similar term (if not). Consequently, if the suggested term satisfies the user's expectations, it is added to the user's profile together with its synonyms (if any). In those cases where the suggested term is not satisfactory, the system must provide a tool or application that enables users to browse the thesaurus and select the terms that better describe their needs. An example of this type of application is ThManager (http://thmanager.sourceforge.net), a project of the Universidad de Zaragoza, Spain, that enables editing, visualizing, and browsing structures defined in SKOS.

Each of the terms selected by the user to define his or her areas of interest has an associated linguistic frequency value (tagged as <freq>) that we call the "satisfaction frequency." It represents the regularity with which a particular preference value has been used in alerts positively evaluated by the user. This frequency measures the relative importance of the preferences stated by the user and allows the interface agent to generate a ranked list of results. The range of possible values for these frequencies is defined by a group of seven labels obtained from the fuzzy linguistic variable "Frequency," whose expression domain is defined by the linguistic term set S = {always, almost_always, often, occasionally, rarely, almost_never, never}, with one of these labels serving as the default value and "occasionally" being the central value.
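To make the profile-building step concrete, the following sketch matches a user-supplied keyword against the thesaurus terms, returning the exact term when present and otherwise the lexically closest candidate. It uses the standard library's difflib as a generic string-similarity stand-in for the edit tree measure cited above, so it illustrates the idea rather than reproducing the exact algorithm; the sample terms are invented.

# Sketch: match a user keyword to the closest thesaurus term, falling back
# to difflib string similarity when there is no exact match.
import difflib

def match_term(keyword, thesaurus_terms, cutoff=0.6):
    """Return the keyword's thesaurus term if present, else the most
    lexically similar term above the cutoff, else None."""
    key = keyword.strip().lower()
    terms = {term.lower(): term for term in thesaurus_terms}
    if key in terms:
        return terms[key]
    candidates = difflib.get_close_matches(key, terms.keys(), n=1, cutoff=cutoff)
    return terms[candidates[0]] if candidates else None

if __name__ == "__main__":
    thesaurus = ["Library management", "Information retrieval", "Metadata"]
    print(match_term("library managment", thesaurus))  # -> "Library management"

If no candidate clears the cutoff, the user would fall back to browsing the thesaurus (e.g., with a SKOS browser such as ThManager) to pick a better term.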
<skos:Concept rdf:about="7">
  <skos:inScheme rdf:resource="http://www.ugr.es/…/thes/"/>
  <skos:prefLabel xml:lang="es">Proceedings</skos:prefLabel>
  <skos:broader rdf:resource="http://www.ugr.es/…/thes/668"/>
  <skos:narrower rdf:resource="http://www.ugr.es/…/thes/286"/>
  <skos:narrower rdf:resource="http://www.ugr.es/…/thes/830"/>
</skos:Concept>

Figure 1. Sample entry of a SKOS Core thesaurus

<foaf:PersonalProfileDocument rdf:about="">
  <foaf:maker rdf:resource="#person"/>
  <foaf:primaryTopic rdf:resource="#person"/>
</foaf:PersonalProfileDocument>
<foaf:Person rdf:ID="user_09234">
  <foaf:name>Diego Allione</foaf:name>
  <foaf:title>Sr.</foaf:title>
  <foaf:mbox_sha1sum>af9fa7601df46e95566</foaf:mbox_sha1sum>
  <foaf:homepage rdf:resource="http://allione.org"/>
  <foaf:depiction rdf:resource="allione.jpg"/>
  <foaf:phone rdf:resource="tel:555-432-432"/>
  <dfss:topic>
    <dfss:pref rdf:nodeID="pref_09234-1">
      <rdfs:label>Library management</rdfs:label>
      <dfss:relev>0.83</dfss:relev>
    </dfss:pref>
  </dfss:topic>
</foaf:Person>

Figure 2. User profile sample

RSS feeds

Thanks to the popularization of blogs, there has been widespread use of several vocabularies specifically designed for the syndication of content (that is, for making the content of a website accessible to other Internet users by means of hyperlink lists called "feeds"). To create our current-awareness bulletin we use RSS 1.0, a vocabulary that enables managing hyperlink lists in an easy and flexible way. It utilizes the RDF/XML syntax and data model and is easily extensible because of its use of modules, which enable extending the vocabulary without modifying its core each time new descriptive elements are added. In this model several modules are used: the Dublin Core (DC) module, to define the basic bibliographic information of the items using the elements established by the Dublin Core Metadata Initiative (http://dublincore.org); the syndication module, to help software agents synchronize and update RSS feeds; and the taxonomy module, to assign topics to feed items.

The structure of the feeds comprises two areas: one where the channel itself is described by a series of basic metadata, such as a title, a brief description of the content, and the updating frequency; and another where the descriptions of the items that make up the feed are defined (see figure 3), including elements such as title, author, summary, hyperlink to the primary resource, date of creation, and subjects.

Recommendation log file

Each document in the repository has an associated recommendation log file in RDF that lists the evaluations assigned to that resource by different users since the resource was added to the system. Each entry of a recommendation log file consists of a recommendation value, a URI that identifies the user who made the recommendation, and the date of the record (see figure 4). The expression domain of the recommendations is defined by the following set of five fuzzy linguistic labels, extracted from the linguistic variable "Quality of the resource": Q = {Very_low, Low, Medium, High, Very_high}.

These elements are the raw materials that enable the SDI service to carry out its activity through four processes or functional modules: the profiles updating process, the RSS feeds generation process, the alert generation process, and the collaborative recommendation process.
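As one plausible illustration of how the alert generation process could draw on these elements, the sketch below selects the RSS feed items whose subjects overlap with the preference terms stored in a user profile. The data structures and the simple subject-overlap rule are assumptions made for this example and are not taken from the model's specification.

# Illustrative alert generation: select feed items whose subjects overlap
# with the preference terms stored in a user profile (assumed structures).

def generate_alert(profile, feed_items):
    """Return the feed items matching at least one of the user's preference terms."""
    preferences = {term.lower() for term in profile["preferences"]}
    alert = []
    for item in feed_items:
        subjects = {subject.lower() for subject in item.get("subjects", [])}
        if preferences & subjects:
            alert.append(item)
    return alert

if __name__ == "__main__":
    profile = {"user": "user_09234", "preferences": ["Library management", "Thesauri"]}
    items = [
        {"title": "Broadcasting and the Internet", "subjects": ["Virtual communities"]},
        {"title": "Managing small libraries", "subjects": ["Library management"]},
    ]
    for item in generate_alert(profile, items):
        print(item["title"])  # prints the second item only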
<item rdf:about="http://www.ugr.es/…/doc-00000528">
  <dc:creator>Escudero Sánchez, Manuel</dc:creator>
  <dc:creator>Fernández Cáceres, José Luis</dc:creator>
  <title>Broadcasting and the Internet</title>
  [further elements: link to http://eprints.rclis.org/…/AudioVideo_good.pdf; description "This paper is about…"; date 2002; source REDOC, 8 (4), 2008; subject "Virtual communities"]
</item>

Figure 3. RSS feed item sample

<recomm-log rdf:ID="log-00528">
  <doc rdf:resource="http://doc.es/doc-0A15"/>
  <items_e>
    <item rdf:nodeID="item-000A901">
      <user rdf:resource="http://user.es/001"/>
      <date>14/03/2007</date>
      <recomm>High</recomm>
    </item>
  </items_e>
</recomm-log>

Figure 4. Recommendation log file sample

System processes

Profiles updating process

Since the SDI service's functions are based on generating passive searches to RSS feeds from the preferences stored in a user's profile, updating the profiles becomes a critical task. User profiles are meant to store long-term preferences, but the system must be able to detect any subtle change in these preferences over time in order to offer accurate recommendations.

In our model, user profiles are updated using a simple mechanism that enables finding users' implicit preferences by applying fuzzy linguistic techniques and taking into account the feedback users provide. Users are asked about their degree of satisfaction (e_j) with the information alert generated by the system (i.e., whether the items retrieved are interesting or not). This satisfaction degree is obtained from the linguistic variable "Satisfaction," whose expression domain is the set of linguistic labels S′ = {Total, Very_high, High, Medium, Low, Very_low, Null}.

This mechanism updates the satisfaction frequency associated with each user preference according to the satisfaction degree e_j. It requires the use of a matching function similar to those used to model threshold weights in weighted search queries.31 The function proposed here rewards the frequencies associated with the preference values present when the resources assessed are satisfactory, and it penalizes them when this assessment is negative. Let e_j ∈ S′ be the degree of satisfaction and f_i^l ∈ S the frequency of property i (in this case, i = "Preference") with value l; we then define the updating function g: S′ × S → S as g(e_j, f_i^l) =