Emily G. Morton-Owens

Editorial and Technological Workflow Tools to Promote Website Quality

Everard and Galletta performed an experimental study with 232 university students to discover whether website flaws affected perception of site quality and trust. Their three types of flaws were incompleteness, language errors (such as spelling mistakes), and poor style in terms of "ambiance and aesthetics," including readable formatting of text. They discovered that subjects' perception of flaws influenced their judgment of a site being high-quality and trustworthy. Further, they found that the first perceived error had a greater negative impact than additional problems did, and they described website users as "quite critical, negative, and unforgiving."5

Briggs et al. did two studies of users' likelihood of accepting advice presented on a website. Of the three factors they considered—credibility, personalization, and predictability—credibility was the most influential in predicting whether users would accept or reject the advice. "It is clear," they report, "that the look and feel of a web site is paramount in first attracting the attention of a user and signaling the trustworthiness of the site. The site should be . . . free of errors and clutter."6

Though none of these studies focuses on libraries or academic websites and though they use various metrics of trustworthiness, together they point to the importance of quality. Text quality and functional usability should be important to library website managers. Libraries ask users to entrust them to choose resources, answer questions, and provide research advice, so projecting competence and trustworthiness is essential.

It is a challenge to balance the concern for quality with the desire to update the website frequently and with librarians' workloads. This paper describes a solution implemented in Drupal that promotes participation while maintaining quality. The editorial system described draws on the author's prior experience working in book publishing at Penguin and Random House, showing how a system that ensures quality in print publishing can be adjusted to fit the needs of websites.

■■ Setting

Editing

Most people think of editing in terms of improving the correctness of a document: fixing spelling or punctuation errors, fact-checking, and so forth. These factors are probably the most salient ones in the sense that they are

Editor's Note: This paper is adapted from a presentation given at the 2010 LITA Forum.

Library websites are an increasingly visible representation of the library as an institution, which makes website quality an important way to communicate competence and trustworthiness to users. A website editorial workflow is one way to enforce a process and ensure quality. In a workflow, users receive roles, like author or editor, and content travels through various stages in which grammar, spelling, tone, and format are checked. One library used a workflow system to involve librarians in the creation of content. This system, implemented in Drupal, an open-source content management system, solved problems of coordination, quality, and comprehensiveness that existed on the library's earlier, static website.
Today, libraries can treat their websites as a significant point of user contact and as a way of compensating for decreases in traditional measures of library use, like gate counts and circulation.1 Websites offer more than just a gateway to journals; librarians also can consider instructional or explanatory webpages as a type of public service interaction.2 As users flock to the web to access electronic resources and services, a library's website becomes an increasingly prominent representation of the library. At the New York University Health Sciences Libraries (NYUHSL), for example, statistics for the 2009–10 academic year showed 580,980 in-person visits for all five locations combined. By comparison, the website received 986,922 visits. In other words, the libraries received 70 percent more website visits than in-person visits.

Many libraries conduct usability testing to determine whether their websites meet the functional needs of their users. A concern related to usability is quality: users form an impression of the library partly based on how it presents itself via the website. As several studies outside the library arena have shown, users' experience of a website leads them to attribute characteristics of competence and trustworthiness to the sponsoring organization.

Tseng and Fogg, discussing non-web computer systems, present "surface credibility" as one of the types of credibility affecting users. They suggest that "small computer errors have disproportionately large effects on perceptions of credibility."3 In another paper by Fogg et al., "amateurism" is one of seven factors in a study of website credibility. The authors recommend that "organizations that care about credibility should be ever vigilant—and perhaps obsessive—to avoid small glitches in their websites. . . . Even one typographical error or a single broken link is damaging."4

Emily G. Morton-Owens (emily.morton-owens@med.nyu.edu) is Web Services Librarian, New York University Health Sciences Libraries, New York.

happens when a page moves from one state to another. The very simple workflow in figure 1 shows two roles (author and editor) and three states (draft, approval, and published). There are two transitions with permissions attached to them. Only the author can decide when he or she is done working and make the transition from draft to approval. Only the editor can decide when the page is ready and make the transition from approval to published. (In these figures, dotted borders indicate states in which the content is not visible to the public.)

A book publishing workflow involves perhaps a dozen steps in which the manuscript passes between the author, his or her agent, and various editorial staff. A year can pass between receiving the manuscript and publishing the book. The reason for that careful, conservative process is that it is very difficult to fix a book once thousands of copies have been printed in hardcover. By contrast, consider a newspaper: a new version appears every day and contains corrections from previous editions. A newspaper workflow is hardly going to take a full year. A website is even more flexible than a newspaper because it can be fixed or improved at any time. The kind of multistep process used for books and newspapers is effective, but not practical for websites.
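To make the roles, states, and transitions of figure 1 concrete, the following PHP sketch models the same two-role, three-state workflow. It is illustrative only—it is not the Drupal Workflow module's API and not NYUHSL's actual configuration—and every class, method, state, and role name in it is invented.

    <?php
    // Minimal, illustrative sketch of the figure 1 workflow: two roles, three
    // states, and two permission-gated transitions. Not Drupal code.

    class SimpleWorkflow
    {
        // States a page can pass through. Draft and approval are not visible
        // to the public (the dotted borders in the figures).
        const DRAFT     = 'draft';
        const APPROVAL  = 'approval';
        const PUBLISHED = 'published';

        // Each transition is keyed "from>to" and lists the roles allowed to make it.
        private $transitions = array(
            'draft>approval'     => array('author'), // only the author decides the draft is done
            'approval>published' => array('editor'), // only the editor decides it is ready
        );

        // States a user with the given role may move a page into from its current state.
        public function allowedNextStates($currentState, $role)
        {
            $allowed = array();
            foreach ($this->transitions as $key => $roles) {
                list($from, $to) = explode('>', $key);
                if ($from === $currentState && in_array($role, $roles, true)) {
                    $allowed[] = $to;
                }
            }
            return $allowed;
        }

        // Attempt a transition; returns the new state, or false if the role lacks permission.
        public function transition($currentState, $newState, $role)
        {
            if (in_array($newState, $this->allowedNextStates($currentState, $role), true)) {
                return $newState;
            }
            return false;
        }
    }

    $workflow = new SimpleWorkflow();
    $state   = $workflow->transition(SimpleWorkflow::DRAFT, SimpleWorkflow::APPROVAL, 'author');     // "approval"
    $blocked = $workflow->transition(SimpleWorkflow::APPROVAL, SimpleWorkflow::PUBLISHED, 'author');  // false: authors cannot publish
    $state   = $workflow->transition(SimpleWorkflow::APPROVAL, SimpleWorkflow::PUBLISHED, 'editor');  // "published"

Keeping the permitted transitions in a single table is also what makes it straightforward to hide, from each role, the options it is not allowed to perform, a point the NYUHSL implementation described below relies on.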
A website should have a workflow for editorial quality con- trol, but it should be proportional to the format in terms of the number of steps, the length of the process, and the number of people involved. Alternate Workflow Models This paper focuses on a contributor/editor model in which multiple authors create material that is vetted by a central authority: the editor. Other models could be implemented with much the same tools. For example, in a peer-review system as is used for academic journals, there is a reviewer role, and an article could have states like “published,” “under review,” “con- ditionally accepted,” and so forth. most noticeable when neglected. Editors, however, have several other important roles. For example, they select what will be published. In book publishing, that involves rejecting the vast majority of material that is submitted. In many professional contexts, however, it means soliciting contributions and encouraging authors. Either way, the editor has a role in deciding what topics are relevant and what authors should be involved. Additionally, editors are often involved in presenting their products to audi- ences. In book publishing, that can mean weighing in on jacket designs or soliciting blurbs from popular authors. On websites, it might mean choosing templates or fonts. Editors want to make materials attractive and accessible to the right audience. Together, correctness, choice, and pre- sentation are the main concerns of an editor and together contribute to quality. Each of these ideas can be considered in light of library websites. Correctness means offering information that is current and free of errors, contradictions, and confusing omissions. It also means representing the organization well by having text that is well written and appropriate for the audience. Writing for the web is a special skill; people reading from screens have a tendency to skim, so text should be edited to be concise and preferably orga- nized into short chunks with “visible structure.”7 There is also good guidance available about using meaningful link words, action phrases, and “layering” to limit the amount of information presented at once.8 Of course, correctness also means avoiding the kind of obvious spelling and grammar mistakes that users find so detrimental. Choice probably will not involve rejecting submissions to the website. Instead, in a library context it could mean identifying information that should appear on the web- site and writing or soliciting content to answer that need. Presentation may or may not have a marketing aspect. A public library’s website may advertise events and emphasize community participation. As an academic medical library, NYUHSL has in some sense a captive audience, but it is still important to communicate to users that librarians understand their unique and high- level information needs and are qualified to partner with them. Workflow A workflow is a way to assign responsibility for achieving the goals of correctness, choice, and presentation. It breaks the process down into steps that ensure the appropriate people review the material. It also leaves a paper trail that allows participants to see the history and status of material. Workflow can alleviate the coordination problems that pre- vent a website from exhibiting the quality it should. A workflow is composed of states, roles, and transitions. Pages have states (like “draft” or “published”) and users have roles (like “contributor” or “editor”). A transition Figure 1. 
Very Basic Workflow eDitoriAl AND tecHNoloGicAl WorkFloW tools to proMote WeBsite QuAlitY | MortoN-oWeNs 93 effect was on the quality of the website, which contained mistakes and confusing information. ■■ Methods NYuHsl Workflow and solutions To resolve its web management issues, NYUHSL chose to work with the Drupal content management system (CMS). The ability to set up workflow and inventory content by date, subject, or author was a leading reason for that decision. Other reasons included usability of the backend for librarians, theming options, the scripting lan- guage the CMS uses (PHP), and Drupal’s popularity with other libraries and other NYU departments.9 NYUHSL’s Drupal environment has four main user roles: 1. Anonymous: These are visitors to the NYUHSL site who are not logged in (i.e., library users). They have no permissions to edit or manage content. They have no editorial responsibilities. 2. Library staff: This group includes all the staff content authors. Their role is to notice what content library users need and to contribute it. Staff have been encouraged to view website contributions as some- thing casual—more akin to writing an e-mail than writing a journal article. 3. Marketing team: This five-member group checks content that will appear on the homepage. Their mandate is to make sure that the content is accurate about library services and resources and represents the library well. Its members include both librarians and staff with relevant experience. 4. Administrators: There are three site admins; they have the most permissions because they also build the site and make changes to how it works. Two of the three admins have copyediting experience from prior jobs, so they are responsible for content approvals. They copyedit for spelling, grammar, and readability. Admins also check for malformed HTML created by the WYSIWYG (what you see is what you get) interface provided for authors, and they use their knowledge of other material on the site to look out for potential conflicts or add relevant links. Returning to the themes of correctness, choice, and presentation, it could be said that librarian authors are responsible for choice (deciding what to post), the mar- keting team is responsible for choice and presentation, and the administrators are responsible for all three. An important thing to understand is that each per- son in a role has the same permissions, and any one of In an upvoting system like Reddit (http://reddit .com), content is published by default, any user has the ability to upvote (i.e., approve) a piece of content, and the criterion for being featured on the front page is the number of approvals. In a moderation system, any user can submit content and the default behavior is for the moderator to approve anything that is not outright offensive. The moderator never edits, just chooses the state “approved” or the state “denied.” Moderation is often used to manage comments. Another model, not considered here, is to create separate “staging” and “production” websites. Content and features are piloted on the staging site before being pushed to the live site. (NYUHSL’s workflow occurs all on the live site.) Still, even in a staging/production sys- tem the workflow is implicit in choosing someone who has the permission and responsibility to push the staging site to the production site. problems at NYuHsl In 2007, the web services librarian position at NYUHSL had been open for nearly a year. 
Librarians who needed to post material to the website approached the head of library systems or the “sysadmin.” Both of them could post pages, but they did not proofread. Pages that became live on the website stayed: they were never systematically checked. If a librarian or user noticed a problem with a page, it was not clear who had the correct information or was responsible for fixing it. Often, pages that were found to be out-of-date would be delinked from other pages but were left on the server and thus findable via search engines or bookmarks. Because only a few people had FTP access to the server, but authored little content, the usernames shown on the server were useless for determining who was responsible for a page. Similarly, timestamps on the server were misleading; someone might fix one link on a page without reviewing the rest of it, so the page could have a recent timestamp but be full of outdated information. Even after a new web services librarian started in 2007, problems remained. The new librarian took over sole responsibility for posting content, which made the responsibility clearer but created a bottleneck, for exam- ple, if she went on vacation. Furthermore, in a library with five locations and about sixty full-time employees, it was hard for one person to do justice to all the libraries’ activities. If a page required editing, there was no way to keep track of whose turn it was to work on the document. There also was no automatic notification when a page was published. This made it possible for content to go astray and be forgotten. These problems added up to frustration for would-be content authors, a time drain for systems staff, and less time to create new content and sites. The most significant 94 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 at the top of the homepage. Their appearance should not be delayed, so any staff author can publish one. Class sessions are specific dates, times, and loca- tions that a class is being offered. These posts are assembled from prewritten text, so there is no way to introduce errors and no reason to route them through an approval step. Figure 2 illustrates the main steps of the three cases. The names of the states are shown with arrows indicating which role can make each transition. Unlabeled arrows mean that any staff member can perform that step. Figure 3 shows how, at each approval step, content can be sent back to the author (with comments) for revi- sion. Although this happens rarely, it is important to have a way to communicate with the author in a way that is traceable by the workflow. Figure 4 illustrates the concept of retirement. NYUHSL needed a way to hide content from library users and search engines, but it is dangerous to allow library staff to delete content. Also, old content is sometimes useful to refer to or can even be republished if the need arises. Any library staff user can retire content if they recognize it as no longer relevant or appropriate. Additionally, library staff can resurrect retired content by resetting it to the draft state. That is, they cannot directly publish retired content (because they do not have permission to publish), but they can put it back on the path to being published by saving it as a draft, editing, and resubmitting for approval. Figure 5 shows that library staff do not really need to understand the details of workflow. For any new content, they only have two options: keep the content in the draft state or move it on to whatever next step is available. 
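The retirement and resurrection paths described above fit the same kind of permission table. The sketch below is again illustrative only—the role and state names are invented and this is not the actual NYUHSL Drupal configuration: any staff role may retire published content, but retired content can only come back by way of the draft state.

    <?php
    // Illustrative transition table with a "retired" state: retirement hides
    // content without deleting it, and retired content re-enters the normal
    // path only through draft, editing, and re-approval.

    $transitions = array(
        // Any library staff member may retire content that is no longer relevant.
        'published>retired'  => array('library_staff', 'marketing_team', 'administrator'),
        // Retired content cannot be published directly; it must be saved as a
        // draft, edited, and resubmitted for approval.
        'retired>draft'      => array('library_staff', 'marketing_team', 'administrator'),
        'draft>approval'     => array('library_staff', 'marketing_team', 'administrator'),
        'approval>published' => array('administrator'),
    );

    function can_transition(array $transitions, $from, $to, $role)
    {
        $key = $from . '>' . $to;
        return isset($transitions[$key]) && in_array($role, $transitions[$key], true);
    }

    var_dump(can_transition($transitions, 'published', 'retired', 'library_staff')); // true: any staff member can retire
    var_dump(can_transition($transitions, 'retired', 'published', 'library_staff')); // false: no direct republishing
    var_dump(can_transition($transitions, 'retired', 'draft', 'library_staff'));     // true: back onto the normal path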
them can perform an action. The five marketing team members do not vote on the content, nor do they all have to approve it; instead, any one of them, who happens to be at his workstation when they get a notification, is sufficient to perform the marketing team duty. Also, the marketing team members and administrators do not "self-approve"—no matter how good an editor someone may be, he or she is rarely good at editing his or her own work.

NYUHSL's workflow considers three cases:

1. Most types of content are reviewed by one of the administrators before going live.
2. Content types that appear on the homepage (i.e., at higher visibility) are reviewed by a member of the marketing team before being reviewed by an administrator.
3. Two types of content do not go through any workflow. Alerts are urgent messages that appear in red

Figure 2. Approval Steps

Figure 3. Returning Contents for Edits

Figure 4. Retirement

This may sound like a large volume of e-mail, but it does not appear to bother library staff. The subject line of every e-mail generated by the system is prefaced with "[HSL site]" for easy filtering. Also, every e-mail is signed with "Love, The NYUHSL website." This started as a joke during testing but was retained because staff liked it so much. One described it as giving the site a "warm, fuzzy feeling."

Drupal Modules

NYUHSL developers used a number of different Drupal modules to achieve the desired workflow functionality. A simple system could be achieved using fewer modules; the book Using Drupal offers a good walkthrough of Workflow, Actions, and Trigger.10 Of course, it also would be possible to implement these ideas in another CMS or in a homegrown system. This list does not describe how to configure each module because the features are constantly evolving; more information is available on the Drupal website.11 The Drupal modules used include:

■■ Workflow
■■ Actions
■■ Trigger
■■ Token
■■ Module Grants
■■ Wysiwyg, IMCE, IMCE Wysiwyg API Bridge
■■ Node Expire
■■ Taxonomy Role
■■ LDAP integration
■■ Rules

■■ Results

Participation

Figure 6 shows the number of page revisions per person from July 14, 2009, to November 4, 2010. Since many pages are static and were created only once, but need to be updated regularly, a page creation and a page update count equally in this accounting, which was drawn from the node_revisions table in Drupal. It gives a general sense of content-related activity.

A reasonable number of staff have logged in, including all of the librarians and a number of staff in key positions (such as branch managers). The black bars represent the administrators of the website. It is clear that the workflow system, while broadening participation, has hardly diffused primary responsibility of managing the website. The web services librarian and web manager have by far the most page edits, as they both write new content and edit content written by all other users.

All of the other options are hidden because staff do not have permission to perform them. The status of content in the workflow can be checked by clicking on the workflow tab of each page, but it also is tracked by notification e-mails. When the content enters a state requiring an approval, each person in that approving role gets an e-mail letting them know something needs their attention. The e-mail includes a link directly to the editing page.
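The notification step can be pictured with a short, framework-agnostic sketch. On the production site this behavior is wired together with the Drupal modules listed above, and a Drupal implementation would normally send mail through drupal_mail(); the stand-alone function below, with its invented name, URL, and addresses, only illustrates the behavior described in the text: every member of the approving role is e-mailed a link to the editing page, with the "[HSL site]" subject prefix and the site's signature.

    <?php
    // Illustrative sketch of the approval notification. Not the actual
    // NYUHSL/Drupal code; names and addresses are invented.

    function notify_approvers($title, $editUrl, $newState, array $approverEmails)
    {
        // Every member of the approving role is notified; any one of them may act.
        $subject = '[HSL site] Content awaiting approval: ' . $title;

        $body  = "The page \"$title\" has entered the \"$newState\" state and needs review.\n";
        $body .= "Edit it here: $editUrl\n\n";
        // The sign-off that started as a joke during testing and was kept.
        $body .= "Love,\nThe NYUHSL website\n";

        foreach ($approverEmails as $email) {
            // In Drupal this would normally go through drupal_mail();
            // PHP's mail() keeps the sketch self-contained.
            mail($email, $subject, $body);
        }
    }

    // Example: a blog post has just moved to "ready for marketing approval".
    notify_approvers(
        'New interlibrary loan policy',
        'http://example.org/node/123/edit',
        'ready for marketing approval',
        array('marketing1@example.org', 'marketing2@example.org')
    );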
For example, if a librarian writes a blog post and changes its state from "draft" to "ready for marketing approval," he or she gets a confirmation e-mail that the post is in the marketing approval queue. The marketing team members each get an e-mail asking them to approve the post; only one needs to do so. Once someone has performed that approval, the marketing team members receive an e-mail letting them know that no further action is required. Now the content is in the "ready for approval" state and the author gets another e-mail notification. The administrators get a notification with a link to edit the post. Once an administrator gives the post final approval, the author gets an e-mail indicating that the post is now live.

The NYUHSL website workflow system also includes reminders. Each piece of content in the system has an author (authorship can be reassigned, so it is not necessarily the person who originally created the page). The author receives an e-mail every four months reminding him or her to check the content, revise it if necessary, and re-save it so that it gets a new timestamp. If the author does not do so, he or she will continue to get reminders until the task is complete. Also, the site administrators can refer to a list of content that is out of date and can follow up in person if needed. Note that reminders only apply to static content types like pages and FAQs, not to blog posts or event announcements, which are not expected to have permanent relevance.
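The reminder rule can be expressed in a few lines. The production version is a modified Node Expire module running inside Drupal; the stand-alone sketch below—with invented data structures and addresses, and 120 days standing in for "every four months"—only illustrates the logic: static content types whose last save is older than the interval trigger an e-mail to their author.

    <?php
    // Illustrative sketch of the four-month reminder rule. Not the modified
    // Node Expire module itself; data structures and addresses are invented.

    define('REMINDER_INTERVAL', 60 * 60 * 24 * 120); // roughly four months, in seconds

    // Reminders apply only to static content types, not to time-bound posts.
    $reminder_types = array('page', 'faq');

    function needs_reminder(array $node, array $reminder_types, $now)
    {
        return in_array($node['type'], $reminder_types, true)
            && ($now - $node['changed']) > REMINDER_INTERVAL;
    }

    $nodes = array(
        array('type' => 'page', 'title' => 'Borrowing policies', 'changed' => strtotime('-6 months'), 'author_mail' => 'author@example.org'),
        array('type' => 'blog', 'title' => 'Event recap',        'changed' => strtotime('-6 months'), 'author_mail' => 'author@example.org'),
        array('type' => 'faq',  'title' => 'Printing FAQ',       'changed' => strtotime('-1 month'),  'author_mail' => 'author@example.org'),
    );

    $now = time();
    foreach ($nodes as $node) {
        if (needs_reminder($node, $reminder_types, $now)) {
            // Only "Borrowing policies" qualifies here: the blog post is excluded
            // by type and the FAQ was saved too recently. Authors keep getting
            // reminders until they re-save the page, which refreshes its timestamp.
            mail(
                $node['author_mail'],
                '[HSL site] Please review: ' . $node['title'],
                "This page has not been saved in over four months. Please check it, revise it if necessary, and re-save it so it gets a new timestamp.\n\nLove,\nThe NYUHSL website\n"
            );
        }
    }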
Figure 5. Workflow Choices for Library Staff Users

Figure 7 shows the distribution of content updates once the web team members have been removed. It is clear that a small number of heroic contributors are responsible for the bulk of new content and updates, with other users logging on sporadically to address specific needs or problems.

Figure 6. Number of Revisions by User. Each user is indicated by their employee type rather than by name.

Figure 7. Number of Revisions by User, Minus Web Team. Each user is indicated by their employee type rather than by name.

places web content in the context of other communication methods, like e-mail marketing, press releases, and social media.12 In her view, it is not enough to consider a website on its own; it has to be part of a complete strategy for communicating with an organization's audience. Libraries embarking on a website redesign would benefit from contemplating this larger array of strategic issues in addition to the nitty-gritty of creating a process to ensure quality.

■■ Conclusions

NYUHSL differs from other libraries in its size, status as an academic medical library, level of IT staffing, and other ways. Some aspects of NYUHSL's experience implementing editorial workflow will, however, likely be applicable to other libraries. It does not necessarily make sense to assign editorial responsibility to IT staff; instead, there may be someone on staff who has editorial or journalistic experience and could serve as the content approver. Many universities offer short copyediting courses, and a prospective website editor could attend such a course.

Implementing a workflow system, especially in Drupal, requires a lot of detailed configuration. Developers should make sure the workflow concept is clearly mapped out in terms of states, roles, and transitions before attempting to build anything. Workflow can seem complicated to users too, so developers should endeavor to hide as much as possible from nonadministrators. Small mistakes in Drupal settings and permissions can cause confusing failures in the workflow system. For example, a user may find him or herself unable to advance a blog post from "draft" to "ready for approval," or a state change from "ready for approval" to "live" may not actually cause the content to be published. It would save time in the long run to thoroughly test all the possibilities with volunteers who play each role before the site is in active use.

Finally, when the workflow is in place, the website's managers may find themselves doing less writing and fewer content updates. They have a new role, though: to curate the site and support staff who use the new tools.

The concept of editing is not yet consistently applied to websites unless the site represents an organization that already relies on editors (like a newspaper)—but it is gaining recognition as a best practice. If the website is the most readily available public face of an institution, it should receive editorial attention just as a brochure or fundraising letter would. Workflow is one way that libraries can promote a higher level of quality and perceived competence and reliability through their website presence.

How Editorial Workflow Addresses NYUHSL's Problems

Different aspects of the NYUHSL editorial workflow address different website problems that existed before the move to a CMS. Together, the workflow features create a clearly defined track that marches contributed content along a path to publication while always making the history and status of that content clear.

■■ Keeping track of who wrote what when: This information is collected by the core Drupal software and visible on administrative pages. (Drupal also can be customized to display or sort this information in more convenient ways.)
■■ Preventing mistakes and inconsistencies: This requires a human editor, but Drupal can be used to formalize that role, assign it to specific people, and ensure nothing gets published without being reviewed by an editor.
■■ Bottlenecks: NYUHSL eliminated bottlenecks that stranded content waiting for one person to post it by creating roles with multiple members, any one of whom can advance content to the next state. There is no step in the system that can be performed by only one person.
■■ Knowledge: The issue of having too much going on in the library for one person to report on was addressed by making it easier for more people to contribute. Drupal encourages this through its usability (especially a WYSIWYG editor), and workflow makes it safe by controlling how the contributions are posted.
■■ "Lost" content: When staff contribute content, they get e-mail notifications about its status and also can check the status by clicking on the workflow tab. This eliminates the discouraging mystery of having content get lost on the way to being published.
■■ Identifying "problem" content: The Node Expire module has been modified to send e-mail reminders about stale content; as a result, this "problem" content is usually addressed by library staff without the administrators/editors doing anything at all. The administrators also can access a page that lists all the content that has been marked as "expired" so they know with whom to follow up.
■■ Outdated content: Some content may be outdated and undesirable to show the public or be indexed by search engines, but be useful to librarians. It also is not safe to allow staff to delete content, as they may do so by accident. These issues are addressed by the notion of “retiring” content, which hides content by unpublish- ing it but does not delete it from the system. ■■ Future Work The workflow system sets up an environment that achieves NYUHSL’s goals, structurally speaking, but social (nontechnology) considerations prevent it from living up to its full potential. Not all of the librarians contribute regularly. This is partly because they are busy, and writing web content is not one of their job requirements. Another reason is that some staff are more comfortable using the system than others, a phenom- enon that reinforces itself as the expert users spend more time creating content and become even more expert. A third cause is that not all librarians may perceive that they have something useful to say. Reluctant con- tributors have no external motivation to increase their involvement. It would be helpful to formalize the role of librarians as content contributors. There is presently no librar- ian at NYUHSL whose job description includes writing content for the website; even the web services librar- ian is charged only with “coordinating, designing, and maintaining” sites. Ideally, every librarian job description would include working with users and would mention writing website content as an important forum for that. That said, it is not clear what metric could be used to judge the contributions fairly. It also is important to continue to emphasize the value of content contributions so that librarians are motivated and feel recognized. Even librarians whose specialties are not outreach-oriented (e.g., systems librarians) have expert knowledge that could be shared in, say, a short article on how to set up RSS feeds. Workflow is part of a group of concerns being called “content strategy.” This concept, which has grown in popularity since 2008, includes editorial quality along- side issues like branding/messaging, search engine optimization, and information architecture. A content strategist would be concerned with why content is meaningful in addition to how it is managed. In her brief, useful book on the topic, Kristina Halvorson 98 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 5. Andrea Everard and Dennis F. Galletta, “How Presentation Flaws Affect Perceived Site Quality, Trust, and Intention to Purchase from an Online Store,” Journal of Management Information Systems 22 (2005–6): 79. 6. Pamela Briggs et al., “Trust in Online Advice,” Social Science Computer Review 20 (2002): 330. 7. Patrick J. Lynch and Sarah Horton “Online Style,” Web Style Guide, 3rd ed., http://webstyleguide.com/wsg3/9-edito- rial-style/3-online-style.html (accessed Dec. 1, 2010). 8. Janice (Ginny) Redish, Letting Go of the Words: Writing Web Content that Works (San Francisco: Morgan Kaufman, 2007). 9. Emily G. Morton-Owens, Karen L. Hanson, and Ian Walls, “Implementing Open-Source Software for Three Core Library Functions: A Stage-by-Stage Comparison,” Journal of Electronic Resources in Medical Libraries 8 (2011): 1–14. 10. Angela Byron et al., Using Drupal (Sebastopol, Calif.: O’Reilly, 2008). 11. All Drupal modules can be found via http://drupal.org/ project/modules. 12. Kristina Halvorson, Content Strategy for the Web (Berkeley, Calif.: New Riders, 2010). 
■■ Acknowledgments Thank you to Jamie Graham, Karen Hanson, Dorothy Moore, and Vikram Yelanadu. References 1. Charles Martell, “The Absent User: Physical Use of Academic Library Collections and Services Continues to Decline 1995–2006,” Journal of Academic Librarianship 34 (2008): 400–407. 2. Jeanie M. Welch, “Who Says We’re Not Busy? Library Web Page Usage as a Measure of Public Service Activity,” Reference Services Review 33 (2005): 371–79. 3. B. J. Fogg and Hsiang Tseng, “The Elements of Computer Credibility” (paper presented at CHI ’99, Pittsburgh, Pennsylvania, May 15–20, 1999): 82. 4. B. J. Fogg et al., “What Makes Web Sites Credible? A Report on a Large Quantitative Study” (paper presented at SIGCHI ’01, Seattle, Washington, Mar. 31–Apr. 4, 2001): 67–68. 1765 ---- 86 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 on technology and other decisions in my career. I know that I can post a question to LITA-L or ALA Connect and get a quick, diverse response to an inquiry. I know that I can call on my LITA colleagues to serve as references and reviewers as I move through my career. I also know that I can depend upon LITA to help keep me current and well informed about technology and how it is integrated into our libraries and lives. This also gives me an edge in my career. So much of the LITA experience is currently gained from attending meetings in person and making connec- tions—those of you who have attended the LITA Happy Hour can probably attest to this. For several years LITA has not had a requirement to attend meetings in person and allows for virtual participation in committees and interest groups. Several ad hoc methods have developed to allow members to attend meetings virtually. To better institution- alize the process two new taskforces have been formed to look at virtual participation in formal and informal LITA meetings. A broadcasting taskforce is charged with mak- ing a recommendation on the best ways to deliver business meetings and another taskforce is charged with investi- gating methods to deliver education and programming to members virtually. Both taskforces will pay careful attention to interaction and other attributes of in person gatherings that can be applied to virtual meetings so that we retain the connection-making experience. It is hard to assign monetary value to membership in an association, but we do so every time we make a deci- sion to join or renew membership. When I renew and pay annual dues to LITA I affirm that I am receiving value, and I do so without thinking. It is a given that I will renew. In addition to my library memberships I am a member of the Wildlife Conservation Society (the group behind the Bronx Zoo and several other zoos in NYC). Each year as I renew my membership I do a quick cost analysis calculat- ing how many times I visited the zoos and what it would have cost my family if we were not members. But before I can finish that exercise my mind begins to wander and I start to think about the excursions to the zoos-- camel rides, newborn animals—and those experiences and the memories created derail any cost recovery exercise. It is impossible to put monetary value on the wonderful experiences my family share during our visits to the zoo (incidentally it is more economical as well). I also feel some pride in contributing to an organization that does such wonderful programming and makes a real differ- ence for animals and our planet. I understand that my membership helps them do what they do best. 
I don’t do this cost analysis with LITA, but perhaps I should. The current price of LITA membership is sixty dollars per year, which is about sixteen cents per day. As members we need to ask ourselves if we are receiving in return what A s I write my first President’s Message for ITAL, I am wrapping up my year as vice president and the ALA Annual Conference is fast approaching. The past year has been a busy one—handling necessary division business, including meeting with my fellow ALA vice presidents, making committee appointments, planning an orientation for new board members, strategic planning, and attending various conferences and meet- ings to prepare me for my role as LITA president. I am lucky to follow such wonderful leaders as Karen Starr and Michelle Frisque, who have both helped ready me for the year ahead. My life outside of LITA has been equally busy. I started a new position as the director of Weill Cornell Medical College Library earlier this year and have a busy home life with two small children. As usual, I have been juggling quite a bit and often dropping a few balls. My mantra is that it is impossible to keep all the balls in the air all of the time, but when they do drop be careful not to let them roll so far away from you so that you lose sight of them. Eventually I pick them up and start juggling again. I know that I am not alone in this juggling exercise. LITA members have real jobs and friends, family, and other social responsibilities that keep us busy. So why do we give so much to our profession, including LITA? If you are like me, it is because we get so much in return. The importance of activity and leadership in national, profes- sional associations cannot be overrated. My experience in LITA and other professional library associations has given me an opportunity to hone leadership skills working with various committees and boards over the years. The achievements that I have made in my career have a direct correlation to my work with LITA. As libraries flatten organizational structures, LITA is one place where anyone can take on leadership roles, gaining valuable experience. Many members have agreed to take on leadership roles in the coming year by volunteering for committees and task- forces and accepting various appointments and I want to thank everyone who came forward. In the coming year I will be working with several officers and committees to develop orientations, mentoring initiatives, and leader- ship training for our members. I do appreciate that not everyone wants to take on a leadership role in LITA. The networking opportunities, both formal and informal, also have been extremely valu- able in my career. The people I have met in LITA have become colleagues I am comfortable turning to for advice colleen cuddy (colleen.cuddy@med.cornell.edu) is liTa Presi- dent 2011–12 and director of the samuel J. Wood library and C. V. starr Biomedical information Center at Weill Cornell Medical College, new york, new york Colleen Cuddy President’s Message: Reflections on Membership Continued on page 89 eDitoriAl | truitt 89 editorial.cfm (accessed July 13, 2011). 3. Begin with Fforde’s The Eyre Affair (2001) and proceed from there. If you are a librarian and are not quickly enchanted, you probably should consider a career change very soon! Thank you, Michele N! .youtube.com/watch?v=Sps6C9u7ras. Sadly, the rest of us must borrow or rent a copy. 2. Marc Truitt, “No More Silver Bullets, Please,” Information Technology & Libraries 29, no. 
2 (June 2010), http://www.ala.org/ala/mgrps/divs/lita/publications/ital/292010/2902jun/

we give to the organization. The LITA Assessment and Research Committee recently surveyed membership to find out why people belong to LITA. This is an important step in helping LITA provide programming, etc., that will be most beneficial to its users, but the decision on whether to be a LITA member I believe is more personal and doesn't rest on the fact that a particular Drupal class is offered or that a particular speaker is a member of the Top Tech Trends panel. It is based on the overall experience that you have as a member, the many little things. I knew in just a few minutes of attending my first LITA open house 12 years ago that I had found my ALA home in LITA. I wish that everyone could have such a positive experience being a member of LITA. If your experience is less than positive, how can it be more so? What are we doing right? What could we do differently? Please let me or another officer know, and/or volunteer to become more involved and create a more valuable experience for yourself and others.

Life out of balance. Those who saw it will surely recall the 1982 film that juxtaposed images of stunning natural beauty with scenes of humankind's intrusion into the environment, all set to a score by Philip Glass. The title is a Hopi word meaning "life out of balance," "crazy life," "life in turmoil," "life disintegrating," or "a state of life that calls for another way of living." While the film, as I recall, relied mainly on images of urban landscapes, mines, power lines, etc., to make its point about our impact on the world around us, it did include as well images that had a technological focus, even if the pre–PC technology exemplars shown may seem somewhat quaint thirty years later.1

The sense that one is living in unbalanced, crazy, or tumultuous times is nothing new. Indeed, I think it's fair to say that most of us—our eyes and perspectives firmly and narrowly riveted to the here and now—tend to believe that our own specific time is one of uniquely rapid and disorienting change. But just as there have been, and will be, periods of rapid technological change, social upheaval, etc.—"Been there, done that, got the t-shirt," to recall the memorably pithy, if now slightly oh-so-aughts, slogan—so too have there been reactions to the conditions that characterized those times. A couple of very different but still pertinent examples come to mind.

In the second half of the nineteenth century, a reaction against the social conservatism and shoddy, mass-produced goods of the Victorian era began in England. Inspired by writer and designer William Morris, the Arts and Crafts movement emphasized simplicity, hand-made (as opposed to factory-made) objects, and social reform. By the turn of the century, the movement had migrated to the United States—memo to self: who were the leading lights of the movement in Canada?—finding expression in the "Mission-style" furniture of Gustav Stickley, the elegant art pottery of Rookwood, Marblehead, and Teco, and the social activism of Elbert Hubbard's Roycrofters. Fast-forward another half-century to the mid-1960s and the counter-culture of that time, itself a reaction to the racism, sexism, militarism, and social regimentation of the preceding decade.
For a brief period, experimentation with “alternative lifestyles,” resistance to the Vietnam war, and agitation for social, racial, and sexual change flourished. Whatever one’s views about, say, the flower children, civil rights demonstrations, or the wisdom of U.S. involvement in Vietnam, it’s well-nigh impossible to argue that the society that emerged from that time was not fundamentally different from the one that preceded it. That both of these “movements” ultimately were sub- sumed into the larger whole from which they sprang is only partly the issue. And my aim is not to romanticize either of these times, even as I confess to more than a pass- ing interest in and sympathy for both. Rather, my point is that their roots lay in a reaction to excesses—social, cultural, economic, political, even technological—that marked their times. They were the result of what might be termed “life out of balance.” In turn, their result, viewed through a longer lens, was a new balance, incorporating elements of the status quo ante and critical pieces from the movements themselves. Thesis —> Antithesis —> Synthesis. We find ourselves in such unbalanced times again today. Even without resort to over-hyped adjectives such as “transformational,” it is fair to say that we are in uncertain times. In libraries, budgets, staffing levels, and gate counts are in decline. The formats and means of information delivery are rapidly changing. Debates rage over whether we are merely in the business of delivering “information” or of preserving, describing, and imparting learning and knowledge. Perhaps worst of all, as our role in the society of which we are a part changes into some- thing we cannot yet clearly see, we fear “irrelevance.” What will happen when everyone around us comes to believe that “everything [at least, everything that’s impor- tant] is on the web” and that libraries and librarians no longer have a raison d’être? For much of the past decade and a half—some among us might argue even longer—we’ve reacted by taking the rat-in-the-wheel approach. To remain “relevant,” we’ve adopted practically every new fad or technology that came along, endlessly spinning the wheel faster and faster, adopting the tokens of society around us in the hope that by so doing we would stanch the bleeding of money, staff, patrons, and our own morale. As I’ve observed in this space previously,2 we’ve added banks of über-connected computers, clearing away book stacks to design technology-focused creative services and collab- orative spaces around them. We’ve treated books almost as smut, to be hidden away in “plain-brown-wrapper” compact storage facilities. We’ve reduced staffing, in the process outsourcing some services and automating others so that they become depersonalized, the library equiva- lent of a bank automated teller machine. We’ve forsaken collection building, preferring instead to rent access to resources we don’t own and to cede digitization control of those resources that we ostensibly do own. Where does it end? In a former job, I used to joke that my director’s vision of the library would not be fully real- ized until no one but the director and the library’s system administrator were left on staff and nothing but a giant super-server remained of the library. It seemed only black humor then. Today it’s just black. 
Marc Truitt Marc truitt (marc.truitt@ualberta.ca) is associate university librarian, Bibliographic and information Technology services, university of alberta libraries, edmonton, alberta, Canada, and editor of ITAL. Editorial: Koyaanisqatsi 88 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 and intellectual rest. They are places of the imagi- nation. Play to these strengths. Those seeking to reimagine library spaces as refuges could hardly do better than to look to Jasper Fforde’s magical BookWorld in the Thursday Next series for inspira- tion.3 Stuffy academics and special libraries take note: Library magic is not something restricted to children’s rooms in public libraries. Walk through the glorious spaces of Yale’s Sterling Memorial Library or visit the Reading Room at the University of Alberta’s Rutherford Library—known to the present genera- tion of students as the “Harry Potter Room,” for its evocation of the Hogwarts School’s Great Hall—and then tell me that magic does not abound in such places. It’s present in all of our libraries, if we but have eyes to see and hearts to feel. ■■ The library was once a place for the individual. To contemplate. To do research. To know the peace and serenity of being alone. In recent years, as we’ve moved toward service models that emphasize collab- oration and groups, I think we’ve lost track of those who do not visit us to socialize or work in groups. We need to reclaim them by devoting as much attention to services and spaces aimed at those seeking alone- ness as we do at those seeking togetherness. The preceding list will probably brand me in the minds of some readers as anti-technology. I am not. After spending the greater part of my career working in library IT, I still can be amazed at what is possible. “Golly? We can do that?” But I firmly believe that library technology is not an end in itself. It is a tool, a service, whose pur- pose is to facilitate the delivery of knowledge, learning, and information that our collections and staff embody. Nothing more. That world view may make me seem old fashioned; if such be the case, count me proudly guilty. In the end, though, I come back to the question of bal- ance. There was a certain balance in and about libraries that prevailed before the most recent waves of techno- logical change began washing over libraries a couple of decades ago. Those waves disrupted but did not destroy the old balance. Instead, they’ve left us out of balance, in a state of Koyaanisqatsi. It’s time to find a new equilib- rium, one that respects and celebrates the strengths of our traditional services and collections while incorporating the best that new technologies have to offer. It’s time to synthesize the two into something better than either. It’s time for balance. References and Notes 1. Wikipedia, “Koyaanisqatsi,” http://en.wikipedia.org/ wiki/Koyaanisqatsi (accessed July 12, 2011). ITAL readers in the United States can view the entire film online at http://www More importantly, where has all this wheel spinning gotten us, other than continued decline and yet more hand-wringing and anguish about irrelevance? It’s time to recognize that we are living in a state of Koyaanisqatsi (life out of balance). And it’s up to us to do something new about it by creating a new balance. Here are a few perhaps out-of-the-box ideas that I think could help with establishing that balance. Spoiler alert: Some of these may seem just a bit retro. I can’t help it: my formative library years predate the Chicxulub asteroid impact. 
Anyway, here goes: ■■ Cease worrying so about “relevance.” Instead, iden- tify our niche: design services and collections that are “right” and uniquely ours, rather than pale reflections of fads that others can do better and that will eventu- ally pass. We are not Google. We are not Starbucks. We know that we cannot hope to beat these sorts of outfits at their games; perhaps less obvious is that we should be extremely wary of even partnering with them. Their agenda is not ours, and in any conflict between agendas, theirs is likely to prevail. We must identify something unique at which we excel. ■■ Find comfort in our own skins. Too many of us, I sense, are at some level uneasy with calling ourselves “librarians.” Perhaps this is so because so many of us came to the profession by this or that circuitous route, that is, that we intended to be something else and wound up as librarians. Get over it and wear the sensible shoes proudly. ■■ Stop trying to run away from or hide books. They are, after all, perceived as our brand. Is that such a bad thing? ■■ Quit designing core services and tools that are based on the assumption that our patrons are all lazy imbeciles who will otherwise flee to Google. The evidence suggests that those folks so inclined are already doing it anyway; why not instead aim at the segment that cares about provision of quality content and services—in collections, face-to-face instruction, and metadata? People can detect our arrogance and condescension on this point and will respond accord- ingly, either by being insulted and alienated or by acting as we depict them. ■■ Begin thinking about how to design and deliver ser- vices that are less reliant on technology. Technology has become, to borrow from Marx, the opiate of libraries and librarians; we rely on it to the exclusion of nontechnological approaches, even when the latter are available to us. Technology has become an end in itself, rather than a means to an end. ■■ Libraries are perceived by many as safe harbors and refuges from any number of storms. They are places of rest—not only of physical rest, but of emotional eDitoriAl | truitt 89 editorial.cfm (accessed July 13, 2011). 3. Begin with Fforde’s The Eyre Affair (2001) and proceed from there. If you are a librarian and are not quickly enchanted, you probably should consider a career change very soon! Thank you, Michele N! .youtube.com/watch?v=Sps6C9u7ras. Sadly, the rest of us must borrow or rent a copy. 2. Marc Truitt, “No More Silver Bullets, Please,” Information Technology & Libraries 29, no. 2 (June 2010), http://www.ala .org/ala/mgrps/divs/lita/publications/ital/292010/2902jun/ we give to the organization. The LITA Assessment and Research Committee recently surveyed membership to find out why people belong to LITA, this is an important step in helping LITA provide programming etc. that will be most beneficial to its users, but the decision on whether to be a LITA member I believe is more personal and doesn’t rest on the fact that a particular Drupal class is offered or that a particular speaker is a member of the Top Tech Trends panel. It is based on the overall experience that you have as a member, the many little things. I knew in just a few minutes of attending my first LITA open house 12 years ago that I had found my ALA home in LITA. I wish that everyone could have such a positive experience being a member of LITA. If your experience is less than positive how can it be more so? What are we doing right? What could we do differently? 
Please let me or another officer know, and/or volunteer to become more involved and create a more valuable experience for yourself and others.

Michael Witt

Editorial Board Thoughts: Eating Our Own Dogfood

I'll never forget helping one of my relatives learn how to use his first computer. We ran through the basics: turning the computer and monitor on, pointing and clicking, typing, and opening and closing windows. I went away to college, and when I came back for the holidays, he happily showed off his new abilities to send emails and create spreadsheets and such. Despite his well-earned pride, I couldn't help but notice that when he reached the edge of the desk with the mouse, he would use his other hand to place a photo album up against the desk and roll the mouse onto it, in order to reach the far right-hand side of the screen with the pointer. When I picked up his hand and the mouse and re-centered it on the desk for him, I think it blew his mind. He had been using the photo album to extend the reach of the mouse and pointer for months! It occurred to me that I should have spent more time with him, not just showing him what to do, but watching him do it.

Those of us working in information technology have a tremendous impact on library staff productivity by virtue of the systems we select or develop and implement. People working in most facets of library operations trust and rely on our hardware and software to accomplish their daily work, for which we bear a significant burden of responsibility. Are they using the best possible tools for their work? Are they using them in the best way? A great deal of effort has gone into user-centered design and improving functionality for our patrons, but in this time of reduced budgets and changing staff roles, it is important to extend similar consideration to the systems that we provision for our co-workers. At its best, information technology has the ability to save time and add value to the library by creating efficiencies and empowering people to do better and new work. Whether we are evaluating new integrated library systems or choosing the default text editor for our workstations, we are presented with opportunities to learn more about how our libraries accomplish work "on the ground" and reconsider the role that technology can play in helping them.

The phrase "eating your own dog food" is so common in software development circles that some have begun using it as a verb. Developers engage in "dogfooding" by using new software themselves, internally, to identify bugs and improve usability and functionality before releasing it to users. This is a regular practice of companies such as Microsoft1 and Google2. Setting aside any negative connotations for the moment (Why are people eating dog food? And exactly who are the "dogs" in this scenario?), there is a lot that we can learn by putting ourselves in the place of our users and experiencing our systems from their perspective. Perhaps the best way to do this is to walk around the building and spend time in each unit of the library, shadowing its staff and observing how they interact with systems to do their work. Try to learn their workflow and observe the tasks they perform—both online and offline. You don't need to become an expert, but ideally you'd be able to try to perform some of the tasks yourself.
In one case, we were able to identify and enable someone to design and run their own reports, which helped their unit make more timely decisions and eliminated the need for IT to run monthly reports on their behalf. If these tasks support user-facing interactions, you might get some good usability information in the process too. For example, I learned more about our library's website by working chat reference for an hour a week than I did in two years of web development team meetings!

Part of this process is attempting to feel our users' pain, too. Do you use the same locked-down workstation image that you deploy to your staff desktops? There is also a tendency among IT staff to keep the newest and best machines for their own use and cycle older machines to other units. I understand—IT staff are working with databases and developing software, and so we benefit the most from higher-performing machines—but keep in mind that your co-workers likely have older, slower machines and take the lowest common denominator hardware into account when selecting new software. By walking a mile in your users' shoes, you may gain a deeper appreciation and understanding of the other units of the library and how they work together.

Because so much work is done on computers, people working in information technology can often see a broad picture of the activities of the library. We have the ability to make connections and identify potential points of integration, not only between machines but also between people and their work.

References

1. Pascal G. Zachary, Showstopper! The Breakneck Race to Create Windows NT and the Next Generation at Microsoft (New York: Free Press, 1994): 129–56.
2. Stephen Levy, "Inside Google+: How The Search Giant Plans to Go Social," http://www.wired.com/epicenter/2011/06/inside-google-plus-social/all/1 (accessed July 12, 2011).

Michael Witt (mwitt@purdue.edu) is the Interdisciplinary Research Librarian and an assistant professor of library science at Purdue University in West Lafayette, Indiana. He serves on the Editorial Board of ITAL.

Yong-Mi Kim

Factors Affecting University Library Website Design

factors include usability testing and institutional forces.5 Because website design studies are sparse, this study examines the success of technology utilization studies to further identify factors that are pertinent to website design in order to provide a comprehensive view of web design success factors. A review of literature related to university library website design will be offered in the next section. The research methods section, which discusses the data collection strategies and the measurements used in the current study, follows the literature review. The findings of the study will later be reported and discussed after the research methods section. The paper will then conclude with an overview of the implications the findings have for academia and managers.

■■ Literature Review

This section offers an overview of the existing website design literature and relevant success factors. These factors include institutional forces, supervisors' technical knowledge and support, input from secondary sources, and input from users. Because the aforementioned elements are identified as independent variables, this study also adopts them as such.
Following existing studies, website success factors are identified from the utilitarian perspective.6 The dependent variables are (1) the extent to which website designers meet users' needs, (2) the extent to which users perceive ULWR to be useful, and (3) their actual usage. In this manner, the evaluation of success is measured from different perspectives. This discussion of the independent and the dependent variables appears in the conceptual model, figure 1. Institutional Forces Institutional forces refer to the pressures that lead organizations to follow other organizations' practices in order to secure efficiency and legitimacy. Existing studies have identified three institutional forces: coercive, mimetic, and normative.7 Coercive force takes place when an organization pressures others to adopt a certain practice. It is stronger when an organization is a subset of another organization. In this research context, the university could be an agent of coercive force. Mimetic force refers to organizations following other organizations' practices, and it is especially common for organizations within the same industry group.8 Because organizations within Existing studies have extensively explored factors that affect users' intentions to use university library website resources (ULWR); yet little attention has been given to factors affecting university library website design. This paper investigates factors that affect university library website design and assesses the success of the university library website from both designers' and users' perspectives. The findings show that when planning a website, university web designers consider university guidelines, review other websites, and consult with experts and other divisions within the library; however, resources and training for the design process are lacking. While website designers assess their websites as highly successful, user evaluations are somewhat lower. Accordingly, use is low, and users rely heavily on commercial websites. Suggestions for enhancing the usage of ULWR are provided. From a utilitarian perspective, a website evaluation is based on users' assessments of the website's instrumental benefits.1 If a website helps users complete their tasks, they are likely to use the website. Following this line of reasoning, dominant research has reported that users are most likely to use university library website resources (ULWR) when they can help with user tasks.2 Although we know now that the utilitarian perspective should be applied to web design, it is not clear to what extent web designers consider users' needs and, likewise, to what extent users consider ULWR to be successful in terms of meeting their needs. Also unclear is what factors other than user needs influence university library website design. This is not a trivial issue, because university libraries have invested massive resources into providing web services and need to justify their investments to stakeholders (such as the university) by demonstrating their ability to meet users' needs.3 Also important is the identification of these factors because web design and website performance are closely correlated.4 As a consequence, investigating factors that influence successful university library website design and providing managerial guidance is a timely pursuit. Hence, the objectives of this paper are twofold: 1. What factors influence university library website design? 2.
To what extent do website designers and users con- sider the university library website to be successful? To explore these research questions, this study identi- fies factors influencing university library website design that have been reported in existing literature. These Yong-Mi kim (yongmi@ou.edu) is assistant Professor, school of library and information studies, university of oklahoma, Tulsa, oklahoma. 100 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 Although it is a critical factor for website success, there is little evidence that website designers receive strong support from their supervisors. Research shows that supervisors’ lack of knowledge about websites inhibits user-centered website design.17 A respondent from Chen et al.’s study reports, “It’s really a pain trying to connect with our administration on the topic of Web design and usability, because even definitions are completely out the window” and “the dean and the associate directors know little about the need for usability and view it as a last minute check-off, so they can say that the Web site is tested and usable.”18 Lack of supervisor support inhibits website usability.19 input from secondary sources Website designers typically aggregate information from secondary sources rather than from users. Identified secondary sources are consultations with experts, other divisions within the library, webmasters, web commit- tees, and focus groups.20 The most widely used method is consultation with experts.21 Experts uncover techni- cal flaws and any obvious usability problems with a design,22 facilitate focus groups,23 and create new infor- mation architecture.24 Because they are experts, however, their ways of thinking may not be the same as users.’25 Research shows that 43 percent of the problems found by expert evaluators were actually false alarms and that 21 percent of users’ problems were missed by those evalu- ators. If this analysis is true, expert evaluators tend to miss and incorrectly identify more problems than they correctly identify;26 consequently, expert testing should not substitute for user testing.27 Another problem with secondary sources is that web committees “are ignorant about integrating design with usability and focus on their own agenda.”28 Nonetheless, because of the lack of avail- able resources to conduct more rigorous usability tests and the difficulty of collecting information directly from users, secondary sources such as expert evaluations are commonly used.29 input from users User input provides a great advantage for directly find- ing out users’ needs and integrating a user-centered design during the development stage.30 Often, informa- tion from secondary sources makes assumptions about users’ needs.31 To discover users’ genuine needs, design- ers can conduct a regular user survey and/or seek out users’ input.32 By surveying users’ needs, one can over- come criticism such as, “most websites are created with assumptions of more expert knowledge than the users may actually possess,” and can address users’ needs more effectively.33 Discovering users’ needs goes beyond usability testing because information obtained directly the same industry face similar problems or issues, mimetic decisions can reduce uncertainty and secure legitimacy.9 In this context, website designers may analyze and emulate other universities’ websites to claim that their websites are congruent with successful websites, thereby justifying their managerial practices. 
Normative force is associated with professionalism.10 Normative force occurs when the norms (e.g., equity, democracy, etc.) of the professional commu- nity are integrated into organizational decision-making. In a library setting, website designers may follow a set of value systems or go to conferences to discover ways to bet- ter deliver services. There is evidence that website designers follow other organizations.11 This phenomenon is known as isomor- phism. The appearance and the structure of websites show isomorphic patterns when an organization follows examples of other organizations’ websites or conforms to institutional pressures.12 Another study reports coercive forces in the design of university library websites; the parent institution exercises power over library website design by providing guidelines, and later, the design is not independent.13 supervisors’ technical knowledge and support Literature on supervisors’ knowledge of and support for technology has long been recognized as one of the most important technology success factors.14 If super- visors are knowledgeable about technology, they are likely to support and provide resources for training.15 Supervisors’ technical knowledge also serves as a signal for the importance of the utilization of technology within the organization; consequently, employees actively look for ways to utilize technology and vigorously adopt technology.16 Figure 1. Conceptual Model for Website Design Success FActors AFFectiNG uNiversitY liBrArY WeBsite DesiGN | kiM 101 March and May 2009. A total of 315 responses were col- lected (139 males and 176 female; 148 undergraduates, 101 master ’s, and 66 doctoral/faculty; Business 152, Human relations 51, Psychology 43, Engineering 41, Education 20, Other 8). Because detailed discussion of the user side of this sample appears elsewhere,36 it will not be repeated here to avoid redundancy. Because sparse research has been done in this area, the questionnaire and its measurements were created based on literature relating to the successful deployment of technology, but they were modified to fit into the website design context. Because of this modification, the finalized instrument was pretested and pilot tested before use in this study.37 The institutional forces are measured in three catego- ries: coercive isomorphism (i.e., following the university guidelines regarding website creation), mimetic iso- morphism (i.e., investigating other university websites and investigating commercial websites), and normative isomorphism (i.e., attending conferences). Following existing studies, supervisors’ knowledge and support are assessed by the web designer in two areas: the extent to which a supervisor is knowledgeable about technol- ogy and aware of the importance of technology. The supervisor ’s support for the website is measured by asking web designers about the extent to which their supervisors allocated resources and offered training. Input from secondary sources is measured by asking the extent to which website designers consult sources such as experts, other divisions, webmasters, and web com- mittees. Input from users is measured by the extent to which web designers collect information from website users. Finally website successes are measured by two categories: assessments made by the web designers and the website users themselves. The finalized measure- ments and the sources appear in table 1. 
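Each construct above is captured with simple three-point ratings, and the findings in the next section report those ratings as percentage breakdowns. As a purely illustrative sketch (the item and responses below are hypothetical, not the study's data), this is one way such ratings could be tallied into the kind of breakdown shown in the findings figures, here in Python:

from collections import Counter

# Hypothetical three-point Likert responses for one instrument item
# (1 = not really, 2 = somewhat, 3 = greatly); the study's raw data are not published.
LABELS = {1: "not really", 2: "somewhat", 3: "greatly"}
responses = [3, 3, 2, 1, 3, 2, 3, 1, 3, 2, 3, 3]  # one rating per web designer

def likert_breakdown(ratings):
    """Return the share of respondents giving each rating, as rounded percentages."""
    counts = Counter(ratings)
    total = len(ratings)
    return {LABELS[r]: round(100 * counts.get(r, 0) / total, 1) for r in sorted(LABELS)}

print(likert_breakdown(responses))
# {'not really': 16.7, 'somewhat': 25.0, 'greatly': 58.3}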
■■ Report of Findings This section reports the empirical findings for each category discussed in the previous section. Figure 2 shows institutional forces that influence university library website design. The first category is coercive force, the second is mimetic force, and the third is normative force. It is clear that the majority of university library web designers (75 percent) comply with the guidelines given by the university, which is a measure of coercive force; designers also investigate other universities' websites (75 percent) and commercial websites (59 percent), a measure of mimetic force; however, designers do not appear to actively attend conferences that influence website design, the measure of normative force. from users will reveal what users want and what should be done to meet their needs, thereby enhancing ULWR usage. However, research shows that this aspect is not actively integrated into web design due to the lack of support from supervisors.34 Website success Success can be measured according to the website's purpose: to what extent does the website meet users' needs? In the university library website context, following a utilitarian perspective, researchers measured success by the degree to which ULWR are integrated into users' tasks and by users' frequent visits to the website.35 These two measurements, when combined with the designers' perceptions of success, allow one to measure both the users' and the designers' perspectives of website success. By measuring from these two sides, any discrepancy between the two success outcomes will prompt designers to adjust their viewpoints and align their success measures with users. ■■ Research Methods This section discusses the sampling strategies and the measurements for the independent and the dependent variables. Because one of the contributions of this study is to compare users' and designers' perceptions of website success, the samples are drawn from two groups: one is university library website designers and the other is university library users. For the designer side, data were collected directly from university library website designers; hence, libraries without a website designer on staff are excluded. The designer sample is identified from the publicly available Yahoo academic library list (http://dir.yahoo.com/reference/libraries). The list contains 448 academic libraries, including those outside the United States. The research assistant telephoned the libraries located in the United States and verified the existence of a website designer within the library; 86 academic libraries had one. If a library had a website designer, the research assistant contacted the person and invited him or her to participate in the study. Because of difficulties contacting website designers, the research assistant was able to collect 16 responses between May 2009 and February 2010. Once the graduate assistant identified the unreachable designers, the researcher e-mailed those designers between January and April of 2010 and added 30 more responses to the dataset, which resulted in a total of 46 responses (a 54 percent response rate).
For the user side, a survey questionnaire was sent to faculty, doctoral, master ’s, and undergraduate students between 102 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 The second group of factors that affects website design is supervisors’ knowledge about technology and sup- port for the utilization of technology (see figure 3). Web designers have a somewhat mixed perception about their supervisors’ technical knowledge. More specifically, 37 percent of respondents responded that their supervisors do not have good knowledge about technology; 23 percent responded that their supervisors were somewhat knowl- edgeable about technology; and 40 percent responded that their supervisors have good knowledge about technol- ogy; thus, web designers have mixed evaluations about supervisors’ technical knowledge. Web designers reported that their supervisors’ perceptions of the importance of technology and websites are higher than their techni- cal knowledge. Approximately 60 percent of designers responded that their supervisors emphasize the impor- tance of technology and websites, and the remaining respondents answered that their supervisors are somewhat aware of the importance or do not value it at all. Table 1. Instrument Construct Operationalization Source Institutional forces Following university guidelines regarding website creations Investigating other university websites Investigating commercial websites Attending conferences 11, 12, 15 Supervisor’s technical knowledge and support Supervisor’s knowledge about technology Supervisor’s evaluation of the importance of technology Supervisor’s evaluation of the importance of website utilization Availability of website tools Availability of budgeting Availability of technical training Availability of website creation training 17, 22 Input from secondary sources Consulting with experts Consulting with other divisions within the library Consulting with webmasters Consulting with website committee Consulting with focus group 10, 25–26 Input from users Conducting user survey Utilizing users’ inputs 10 Website success measures from web designer We meet users’ needs We provide better services via the website We satisfy users’ needs We provide quality services Our library is overall successful 1, 2 Website success measures from website users It lets me finish my project more quickly It helps improve my productivity It helps enhance the quality of my project The extent to which users integrate website library resources into users’ tasks* Frequency of users’ visits to university library website** 3, 41, 43 all items are measured with a likert scale: 1 not really; 2: somewhat; and 3: greatly. * measured by percentage **measured by frequency Figure 2. Institutional Forces FActors AFFectiNG uNiversitY liBrArY WeBsite DesiGN | kiM 103 percent of respondents reported that they consult with web experts; over 70 percent responded that they inte- grate input from other divisions; and around 70 percent consult with webmasters. The utilization of secondary information sources for website creation is very high except for focus groups. The most widely used technique in this category is expert consultations followed by con- sultations with other divisions within the library. Web designers also consider input from webmasters and web committees. Figure 6 shows the extent to which website design- ers apply input directly derived from web users. 
Around half of respondents reported that they obtain information from user surveys, and around 70 percent responded that they consider users' input collected via comments, feedback, and complaints. Figure 4 shows the extent to which supervisors support web designers. Fifty-five percent of respondents reported that they have good web creation tools; 44 percent responded that they have enough budget for website creation, and a nearly equal share (39 percent) reported that they do not have adequate budgets for website creation. The last two questions, concerning training, show somewhat different results from the first two. The majority of web designers do not get technology-related or website creation-related training; less than one-third of respondents reported that they receive enough of such training. The findings on the use of secondary sources, shown in figure 5, indicate that web designers actively leverage such information sources for web design. By category, over 80 Figure 3. Supervisor's Knowledge about Technology Figure 4. Supervisor's Support Figure 5. Input from Secondary Sources Figure 6. Input from Users majority of users rely on commercial web resources for their academic tasks. ■■ Discussion Based on the study's findings, this discussion will cover the most influential factors first, followed by the least influential elements in designing a university library website. First, the most influential factors for website designers are expert opinions and consultations with other divisions within the library. These may be the most important factors because relying on experts allows designers to discover users' needs while saving costs. Web designers also consider input from webmasters and web committees. Coercive and mimetic forces are also highly significant factors affecting web designers. The university library is a subset of the university, and thus designers may need to align themselves with university policy. Also, designers can claim legitimacy by imitating other successful university websites, thereby securing necessary resources and support for website creation; however, web designers are much less likely to imitate commercial websites. This finding is consistent with existing reports that organizations imitate the managerial practices of other successful organizations within the same industry category.38 The least influential website creation factors are supervisors' knowledge, which in turn results in low budget allocations, and web designers' technical training. This finding is consistent with the successful technology deployment literature, which shows that supervisors' technical knowledge is highly correlated with budget allocations.39 The lack of training for web designers does not appear to have improved since the last study, which was conducted in 2001;40 library ■■ Website Success Website success is evaluated from two sides: designer opinion and user opinion. Overall, designers evaluated their websites to be highly successful. They believe that they meet users' needs, provide better services via the web, satisfy users' needs, and provide quality services. Hence, their evaluation of their websites is extremely positive, as reported in figure 7. Figure 8 shows users' perceptions of the usefulness of ULWR. Users generally agree that ULWR are useful for their academic projects.
More specifically, 55 percent responded that they are able to finish their tasks quickly because of the resources; 65 percent reported that they could increase their productivity; and 67 percent responded that they enhanced project quality thanks to the resources. On the other hand, a significant portion of respondents (more than 30 percent) do not think or have no opinions that ULWR are useful for their academic tasks. Figure 9 investigates how often users visit university library websites. Approximately 30 percent reported that they never visited or rarely visited the university library website. Thirty-two percent made a visit to the website a couple of times a month, and approximately 40 percent visited the library website a couple of times a week or daily. Figure 10 examines the users’ utilization of ULWR versus commercial website resources. The responses from 315 users show that they utilize commercial websites more than ULWR. Specifically, 46 percent of respondents reported that they use less than 20 percent of ULWR and only 8 percent utilize ULWR more than 80 percent. In contrast, 14 percent utilize less than 20 percent of com- mercial website resources, and 22 percent utilize more than 80 percent of commercial website resources. The Figure 7. Website Success Evaluated by Design Figure 8. Users’ Perceptions of Website Usefulness FActors AFFectiNG uNiversitY liBrArY WeBsite DesiGN | kiM 105 From a utilitarian perspective, web designers primarily need to consider the ability of the website to meet users’ needs. Usefulness again needs to be evaluated by users. According to user assessments ULWR are somewhat satisfactory but not strong enough to rely heavily on for academic projects. It is an alarming fact that users use commercial website resources at a much higher rate than ULWR. This is somewhat disturbing given that web designers strive to provide good services to users, and libraries have invested massive resources into providing online services. This study has implications for academia and practi- tioners. For academia, there has been sparse research on web design studies from a designer standpoint. It may be because of difficulties in collecting data directly from website designers. From this line of research, this study enhances the understanding of what factors influence university web design. Although university websites may be deemed successful, information managers should discover why the majority of users turn to commercial websites for their academic projects. Without addressing this problem, the existence of library websites may be compromised. Although there is evidence that libraries consider user input, it may not accurately represent all user populations because only extremely satisfied or extremely dissatisfied users tend to provide feedback;43 consequently, a regular survey may facilitate the utiliza- tion of ULWR. Finally, supervisors’ technical knowledge is found to be low. This problem may be alleviated as time goes on because new generations are more aware of the importance of technology. In the meantime, web designers are encouraged to actively communicate with supervisors about the value of the utilization of technol- ogy and seek more financial support. This study’s data have some limitations. 
Although the web designers are usually self-taught rather than formally trained.41 One promising finding, though, is that despite the relatively low technical knowledge held by supervi- sors, the respondents tend to rank highly when it comes to their perceptions of the importance of technology. Compared with other institutional forces, normative force is relatively low. This kind of institutional force is higher at the early stage of technology adoption. In other words, the majority of universities have already launched their websites and have established rules and policies, so libraries are already past this stage. Also, input from user surveys is relatively low. This may be because it is very costly, and they have other sources to turn to such as other universities’ successful websites. Website success evaluations by web designers and users show discrepancies. Overall, web designers evalu- ate their websites to be highly successful, while user ratings offer a different picture. This incongruity is a red flag in terms of ULWR usage. The majority of users report that they turn to commercial websites more than ULWR, and one-third never or rarely visit the university website. The disparity of the success between web designers and users may be attributed to the sources of information that website designers rely on. More specifically, existing studies report that input from experts and website com- mittees is incongruent with what users really want, while feedback from focus groups can assist in understanding users’ needs.42 ■■ Conclusions This study investigates the factors that website design- ers consider when designing university library websites. Figure 9. Frequency of Visits to University Library Websites Figure 10. University Library vs. Commercial Website 106 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 Seriously in Information Systems Research,” MIS Quarterly 29, no. 4 (2005): 591–605. 9. Scott, Institutions and Organizations; DiMaggio and Powell, “The Iron Cage Revisited”; H. Haverman, “Follow the Leader: Mimetic Isomorphism and Entry into New Markets,” Administrative Science Quarterly 38, no. 4 (1993): 593–627. 10. Scott, Institutions and Organizations. 11. K. Lee, Dinesh Mirchandani, and Xinde Zhang, “An Investigation on Institutionalization of Web Sites of Firms,” The DATA BASE for Advances in Information Systems 41, no. 2 (2010): 70–88. 12. Lee, Mirchandani, and Zhang, “An Investigation on Institutionalization of Web Sites of Firms.” 13. R. Raward, “Academic Library Website Design Principles: Development of a Checklist,” Australian Academic & Research Libraries 32, no. 2 (2001): 123–36. 14. Y-M. Kim, An Investigation of the Effects of IT Investment on Firm Performance: The Role of Complementarity (Saarbrucken, Germany: VDM Verlag, 2008); P. Weill, “The Relationship between Investment in Information Technology and Firm Performance: A Study of the Valve Manufacturing Sector,” Information Systems Research 3, no. 4 (1992): 307–33. 15. A. Lederer and V. Sethi, “The Implementation of Strategic Information Systems Planning Methodologies,” MIS Quarterly (1988): 445–461; J. Thong, C. Yap, and K. Raman, “Top Management Support, External Expertise and Information Systems Implementation in Small Business,” Information Systems Research 7, no. 2 (1996): 248–67; M. Earl, “Experiences in Strategic Information Systems Planning,” MIS Quarterly (1993): 1–24; A. Boynton and R. Zmud, “Information Technology Planning in the 1990’s: Directions for Practice and Research,” MIS Quarterly 11, no. 
1 (1987): 59–72. 16. S. Jarvenpaa and B. Ives, “Information Technology and Corporate Strategy: A View from the Top,” Information Systems Research 1, no. 4 (1990): 351–76. 17. Chen, Germain, and Yang, “An Exploration into the Practices of Library Web Usability in ARL Academic Libraries.” 18. Ibid. 19. J. Veldof and S. Nackerud, “Do You Have the Right Stuff? Seven Areas of Expertise for Successful Web Site Design in Libraries,” Internet Reference Services Quarterly 6, no. 1 (2001): 20. 20. Chen, Germain, Yang, “An Exploration into the Practices of Library Web Usability in ARL Academic Libraries”; R. Raward, “Academic Library Website Design Principles: Development of a Checklist,” Australian Academic & Research Libraries 32, no. 2 (2001): 123–36; J. Bobay et al., “Working With Consultants to Test Usability: The Indiana University Bloomington Experience,” in Usability Assessment of Library-Related Web Sites: Methods and Case Studies, ed. N. Campbell (Chicago: ALA, 2002): 60–76; H. King and C. Jannik, “Redesigning for Usability: Information Architecture and Usability Testing for Georgia Tech Library’s Website,” OCLC Systems & Services 21, no. 3 (2005): 235–43. 21. J. H. Spyridakis, J. B. Barrick, and E. Cuddihy, “Internet- Based Research: Providing a Foundation for Web-Design Guidelines,” IEEE Transactions on Professional Communication 48, no. 3 (2005): 242–60; T. A. Powell, Web Design: The Complete Reference (Berkeley, Calif.: Osborne/McGraw-Hill, 2002). 22. Powell, Web Design. 23. R. Tolliver, D. Carter, and S. Chapman, “Website Redesign and Testing with a Usability Consultant: Lessons Learned,” OCLC Systems & Services 21, no. 3 (2005): 156–66; L. VandeCreek, author tried to increase responses using various means, the number of responses does not allow one to use a sophisticated analytical technique such as regression. This study includes academic libraries with a web designer within the library; as a consequence, libraries without a web designer are not included. It is recommended to collect data from both groups and compare those with a designer (resource rich) and without a designer (resource poor), and discover underlying patterns of the factors impacting website designs and offer implications for aca- demia and managers. References 1. D. V. Parboteeah, J. S. Valacich and J. D. Wells, “The Influence of Website Characteristics on a Consumer’s Urge to Buy Impulsively,” Information Systems Research 20, no. 1 (2009): 60–78; M-H. Huang, “Designing Web Site Attributes to Induce Experiential Encounters,” Computers in Human Behavior 19 (2003): 425–42. 2. Y-M. Kim, “The Adoption of University Library Web Site Resources: A Multigroup Analysis,” Journal of the American Society for Information Science & Technology 61, no. 5 (2010): 978–93; O. Nov and C. Ye, “Users’ Personality and Perceived Ease of Use of Digital Libraries: The Case for Resistance to Change,” Journal of the American Society for Information Science & Technology 59 (2008): 845–51; N. Park et al., “User Acceptance of A Digital Library System in Developing Countries: An Application of the Technology Acceptance Model” International Journal of Information Management 29, no. 3 (2009): 196–209. 3. W. Hong et al., “Determinants of User Acceptance of Digital Libraries: An Empirical Examination of Individual Differences and System Characteristics,” Journal of Management Information Systems 18, no. 3 (2001–2): 97–124. 4. Parboteeah, Valacich and Wells, “The Influence of Website Characteristics; J. 
Palmer, “Web Site Usability, Design, and Performance Metrics,” Information Systems Research 13, no. 2 (2002): 151–67. 5. C. Burton, “Library Web site User Testing,” Collect & Undergraduate Libraries 9, (2002): 10; S. Ryan, “Library Web Site Administration: A Strategic Planning Model for the Smaller Academic Library,” Journal of Academic Librarianship 29, no. 4 (2003): 207–18; Y-H Chen, C.A. Germain., and H. Yang, “An Exploration into the Practices of Library Web Usability in ARL Academic Libraries,” Journal of the American Society for Information Science and Technology 60, no. 5 (2009): 953–68. 6. M-H Huang, “Designing Web Site Attributes to Induce Experiential Encounters,” Computers in Human Behavior 19 (2003): 425–42. 7. W. R. Scott, Institutions and Organizations (Thousand Oaks, Calif.: Sage Publications, Inc, 1995); P. DiMaggio and W. Powell, “The Iron Cage Revisited: Institutional Isomorphism and Collective Rationality in Organizational Fields,” American Sociological Review 48 (1983): 147–60. 8. W. R. Scott, Institutions and Organizations; H. Haverman, “Follow the Leader: Mimetic Isomorphism and Entry into New Markets,” Administrative Science Quarterly 38, no. 4 (1993): 593–627; M. W. Chiasson and E. Davidson,” Taking Industry FActors AFFectiNG uNiversitY liBrArY WeBsite DesiGN | kiM 107 “Usability Testing for Web Redesign: A UCLA Case Study,” OCLC Systems & Services 21, no. 3 (2005): 226–34; J. Ward, “Web Site Redesign: The University of Washington Libraries’ Experience,” OCLC Systems & Services 22, no. 3 (2006): 207–16. 32. Chen, Germain, and Yang, “An Exploration into the Practices of Library Web Usability in ARL Academic Libraries.” 33. Ibid. 34. Kim, “The Adoption of University Library Web Site Resources.” 35. Ibid. 36. Ibid. 37. Y-M. Kim, “Validation of Psychometric Research Instruments: The Case of Information Science,” Journal of the American Society for Information Science & Technology 60, no. 6 (2009): 1178–91. 38. H. Haverman, “Follow the Leader: Mimetic Isomorphism and Entry into New Markets,” Administrative Science Quarterly 38, no. 4 (1993): 593–627. 39. T. Teo and J. Ang, “An Examination of Major IS Planning Problems,” Information Journal of Information Management 21 (2001): 457–70. 40. R. Raward, “Academic Library Website Design Principles: Development of a Checklist,” Australian Academic & Research Libraries 32, no. 2 (2001): 123–36. 41. Ibid. 42. Chen, Germain, and Yang, “An Exploration into the Practices of Library Web Usability in ARL Academic Libraries”; Powell, Web Design; B. Bailey, “Heuristic Evaluations vs. Usability Testing,” UI Design Update Newsletter (2001), http:// www.humanfactors.com/downloads/jan01.asp (accessed June 15, 2011). 43. T. Hennig-Thurau et al., “Electronic Word-of-Mouth Via Consumer-Opinion Platforms: What Motivates Consumers to Articulate Themselves on the Internet?” Journal of Interactive Marketing 18, no. 1 (2004): 38–52. “Usability Analysis of Northern Illinois University Libraries’ Website: A Case Study,” OCLC Systems & Services 21, no. 3 (2005): 181–92. 24. Spyridakis, Barrick, and Cuddihy, “Internet-Based Research.” 25. B. Bailey, “Heuristic Evaluations vs. Usability Testing,” UI Design Update Newsletter (2001), http://www.humanfactors .com/downloads/jan01.asp (accessed June 10, 2011). 26. Powell, Web Design. 27. Chen, Germain, and Yang, “An Exploration into the Practices of Library Web Usability in ARL Academic Libraries.” 28. K.A. Saeed, Y. Hwang, and V. 
Grover, “Investigating the Impact of Web Site Value and Advertising on Firm Performance in Electronic Commerce,” International Journal of Electronic Commerce 7, no. 2 (2003): 119–41. 29. L. Manzari and J. Trinidad-Christensen, “User-Centered Design of a Web Site for Library and Information Science Students: Heuristic Evaluation and Usability Testing,” Information Technology & Libraries 25, no. 3 (2006): 163–69. 30. E. Abels, M. White, and K. Hahn, “Identifying User-Based Criteria for Web Pages,” Internet Research 7, no. 4 (1997): 252–56. 31. L. VandeCreek, “Usability Analysis of Northern Illinois University Libraries’ Website: A Case Study,” OCLC Systems & Services 21, no. 3 (2005): 181–92; M. Ascher, H. Lougee-Heimer, and D. Cunningham, “Approaching Usability: A Study of an Academic Health Sciences Library Web Site,” Medical Reference Services Quarterly 26, no. 2 (2007): 37–53; B. Battleson, A. Booth and J. Weintrop, “Usability Testing of an Academic Library Web Site: A Case Study,” Journal of Academic Librarianship 27, no. 3 (2001): 188– 98; G. H. Crowley et al., “User Perceptions of the Library’s Web Pages: A Focus Group Study at Texas A&M University,” Journal of Academic Librarianship 28, no. 4 (2002): 205–10; B. Thomsett-Scott and F. May, “How May We Help You? Online Education Faculty Tell Us What They Need from Libraries and Librarians,” Journal of Library Administration 49, no. 1/2 (2009): 111–35; D. Turnbow et al., 1769 ---- 108 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 Nancy M. Foasberg Adoption of E-Book Readers among College Students: A Survey understand whether and how students are using e-book readers to respond appropriately. As new media formats emerge, libraries must avoid both extremes: uncritical, hype-driven adoption of new formats and irrational attachment to the status quo. ■■ Research Context Recently introduced e-reader brands have attracted so much attention that it is sometimes difficult to remember that those currently on the market are not the first genera- tion of such devices. The first generation was introduced, to little fanfare, in the 1990s. Devices such as the SoftBook and the Rocket E-Book reader are well documented in the literature, but were unsuccessful in the market.1 The most recent wave of e-readers began with the Sony Reader in 2006 and Amazon’s Kindle in 2007, and thus far is enjoy- ing more success. Barnes and Noble and Borders have entered the market with the Nook and the Kobo, respec- tively, and Apple has introduced the iPad, a multifunction device that works well as an e-reader. Amazon claims that e-book sales for the Kindle have outstripped their hardcover book sales.2 These numbers may reflect price differences, enthusiasm on the part of early adopters, marketing efforts on the parts of these par- ticular companies, or a lack of other options for e-reader users because the devices are designed to be compatible primarily with the offerings of the companies who sell them. Nevertheless, they certainly indicate a rise in the con- sumption of e-books by the public, as the dramatic increase in wholesale e-book sales bears out.3 In the meantime, sales of the devices increased nearly 80 percent in 2010.4 With this flurry of activity have come predictions that e-readers will replace print eventually, perhaps even within the next few years.5 Books have been published with such bold titles as Print is Dead.6 However, despite the excitement, e-readers are still a niche market. 
According to the 2010 Pew Internet and American Life survey, 5 percent of Americans own e-book readers. Those who do skew heavily to the wealthy and well-educated, with 12 percent having an annual household income of $75,000 or more and 9 percent of college graduates own- ing an electronic book reader. This suggests that e-book readers are still a luxury item to many.7 To academic librarians, it is especially important to know whether e-readers are being adopted by college students and whether they can be adapted for academic use. E-readers’ virtues, including their light weight, their ability to hold many books at the same time, and the speed with which materials can be delivered, could make them very attractive to students. However, they have many limitations for academic work. Most do not provide the ability to copy and paste into another document, have To learn whether e-book readers have become widely pop- ular among college students, this study surveys students at one large, urban, four-year public college. The survey asked whether the students owned e-book readers and if so, how often they used them and for what purposes. Thus far, uptake is slow; a very small proportion of students use e-readers. These students use them primarily for leisure reading and continue to rely on print for much of their reading. Students reported that price is the greatest bar- rier to e-reader adoption and had little interest in borrow- ing e-reader compatible e-books from the library. P ortable e-book readers, including the Amazon Kindle, Barnes and Noble Nook, and the Sony Reader, free e-books from the constraints of the computer screen. Although such devices have existed for a long time, only recently have they achieved some degree of popularity. As these devices become more common- place, they could signal important changes for libraries, which currently purchase and loan books according to the rights and affordances associated with print books. However, these changes will only come about if e-book readers become dominant. For academic libraries, the population of interest is college students. Their use of reading formats drives col- lection development practices, and any need to adjust to e-readers depends on whether students adopt them. Thus, it is important to research the present state of students’ interest in e-readers. Do they own e-readers? Do they wish to purchase one? If they do own them, do they use them often and regard them suitable for academic work? The present study surveys students at Queens College, part of the City University of New York, to gather infor- mation about their attitudes toward and ownership of e-books and e-book readers. Because only Queens College students were surveyed, it is not possible to draw conclu- sions about college students in general. However, the data do provide a snapshot of a diverse student body in a large, urban, four-year public college setting. The goal of the survey was to learn whether students own and use e-book readers, and if so, how they use them. In the midst of enthusiasm for the format by publishers, librarians and early adopters, it is important to consult the students themselves, whose preferences and reading habits are at stake. It is also vital for academic libraries to Nancy M. Foasberg (nfoasberg@qc.cuny.edu) is humanities li- brarian, Queens College, City university of new york, Flushing, new york. 
ADoptioN oF e-Book reADers AMoNG colleGe stuDeNts: A surveY | FoAsBerG 109 Foundation survey, Internet and American Life, found that e-readers were luxury items owned by the well educated and well off. In the survey, 5 percent of respondents reported owning an e-reader.12 In the ECAR Study of Undergraduate Students and Information Technology, 3.1 percent of undergraduate college students reported own- ing an e-book reader, suggesting that college students are adopting the devices at a slower rate than the general population.13 Commercial market research companies, including Harris Interactive and the Book Industry Study Group, also have collected data on e-book adoption. The Harris Interactive poll found that 8 percent of their respondents owned e-readers, and that those who did claimed that they read more since acquiring it. However, as a weighted online poll with no available measure of sampling error, these results should be considered with caution.14 The Book Industry Study Group survey, although it was sponsored by several publishers and e-reader manufacturers, appears to use a more robust method. This survey, Consumer Attitudes toward E-Book Reading, was conducted in three parts in 2009 and 2010. Kelly Gallagher, who was responsible for the group that conducted the study, remarks that “we are still in very early days on e-books in all aspects—technology and adoption.” Although the size of the market has increased dramatically, the survey found that almost half of all e-readers are acquired as a gift and that half of all e-books “purchased” are actually free. However, among those who used e-books, about half said they mostly or exclu- sively purchased e-books rather than print. The e-books purchased are mostly fiction (75 percent); textbooks com- prised only 11 percent of e-book purchases.15 Much of the literature on e-book readers consists of user studies, which provide useful information about how readers might interact with the devices once they have them in hand but provide no information about whether students are likely to use them of their own volition. However, these studies are of interest because they hint at reasons that students may or may not find e-readers useful, important information for predicting the future of e-books. User studies have covered small devices, such as PDAs (personal data assistants);16 first-generation e-read- ers, such as the Rocket eBook;17 and more recent e-book readers.18 The results of many recent e-reader user stud- ies have been very similar to studies on the usability of the first generation of e-book readers: the devices offer advantages in portability and convenience but lack good note-taking features and provide little support for nonlin- ear navigation. Amazon sponsored large-scale research on academic uses of e-book readers at universities, such as Princeton, Case Western Reserve University, and the University of Virginia,19 while other universities, such as Northwest Missouri State University,20 carried out their own projects limited note-taking capabilities, and rely on navigation strategies that are most effective for linear reading. The format also presents many difficulties regarding library lending. Many publishers rely on various forms of DRM (digital rights management) software to pro- tect copyrighted materials. This software often prevents e-books from being compatible with more than one type of e-book reader. 
Indeed, because e-book collections in academic libraries predate the emergence of e-book read- ers, many libraries now own or subscribe to large e-book collections that are not compatible with the majority of these devices. Furthermore, publishers and manufactur- ers have been hesitant to establish lending models for their books. Amazon recently announced that they would allow users to lend a book once for a period of four- teen days, if the publisher gave permission.8 This very cautious and limited approach speaks volumes about publishers’ fears regarding user sharing of e-books. Several libraries have developed programs for lending the devices,9 but there is no real model for lending e-books to users who already own e-readers. A service called Overdrive also provides downloadable collections, primar- ily of popular fiction, that can be accessed in this manner. However, the collections are small and are not compatible with all devices, including the most popular, the Kindle. In the United Kingdom, the Publisher’s Association has pro- vided guidelines under which libraries can lend e-books, which include a requirement that the user physically visit the library to download the e-book.10 Clearly, we do not currently have anything resembling a true library lending model for e-reader compatible e-books, especially not one that takes advantage of the format’s strengths. Despite the challenges, it is clear that if e-book read- ers are enthusiastically adopted by students, libraries will need to find a way to offer materials compatible with them. As Buczynski puts it, “Libraries need to be in play at this critical juncture lest they be left out or sidelined in the emerging e-book marketplace.”11 However, because the costs of participating are likely to be substantial, it is very important to discover whether students are indeed adopting the hardware. Few studies have focused on spontaneous student adoption of the devices, although several mention that when students were introduced to e-readers, they appeared to be unfamiliar with the devices and regard them as a novelty. However, e-readers have become more prevalent since many of these studies were conducted. Thus this study surveys students to find their attitudes toward e-book readers. ■■ Literature Review Only a few studies have attempted to quantify the popu- larity of e-readers. As mentioned above, the 2010 Pew 110 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 their first encounter with an e-book reader.”34 While this is mere anecdote, it, along with the survey results noted above, raises the question of how popular the device really is on college campuses. Finally, a third group of studies attempts to predict the future of e-readers and e-books. Even before the introduc- tion of e-readers, some saw e-books as the likely future of academic libraries.35 More recently, one report discusses the likelihood of and barriers to e-book adoption. This article concludes that “barriers to e-book adoption still exist, but signs point to this changing within the next two to five years. That, of course, has been said for most of the past 15 to 20 years.”36 Still, Nelson points out that tech- nologies can become ubiquitous very quickly, using the iPod as an example, and warns libraries against falling behind.37 Yet another report puts e-books in the two-to- three-year adoption range and claims that e-books “have reached mainstream adoption in the consumer sector” and that the “obstacles have . . . 
started to fall away.”38 ■■ Method The e-reader survey was conducted as part of Queens College’s Student Technology Survey, which also covered several other aspects of students’ interactions with tech- nology. The author is grateful to the Center for Teaching and Learning (in particular, Eva Fernández and Michelle Fraboni) for graciously agreeing to include questions about e-readers in the survey and providing some assis- tance in managing the data. This survey, run through Queens College’s Center for Teaching and Learning, was hosted by SurveyMonkey and was distributed to students through their official e-mail accounts. Participants were offered a chance to win an iPod Touch as an incentive, but students who did not participate also were offered an opportunity to enter the iPod drawing. The survey was available between April and June 2010. All personally identifying information was removed from the responses to protect student privacy. Rather than surveying the entire population about e-readers and e-books, the survey limited most of the ques- tions to students with some experience with the format. Of the students who responded to the survey, only 63 (3.7 percent) used e-readers. However, 338 more students identified themselves as users of e-books but did not use e-readers. All other students skipped past the e-book ques- tions and were directed to the next part of the survey. The questions about e-readers fell into several catego- ries. The students were asked about their ownership of devices and which devices they planned to purchase in the future. While they might of course change their minds about future purchases, this is a useful way of measuring whether students regard the devices as desirable. with other e-readers. Other types of programs, most notably Texas A&M’s Kindle lending program,21 and many academic focus groups have also contributed to our knowledge of how students use e-readers. Users in nearly every study have praised the por- tability of these devices. This can be very important to students; users in one study noted that the portability of reading devices allowed them to “reclaim an otherwise difficult to use brief period,”22 and in another, students were able to multitask, doing household chores and study- ing at the same time.23 Adjustable text size and the ability to search for words in the text have also been popular among students, as has the novelty value of these devices. Environmental concerns surrounding heavy printing have also been cited as an advantage of e-readers.24 However, the limitations of these devices, some of which are severe in an academic setting, also have been noted. The comments of students at Gettysburg College are typical: they liked the e-readers for leisure reading, but found them awkward for classroom use.25 Lack of note-taking support was an important drawback for many students. 
Waycott and Kukulska-Hulme noted that students were much less likely to take notes while reading with a PDA than they were with print.26 A study at Princeton found that the same was true of students using the Kindle,27 and students at Northwest Missouri State University said they read less with an e-textbook than with a traditional one, although they did not report changes in their study habits.28 Despite the ability of many devices to search the text of a book, users in many studies also disliked the inability to skim and browse through the materials as they would with print.29 Interestingly, this complaint appeared in studies of all types of e-readers, even those with larger screens. Students, in a recent study with the Sony Reader and iPod Touch, noted that these devices did a poor job of supporting PDFs, a standard format for online course materials. The documents were displayed at a very small size and the words were some- times jumbled.30 Whether these drawbacks will prevent students from adopting e-book readers remained to be seen. Library and information science (LIS) students in a small, week-long study reiterated the problems found in the above studies, but nevertheless found themselves using e-readers extensively and reading more books and newspapers than they had before.31 Several of these user studies hint that e-readers are not currently commonplace as far as users often seemed to regard the devices with surprise and curiosity. In some studies, while users were initially attracted to the nov- elty value of the devices, their enthusiasm dimmed after using the devices and discovering technical problems and limitations.32 One author describes e-readers as “atten- tion getters, but not attention keepers.”33 A study in early 2009, in which students were provided with e-readers, notes that “for the majority of the participants, this was ADoptioN oF e-Book reADers AMoNG colleGe stuDeNts: A surveY | FoAsBerG 111 attitudes of students in general, similar surveys should be taken across many campuses in several demographically different areas. Researching e-readers is inherently difficult because the landscape is changing very quickly. Since the survey began, Apple’s iPad became available, prices for dedi- cated e-readers have dropped dramatically, publishers have become more willing to offer content electronically, and Amazon has released a new version of the Kindle and has begun taking out television advertisements for it. Without a follow-up survey, it is impossible to know whether these events have changed student attitudes. ■■ Results and Discussion e-reader Adoption Of the 1,705 students who responded to the survey, 401 say that they read e-books (table 1). Most students (338) who use e-books read them on a device other than an e-reader, but 63 say they use a dedicated reader for e-books (table 2). However, when students were asked about the technological devices that they own, only 56 selected e-book readers. Perhaps the seven students who use e-book readers but don’t report owning one are shar- ing or borrowing them, or perhaps they are using a device other than the ones enumerated in the question. Aside from table 3, which breaks down the e-reader brands that students own, the following data will be based upon the larger sample of 63 students. The students who read e-books on another device were asked whether they planned to buy an e-reader in the Respondents were also asked about their use of e-books. 
This category includes questions about what kind of reading students use e-books for, how much of their reading uses e-books, and where they are finding their e-books. It was important to learn whether students considered e-book readers appropriate for academic work, and whether they considered the library a potential source for e-books. Finally, to assess their attitudes toward e-book read- ers, students were asked to identify the main benefits and drawbacks of e-book readers. Several possibilities were listed, and students were asked to respond to them along a Likert scale. A field was also included in which students could fill in their own answers. After 643 incomplete surveys were eliminated, there were 1,705 responses from Queens College students. This is about 8 percent of the Queens College student body. E-mail surveys always run the risk of response bias, espe- cially when they concern technology. However, students who responded were representative of Queens College in terms of sex, age, class standing, major, and other demo- graphic characteristics. The results were compared using a chi-squared test with the level of significance set at 0.05. In some cases, there were too few respondents to test significance prop- erly and comparisons could not be made. Please see appendix for the e-reader questions included in the survey instruments. They will be referred to in more depth throughout this article. ■■ Survey Limitations The survey results may not be generalizable because of the survey’s small sample size. In particular, the 63 respondents who use e-book readers may not be rep- resentative of student e-reader owners in general. The survey also relies on self-reporting; no direct observation of student behavior took place. Students who do use e-readers may be more com- fortable with technology and more likely to respond to e-mail surveys. However, the sample is representative for Queens College students, and the percentage of students who own e-book readers is close to the national average at the time the survey was taken (5 percent).39 Since only Queens College students were surveyed, the results reflect the behavior and attitudes of students at a single large, four-year public college in New York City. The results do not necessarily reflect the experience of stu- dents at other types of institutions or in other parts of the United States. The other parts of the technology survey show that QC students are heavy users of technology, so they may adopt new technologies such as e-book read- ers more quickly than other students. To understand the Table 1. E-book use among respondents E-book use Number of respondents Read e-books 401 (23.5%) Do not read e-books 1262 (74.0%) Don’t know what an e-book is 42 (2.5%) Total 1705 (100%) Table 2. Devices used to read e-books among e-book readers Device used Number of respondents (% of e-book users) Dedicated e-reader 63 (15.7) Other device 338 (84.3) Total 401 (100) 112 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 desire to buy an iPad, many more than reported owning an e-reader. Curiously, the e-reader owners reported that they planned to buy an iPad at the same rate as the other students. It is not clear whether these students plan to replace their e-reader or use multiple devices. In either case, while the arrival of the iPad and other tablet devices seems likely to increase the number of students carrying potential e-reading devices, some of its adopters will probably be students who already own e-readers. 
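The percentages in tables 1 and 2 follow directly from the reported counts. As a minimal arithmetic check, assuming only the counts given above, they can be recomputed like this:

# Recomputing the percentages in tables 1 and 2 from the reported counts.
total_respondents = 1705
reads_ebooks = 401        # table 1: read e-books
no_ebooks = 1262          # table 1: do not read e-books
unsure = 42               # table 1: don't know what an e-book is
dedicated_reader = 63     # table 2: e-book users on a dedicated e-reader
other_device = 338        # table 2: e-book users on some other device

def pct(part, whole):
    """Share of `whole` accounted for by `part`, as a percentage to one decimal."""
    return round(100 * part / whole, 1)

print(pct(reads_ebooks, total_respondents),   # 23.5
      pct(no_ebooks, total_respondents),      # 74.0
      pct(unsure, total_respondents))         # 2.5
print(pct(dedicated_reader, reads_ebooks),    # 15.7
      pct(other_device, reads_ebooks))        # 84.3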
Not surprisingly, students who used e-readers tended to be early adopters of technology in general (table 4).40 Compared to the general pool of respondents, they were much more likely to like or love new technologies and much less likely to describe themselves as neutral or skep- tical of them. In a chi-squared test, these differences were significant at a level of 0.001. Although e-reading devices have existed since the 1990s, the newest, most popular generation of them is so recent that people who own one now are early adopters by definition. Compared to the rest of the survey respon- dents, both e-reader owners and other e-book users were much more likely to identify as early adopters of tech- nology in general. Given this trend, the adoption rate of e-readers among students may slow once the early adopt- ers are satisfied. uses of e-Books Students who used an e-book reader were asked how much of their reading they did with it and whether they used it for class, recreational, or work-related reading (table 5). Students without e-readers were asked the same questions about their use of e-books. While it is likely that students who use e-book readers continue to access e-books in other ways, this distinction was made because this survey was designed to study their use of e-readers specifically. Because e-reader users were not asked about their use of e-books in other formats, it is not clear whether their habits with more traditional e-book formats differ from those of other students. Fewer than half the e-reader users in the study used the device for two-thirds of their reading or more. In the table below, students who did all their reading and those who did about two-thirds of their reading with e-books are combined, because so few claimed to read e-books exclusively. Three students with e-readers and future. The majority had no immediate plans to buy one, with those who said they did not plan to acquire one and those who did not know combining for 62.43 percent. 23.67 percent planned to buy one either within the next year or before leaving college, and the remaining 13.91 percent planned to acquire an e-reader after graduating. Despite ergonomic disadvantages, many more stu- dents are using e-books on some other device, such as a computer or a cell phone, than are loading them on e-read- ers. Furthermore, a large percentage of these students do not plan to buy an e-book reader. The factors preventing these students from buying e-readers will be covered in more detail in the “Attitudes toward E-readers” section below. However, it seems likely that a major factor is price, identified by both e-reader owners and non-owners as the greatest disadvantage of these devices. When asked to list the devices they owned, 56 stu- dents named some type of e-book reader. Among these, the Amazon Kindle was the most popular (table 3). As expected, e-readers have yet to be adopted by most students at Queens College. At the time of this survey, less than 4 percent of respondents owned one. While the rest of the survey shows that these students are highly wired—82 percent own a laptop less than five years old and 93 per- cent have high-speed Internet access at home—this has not translated to a high rate of e-reader ownership. Although Apple’s iPad, a tablet device that functions as an e-reader among other things, was not yet released at the time of the survey, it may see wider adoption than the dedicated devices. 
When the survey was originally distributed, this device had been announced but not yet released. Overall, 8 percent of students expressed a desire to buy an iPad, many more than reported owning an e-reader. Curiously, the e-reader owners reported that they planned to buy an iPad at the same rate as the other students. It is not clear whether these students plan to replace their e-reader or use multiple devices. In either case, while the arrival of the iPad and other tablet devices seems likely to increase the number of students carrying potential e-reading devices, some of its adopters will probably be students who already own e-readers.

Table 3. E-reader brands owned by students

  Devices owned         Number of students (% of e-reader owners)
  Amazon Kindle         26 (46.4%)
  Barnes & Noble Nook   14 (25.0%)
  Sony Reader           10 (17.9%)
  Other                 6 (10.7%)
  Total                 56 (100.0%)

Table 4. E-reader use and self-identification as an early adopter

                                                E-reader owners   All respondents
  Love or like new technologies                 40 (63.5%)        698 (40.9%)
  Neutral or skeptical about new technologies   23 (36.5%)        1,007 (59.1%)
  Total                                         63 (100.0%)       1,705 (100.0%)

pleasure. This finding is much more surprising, given the very slow adoption of e-books before the introduction of e-readers, and the ergonomic problems with reading from vertical screens. However, students who used e-books without e-readers were much more likely to read e-books for classes. This difference may be due to the sorts of material that are available in each format. Although textbook publishers have shown interest in creating e-textbooks for use on devices such as the iPad, there is little selection available for e-readers as yet. When working without e-book readers, however, there is a wide variety of academic materials available in electronic formats, and many textbooks include an online component. Academic libraries, including the one at Queens College, subscribe to large e-book collections of academic materials. For the most part, these collections cannot be used on an e-reader, but they are available through the library's website to students with an Internet connection and a browser. It is also possible that the e-readers are not well suited to class readings. Some past studies, cited above, have found that e-readers do not accommodate functions such as note taking, skimming, and non-sequential navigation very well. Since these are important functions for academic work, and both print books and "traditional" e-books are superior in these respects, such limitations may prevent students from using e-readers for classes. The user behaviors reported here do not appear to herald the end of print; in fact, very few students with e-readers use them for all their reading, and over half of the students with e-readers use them for one-third of their reading or less. It is not clear whether students intentionally choose to read some materials in print and others with
Since e-readers have been marketed largely for the popular fiction market and are designed to accommodate casual linear reading, it is not surprising that students who use them are most likely to report using them for leisure reading. In this area they seem to enjoy a strong advantage over more traditional e-book formats read on another device such as a computer or a cell phone. However, the study did not control for the amount of reading that students do. Students who use e-readers may be heavier leisure readers in general. Further research could clarify whether heavier use of leisure e-reading is due to the devices or the tendencies of those who own them. A large proportion of the students who read e-books without e-readers (65.7 percent) do read e-books for

Table 5. Amount of reading done with e-books

  Amount of reading         E-reader users   Other users   χ²     Significance level   Significant?
  About two-thirds or all   27 (42.8%)       65 (19.2%)    16.8   0.001                Yes
  About a third             14 (22.2%)       90 (26.6%)    0.1    0.5                  No
  Less than a third         22 (34.9%)       183 (54.1%)   7.9    0.01                 Yes
  Total                     63 (99.9%)       338 (99.9%)   —      —                    —

Table 6. Types of reading done with e-books

  Type of reading   E-reader users   Other users   χ²     Significance level   Significant?
  Recreational      54 (85.7%)       222 (65.7%)   9.9    0.01                 Yes
  Class             24 (38.1%)       217 (64.2%)   14.7   0.001                Yes
  Work              11 (17.8%)       88 (26.0%)    2.1    0.5                  No
  Other             3 (4.8%)         8 (2.4%)      1.1    0.5                  No

from the manufacturer of the e-reader that supports them, this result is not surprising. It suggests that these booksellers have a high degree of power in the market, a potential effect of e-readers that deserves further attention. However, official e-book sellers of the sort mentioned above are not the only option for students seeking digital reading material, since both independent online bookstores and open access repositories such as Project Gutenberg were used by students. Libraries, both public and academic, reached traditional e-book users much more successfully than e-reader users. Although many libraries have large e-book collections, there is currently little material for e-readers. Despite the existence of a service called Overdrive, which provides e-books compatible with some e-readers (excluding the Kindle), circulating e-books is challenging due to a host of technical and legal problems. Given this environment, it is not surprising that students without e-readers were more likely to use their public library as a source of e-books than were e-reader users. The Queens College campus library, which offers many electronic collections but none that are e-reader-friendly, fared worse; only one student claimed to have used it as a source of e-reader-compatible materials. In the free comment field, students mentioned other sources of e-books such as the Apple iTunes store, the campus bookstore, and lulu.com, an online bookseller that also provides self-publishing. Several also admitted, unprompted, that they download books illegally.

Attitudes toward e-readers

In the interests of learning what caused students to adopt e-readers or not, the survey used a series of Likert-style questions to ask what the students considered the benefits and drawbacks of such devices. Strikingly, e-reader owners and non-owners agreed about both the advantages and disadvantages; owning an e-reader did not seem to change most of the things that students value and dislike about it.
Figure 1 shows the number of students in each group who their e-reader, or whether they are limited by the materials available for the e-reader. The circumstances under which students switch between electronic and print would be an excellent area for future research: is it a matter of what is practically available, or is the e-reader better suited for some texts and reading circumstances than others?

Sources of e-Books

The major producers of e-readers are either primarily booksellers, such as Amazon and Barnes & Noble, or hardware manufacturers who also provide a store where users can purchase e-books, such as Sony (or, after the iPad launch, Apple). In both models, the manufacturers hope to sell e-books to those who have purchased their devices. They provide more streamlined ways of loading these e-books on their devices, and in some cases use DRM to prevent their e-books from being used on competing devices, as well as to inhibit piracy. Table 7 shows the sources from which readers with and without e-readers obtain e-books. E-reader users were much more likely than non-users to get their e-books from the official store associated with it—that is, the store providing the e-reader, such as Amazon, Barnes and Noble, or Sony's ReaderStore. There was no significant difference between the two groups' use of open access or independent sources, but the students who did not use e-readers were much more likely to use e-books from their public library, and while 19.8 percent of students without e-readers used the campus library as a source of e-books, only one student with an e-reader did. Since respondents were allowed to choose more than one answer, the results do not sum to 100 percent. By a wide margin, students who own e-readers are most likely to purchase their e-reading materials from the "official" store; 86 percent cited the official store as a source of e-books. Students without e-readers also use these stores more than any other source of e-books, but they are nevertheless far less likely to use them than e-reader users. Because it is much easier to buy e-books

Table 7. Sources of e-books

  How do you get e-books?               E-reader users   Other users   χ²     Significance level   Significant?
  Store specific to popular e-readers   54 (85.7%)       154 (45.6%)   34.2   0.001                Yes
  Open access repositories              16 (25.4%)       120 (35.5%)   2.4    0.5                  No
  Public library                        10 (15.9%)       99 (29.3%)    4.8    0.05                 Yes
  Independent online retailer           9 (14.3%)        71 (21.0%)    1.5    0.5                  No
  Other                                 4 (6.3%)         39 (11.5%)    N/A    N/A                  N/A
  Campus library                        1 (1.6%)         67 (19.8%)    N/A    N/A                  N/A

students with e-readers were more likely than others to rate portability and convenience as "very valuable." As the studies cited above suggest, being able to easily download books, carry them away from the computer, and store many books on a single device are very appealing to students. Only the final two features, text-to-speech and special features such as dictionaries, attracted enough "not very valuable" or "not valuable" responses for an inter-group comparison. Both groups considered text-to-speech the least valuable feature, but students who did not own e-readers were significantly more likely to consider it a valuable or very valuable feature, perhaps indicating that the users to whom this is important have avoided the devices, which currently support it in a very limited fashion. Perhaps, too, students with e-readers rated this feature less useful because of its current limitations.
In either case, rated each feature either valuable or very valuable. If the positive features of the devices are ranked based on the percentage of respondents who considered them very valuable, the order is almost the same for students with and without e-readers. For students with e-readers, the features rank as follows: portability, convenience, storage, special functions, and text-to-speech. For those without, convenience ranks slightly higher than portabil- ity; all other features rank in the same order. Tables 8 and 9 present the results of these questions in more detail. For the sake of brevity, the chi-squared results have been omitted. Any differences considered significant in the discussion below are significant at least at the 0.05 level. Nearly all e-reader users and a strong majority of other e-book users rated portability, convenience, and storage either “valuable” or “very valuable,” though Figure 1. Features rated “valuable” or “very valuable” 116 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 among respondents suggests that that many of those who do not own an e-book reader are unfamiliar with the technology. Since e-readers are primarily sold over the Internet, many people have not had a chance to see or handle one, perhaps partly explaining this result. If they become more widespread, this may well change. Not surprisingly, respondents who did not own e-readers were significantly more likely to prefer print. However, it is worth noting that even among students who did use e-readers, over a third “agree” or “com- pletely agree” that they prefer print, with another third neither agreeing nor disagreeing. Use of e-readers does not appear to indicate hostility toward print. This is con- sistent with the students’ self-reports of e-reader use; as reported above, over half of the students surveyed use e-readers for one-third of their reading or less. Thus, it seems unlikely that most of these students plan to totally abandon print any time soon; rather, e-readers are provid- ing another format that they use in addition to print. As for students who do not use e-readers, over half say they prefer print, but this is far from their most widespread concern; rather, like e-reader owners, they are most likely to cite the cost of the reader or the selection of books avail- able as a drawback of the devices. Queens College students considered price the most important drawback of e-readers. For both groups (own- ers and non-owners), it was the factor most likely to be identified as a concern, and the difference between the it was the only variable listed in the survey for which either the “not very valuable” and “not valuable” responses from either group amounted to a combined total of greater than 10 percent of the respondents in that group. In addition to valuing the same features, e-reader own- ers and non-owners had similar concerns about the device. Figure 2 shows the number of respondents in each group who agreed or completely agreed that the issues listed were one of the main shortcomings of e-book readers. Tables 10 and 11 give the responses in more detail. The responses with which the most respondents either agreed or completely agreed were the same: Cost of e-reader, selection of e-books, and cost of e-books, in that order. 
Although groups such as the Electronic Frontier Foundation have raised concerns about privacy issues related to e-readers,41 these issues have made little impression on students; both e-reader users and non-users were in agreement in putting privacy at the bottom of the list. One exception to the general agreement between e-reader users and other e-book readers was concern about eyestrain. The majority (63 percent) of those who do not use e-readers either "completely agree" or "agree" that eyestrain is a drawback, while only 29 percent of e-reader owners did. This was a major concern for early e-readers, leading the current generation of these devices to use e-ink, a technology that resembles paper and is thought to eliminate the eyestrain problem. The disparity

Table 8. Value of e-reader features, according to e-reader users

  Feature             Very valuable   Valuable      Somewhat valuable   Not very valuable   Not valuable at all   No response
  Portability         52 (82.54%)     10 (15.87%)   1 (1.59%)           0 (0.00%)           0 (0.00%)             0 (0.00%)
  Convenience         46 (73.02%)     13 (20.63%)   1 (1.59%)           1 (1.59%)           1 (1.59%)             1 (1.59%)
  Storage             42 (66.67%)     16 (25.40%)   2 (3.17%)           1 (1.59%)           0 (0.00%)             2 (3.17%)
  Special functions   32 (50.79%)     18 (28.57%)   7 (11.11%)          3 (4.76%)           3 (4.76%)             0 (0.00%)
  Text-speech         10 (15.87%)     13 (20.63%)   12 (19.05%)         16 (25.40%)         11 (17.46%)           1 (1.59%)

Table 9. Value of e-reader features, according to other e-book users

  Feature             Very valuable   Valuable       Somewhat valuable   Not very valuable   Not valuable at all   No response
  Portability         199 (58.88%)    89 (26.33%)    39 (11.53%)         4 (1.18%)           5 (1.48%)             2 (0.06%)
  Convenience         194 (57.40%)    98 (28.99%)    34 (10.06%)         7 (2.07%)           2 (0.59%)             3 (0.89%)
  Storage             181 (53.55%)    99 (29.28%)    40 (11.83%)         10 (2.96%)          4 (1.18%)             4 (1.18%)
  Special functions   169 (50.00%)    82 (24.26%)    58 (17.16%)         22 (6.51%)          4 (1.18%)             3 (0.89%)
  Text-speech         95 (28.11%)     77 (22.78%)    77 (22.78%)         50 (14.79%)         35 (10.36%)           4 (1.18%)

responded, but they brought up issues such as highlighting, battery life, and the small size of the screen. Another student was more confident in the value of e-readers and used this space to proclaim paper books dead.

E-book circulation programs

Finally, students were asked whether they would be interested in checking out e-readers with books loaded on them from the campus library (table 12). As is often the case when a survey asks for interest in a prospective new service, the response was very positive. However, it was expected that many of the students would prefer to download materials for devices that they already own to take advantage of the convenience of e-readers. On the contrary, a high percentage of both types of students expressed interest in checking out e-book readers, but very few wished to check out e-books

two groups was not significant. At the time this survey was taken, Amazon's Kindle cost close to $300 and Barnes and Noble's Nook was priced similarly. Soon after the survey closed, however, the major e-reader manufacturers engaged in a "price war," which resulted in the prices of the best-known dedicated readers, Amazon's Kindle and Barnes and Noble's Nook, falling to under $200. Given the feeling among survey respondents that the price of the readers is a serious drawback, this reduction may cause the adoption rate to rise. It would be worthwhile to repeat this survey or a similar one in the near future to learn whether the e-reader price war has had any effect upon price-sensitive students.
In the pilot survey, students had written in further responses about the drawbacks of e-readers, but not about their benefits. While some of those responses were incorporated into the final survey, a free text field was also added to catch any further comments. Few students

Figure 2. Drawbacks with which students "agree" or "completely agree"

■■ Future Research

Although this survey provides some data to help libraries think about the popularity of e-readers among students, many aspects of students' use of e-readers remain unexplored. Further research on how student adoption of e-book readers varies by location and demographics, particularly considering students' economic characteristics, for a device of their own. Even students who owned e-readers were much more likely to express interest in checking out the device than checking out materials to read on it. This preference belies the common assumption that users do not wish to carry multiple devices and prefer to download everything electronically. Instead, they were interested in checking out an e-reader from the library. Unless the emphasis of the question altered the results, it is somewhat difficult to account for this response.

Table 10. Drawbacks of e-readers, according to e-reader owners

  Drawback          Completely agree   Agree         Neither agree nor disagree   Disagree      Completely disagree   No response
  Cost of reader    19 (30.16%)        23 (36.51%)   13 (20.63%)                  7 (11.11%)    0 (0.00%)             1 (1.59%)
  Selection         11 (17.46%)        26 (41.27%)   12 (19.05%)                  7 (11.11%)    6 (9.52%)             1 (1.59%)
  Cost of e-books   10 (15.87%)        20 (31.75%)   16 (25.40%)                  11 (17.46%)   5 (7.94%)             1 (1.59%)
  Prefer print      6 (9.52%)          16 (25.40%)   21 (33.33%)                  11 (17.46%)   8 (12.70%)            1 (1.59%)
  Eyestrain         7 (11.11%)         11 (17.46%)   20 (31.75%)                  15 (23.81%)   9 (14.29%)            1 (1.59%)
  Interface         7 (11.11%)         10 (15.87%)   24 (38.10%)                  9 (14.29%)    8 (12.70%)            5 (7.94%)
  Privacy           3 (4.76%)          9 (14.29%)    13 (20.63%)                  26 (41.27%)   11 (17.46%)           1 (1.59%)

Table 11. Drawbacks of e-readers, according to other e-book users

  Drawback          Completely agree   Agree          Neither agree nor disagree   Disagree      Completely disagree   No response
  Cost of reader    146 (43.20%)       117 (34.62%)   50 (14.79%)                  14 (4.14%)    11 (3.25%)            0 (0.00%)
  Selection         80 (23.67%)        136 (40.24%)   84 (24.85%)                  27 (7.99%)    7 (2.07%)             4 (1.18%)
  Cost of e-books   94 (27.81%)        121 (35.80%)   76 (22.49%)                  37 (10.95%)   10 (2.96%)            0 (0.00%)
  Prefer print      78 (23.08%)        99 (29.29%)    116 (34.32%)                 25 (7.40%)    19 (5.62%)            1 (0.30%)
  Eyestrain         84 (24.85%)        129 (38.17%)   80 (23.67%)                  33 (9.76%)    11 (3.25%)            1 (0.30%)
  Interface         43 (12.72%)        82 (24.26%)    145 (42.90%)                 33 (9.76%)    20 (5.92%)            15 (4.44%)
  Privacy           39 (11.54%)        65 (19.23%)    144 (42.60%)                 49 (14.50%)   40 (11.83%)           1 (0.30%)

Table 12. Interest in checking out preloaded e-readers from the library

                                                                  E-reader owners   Other e-book users
  Would be interested in checking out e-readers                   44 (70.0%)        257 (76.0%)
  Would not be interested in checking out e-readers               4 (6.3%)          38 (11.2%)
  Would not be interested in checking out e-readers, but would
  like to check out e-books to read on my own e-reader            15 (23.8%)        43 (12.7%)
  Total                                                           63 (100.1%)       338 (99.9%)

whom would not object to using a print edition if one were available. Under these circumstances, and realizing that the future popularity of e-readers is far from guaranteed, developing such models is, for now, more important than putting them into practice in the short term.

References

1. Nancy K.
Herther, “The EBook Reader is not the Future of EBooks,” Searcher 16, no. 8 (2008): 26–40, http://search.ebsco host.com/login.aspx?direct=true&db=a9h&AN=34172354&site =ehost-live (accessed Dec. 22, 2010). 2. Charlie Sorrel, “Amazon: E-Books Outsell Hardcovers,” Wired, July 20, 2010, http://www.wired.com/gadgetlab/ 2010/07/amazon-e-books-outsell-hardcovers/ (accessed Dec. 22, 2010). 3. International Digital Publishing Forum, “Industry Statistics,” Oct. 2010, http://www.idpf.org/doc_library/indus trystats.htm (accessed Dec. 22, 2010). 4. Kathleen Hall, “Global E-Reader Sales to Hit 6.6m 2010,” Electronics Weekly, Dec. 9, 2010, http://www.electronicsweekly .com/Articles/2010/12/09/50083/global-e-reader-sales-to -reach-6.6m-2010-gartner.htm (accessed Dec. 22, 2010). 5. Cody Combs, “Will Physical Books be Gone in Five Years?” video interview with Nicholas Negroponte, CNN, Oct. 18, 2010, http://www.cnn.com/2010/TECH/innovation/10/17/negro ponte.ebooks/index.html (accessed Dec. 22, 2010). 6. Jeff Gomez, Print is Dead: Books in Our Digital Age (Basingstoke, UK: Palgrave Macmillan, 2009). 7. Aaron Smith, “E-Book Readers and Tablet Computers,” in Americans and Their Gadgets (Washington, D.C.: Pew Internet & American Life Project, 2010), http://www.pewinternet.org/ Reports/2010/Gadgets/Report/eBook-Readers-and-Tablet -Computers.aspx (accessed Dec. 22, 2010). 8. Alex Sharp, “Amazon Announces Kindle Book Lending Feature is Coming in 2010,” Suite101, Oct. 26, 2010, http:// www.suite101.com/content/amazon-announces-kindle-book -lending-feature-is-coming-in-2010-a300036#ixzz18CxAnFke (accessed Dec. 22, 2010). 9. Karl Drinkwater, “E-Book Readers: What are Librarians to Make of Them?” SCONUL Focus 49 (2010): 4–10, http://www .sconul.ac.uk/publications/newsletter/49/2.pdf (accessed Dec. 22, 2010). Drinkwater provides an overview and a discussion of the challenges and benefits of such programs. 10. Benedicte Page, “PA Sets out Restrictions on Library E-Book Lending,” The Bookseller, Oct. 21, 2010, http://www .thebookseller.com/news/132038-pa-sets-out-restrictions-on -library-e-book-lending.html (accessed Dec. 22, 2010). 11. James A. Buczynski, “Library eBooks: Some Can’t Find Them, Others Find Them and Don’t Know What They Are,” Internet Services Reference Quarterly 15, no. 1 (2010): 11–19, doi: 10.1080/10875300903517089, http://dx.doi.org/ 10.1080/10875300903517089 (accessed Dec. 22, 2010). 12. Smith, “E-Book Readers and Tablet Computers,” http:// www.pewinternet.org/Reports/2010/Gadgets/Report/eBook -Readers-and-Tablet-Computers.aspx (accessed Dec. 22, 2010). 13. Shannon D. Smith and Judith Borreson Caruso, The ECAR Study of Undergraduate Students and Information Technology, 2010 (Boulder, Colo.: Educause, 2010), http://net.educause. is certainly important. More research on the habits of students with e-readers would also help libraries and uni- versities to better serve their needs. In particular, while this survey found that students tend to switch between electronic and print formats, little is yet known about when and why they move from one to the other. It will also be important to research the differences between the reading habits of students who own e-read- ers and those who do not, as this may prove useful in interpreting the survey data about types of reading done with different kinds of e-books. Furthermore, since the e-book market changes quickly, continuing to research student adoption of e-readers is also important to monitor student reactions to new developments. 
■■ Conclusion While many Queens College students express an interest in e-readers, and even those who do not own one believe that their portability and convenience offer valuable advantages, only a small percentage of students, many of whom are early adopters of technology in general, actu- ally use one. Furthermore, even those who own e-readers do not use them exclusively, and only a third say they prefer it to print. In light of these responses, the proper response to this technology may not be a discussion about whether “paper books are dead” (as one of the survey respondents wrote in the comment field) but how each format is used. Research on when, where, and for what purposes students might choose print or electronic has already begun.42 Many of the factors that contribute to the niche status of e-readers are changing. Competition between manufactur- ers has brought down the price of the reader itself, and the selection of books available for them is improving. Because these were some of the most important problems stand- ing in the way of e-reader adoption for Queens College students, e-reader ownership could increase rapidly. The lack of a significant difference between the attitudes of e-reader owners and nonowners merits further emphasis and examination, as it may indicate that price is indeed the major barrier to e-reader ownership. Although the prices are lower now than they were when the survey was origi- nally taken, this would present a major concern if e-readers became the expected format in which students read, per- haps even the possibility of a new kind of digital divide. As the future is uncertain, it is important for academic libraries to pay attention to their students’ adoption of e-readers, and to consider models under which they can provide materials compatible with them. However, it is important to remember that such materials would, at pres- ent, be accessible to only a small subset of users, many of 120 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 20. Jon T. Rickman et al., “A Campus-Wide E-Textbook Initiative,” Educause Quarterly 32, no. 2 (2009), http://www.edu cause.edu/library/EQM0927 (accessed Dec. 22, 2010). 21. Dennis T. Clark, “Lending Kindle E-Book Readers: First Results from the Texas A&M University Project,” Collection Building 28, no. 4 (2009): 146–49, doi: 10.1108/01604950910999774, http://www.emeraldinsight.com/journals.htm?articleid=18174 06&show=abstract (accessed Dec. 22, 2010). 22. Marshall and Rutolo, “Reading-in-the-Small,” 58. 23. Mallett, “A Screen Too Far?” 142. 24. “E-Reader Pilot at Princeton.” 25. Foster and Remy, “E-Books for Academe,” 6. 26. Waycott and Kukulska-Hulme, “Students’ Experiences with PDAs,” 38. 27. “E-Reader Pilot at Princeton.” 28. Rickman, “A Campus-Wide E-Textbook Initiative.” 29. Dennis T. Clark et al., “A Qualitative Assessment of the Kindle E-Book Reader: Results from Initial Focus Groups,” Performance Measurement and Metrics 9, no. 2 (2008): 118–129, doi: 10.1108/14678040810906826, http://www.emeraldinsight .com/journals.htm?articleid=1736795&show=abstract (accessed Dec. 22, 2010); James Dearnley, Cliff McKnight, and Anne Morris. “Electronic Book Usage in Public Libraries: A Study of User and Staff Reactions to a PDA-based Collection,” Journal of Librarianship and Information Science 36, no. 4 (2004): 175–182, doi: 10.1177/0961000604050568, http://lis.sagepub.com/con- tent/36/4/175 (accessed Dec. 22, 2010); Mallett, “A Screen Too Far?” 143; Waycott and Kukulska-Hulme, “Students’ Experiences with PDAs,” 36. 
30. Mallet, “A Screen Too Far?” 142–43. 31. M. Cristina Pattuelli and Debbie Rabina. “Forms, Effects, Function: LIS Students’ Attitudes toward Portable E-Book Readers,” ASLIB Proceedings: New Information Perspectives 62, no. 3 (2010): 228–44, doi: 10.1108/00012531011046880, http://www .emeraldinsight.com/journals.htm?articleid=1863571&show=ab stract (accessed Dec. 22, 2010). 32. See, for example, Gil-Rodriguez and Planella-Ribera, “Educational Uses of the E-Book,” 58–59; and Cliff McKnight and James Dearnley, “Electronic Book Use in a Public Library,” Journal of Librarianship & Information Science 35, no. 4 (2003): 235–42, doi: 10.1177/0961000603035004003, http://lis.sagepub .com/content/35/4/235 (accessed Dec. 22, 2010). 33. Rickman et al. “A Campus-Wide E-Textbook Initiative.” 34. Maria Kiriakova et al., “Aiming at a Moving Target: Pilot Testing EBook Readers in an Urban Academic Library,” Computers in Libraries 30, no. 2 (2010): 20–24, http://search .ebscohost.com/login.aspx?direct=true&db=a9h&AN=48757663 &site=ehost-live (accessed Dec. 22, 2010). 35. Mark Sandler, Kim Armstrong, and Bob Nardini, “Market Formation for E-Books: Diffusion, Confusion or Delusion?” The Journal of Electronic Publishing 10, no. 3 (2007), doi: 10.3998/3336451.0010.310, http://quod.lib.umich.edu/cgi/t/ text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0010.310 (accessed Dec. 22, 2010). 36. Mark R. Nelson, “E-Books in Higher Education: Nearing the End of an Era of Hype?” Educause Review 43, no. 2 (2008), http://www.educause.edu/EDUCAUSE+Review/ EDUCAUSEReviewMagazineVolume43/EBooksinHigher EducationNearing/162677 (accessed Dec. 22, 2010). 37. Ibid. 38. L. Johnson et al., The 2010 Horizon Report (Austin, Tex.: edu/ir/library/pdf/ERS1006/RS/ERS1006W.pdf (accessed Dec. 22, 2010). 14. Harris Interactive, “One in Ten Americans Use an E-reader; One in Ten Likely to Get One in the Next Six Months,” press release, Sept. 22, 2010, http://www.piworld.com/com mon/items/biz/pi/pdf/2010/09/pi_pdf_HarrisPoll_eReaders. pdf (accessed Dec. 22, 2010). 15. Kat Meyer, “#FollowReader: Consumer Attitudes Toward E-Book Reading,” blog posting, O’Reilly Radar, Aug. 4, 2010, http://radar.oreilly.com/2010/08/followreader-consumer-atti tudes-toward-e-book-reading.html (accessed Dec. 22, 2010). 16. The following articles are all based on user studies with small form factor devices: Paul Lam, Shun Leung Lam, John Lam and Carmel McNaught, “Usability and Usefulness of EBooks on PPCs: How Students’ Opinions Vary Over Time,” Australasian Journal of Educational Technology 25, no. 1 (2009): 30–44, http:// www.ascilite.org.au/ajet/ajet25/lam.pdf (accessed Dec. 22, 2010); Catherine C. Marshall and Christine Rutolo, “Reading- in-the-Small: a Study of Reading on Small Form Factor Devices,” in JCDL ’02 Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries (New York: ACM, 2002): 56–64. doi: 10.1145/544220.544230, http://portal.acm.org/citation .cfm?doid=544220.544230 (accessed Dec. 22, 2010); and J. Waycott and A. Kukulska-Hulme, “Students’ Experiences with PDAs for Reading Course Materials,” Personal Ubiquitous Computing 7, no. 1 (2002): 30–43, doi: 10.1007/s00779–002–0211-x, http://www .springerlink.com/content/w288kry251dd2vcd/ (accessed Dec. 22, 2010). 17. Some examples in an academic context: James Dearnley and Cliff McKnight, “The Revolution Starts Next Week: The Findings of Two Studies Considering Electronic Books,” Information Services & Use 21, no. 
2 (2001): 65–78, http://search .ebscohost.com/login.aspx?direct=true&db=a9h&AN=5847810& site=ehost-live (accessed Dec. 22, 2010); and Eric J. Simon, “An Experiment Using Electronic Books in the Classroom,” Journal of Computers in Mathematics & Science Teaching 21, no. 1 (2002): 53–66, http://vnweb.hwwilsonweb.com/hww/jumpstart.jhtml?recid= 0bc05f7a67b1790e5237dc070f466830549a60a87b3fa34bd0b8951acd 7a879da9fa151218a88252&fmt=H (accessed Dec. 22, 2010). 18. Eva Patrícia Gil-Rodriguez and Jordi Planella-Ribera, “Educational Uses of the E-Book: An Experience in a Virtual University Context,” in HCI and Usability for Education and Work, ed. Andreas Holzinger, Lecture Notes in Computer Science no. 5298 (Berlin: Springer, 2008): 55–62, doi: 10.1007/978- 3-540-89350-9-5, http://www.springerlink.com/content/ d357482823j10m96/ (accessed Dec. 22, 2010); “E-Reader Pilot at Princeton, Final Report,” (Princeton University, 2009), http:// www.princeton.edu/ereaderpilot/eReaderFinalReportLong .pdf (accessed Dec. 22, 2010); Gavin Foster and Eric D. Remy. “E-Books for Academe: A Study from Gettysburg College,” Educause Research Bulletin, no. 22 (2009), http://www.educause .edu/Resources/EBooksforAcademeAStudyfromGett/187196 (Dec. 22, 2010); and Elizabeth Mallett, “A Screen Too Far? Findings from an E-Book Reader Pilot,” Serials 23, no. 2 (2010): 14–144, doi: 10.1629/23140, http://uksg.metapress.com/ media/mfpntjwvyqtggyjvudu7/contributions/f/3/2/6/ f32687v5r12n5h77.pdf (accessed July 11, 2011). 19. Steve Kolowich, “Colleges Test Amazon’s Kindle E-Book Reader as Study Tool,” USA Today, Feb. 23, 2010, http://www .usatoday.com/news/education/2010–02–23-IHE-Amazon-kin dle-for-college23_ST_N.htm (accessed Dec. 22, 2010). ADoptioN oF e-Book reADers AMoNG colleGe stuDeNts: A surveY | FoAsBerG 121 question 22, and was reused in the current survey. Again, the author extends thanks to Michelle Fraboni and Eva Fernández, who ran this portion of the survey at Queens College and allowed the use of their data. 41. Electronic Frontier Foundation, “Updated and Corrected: E-Book Buyer’s Guide to Privacy,” Deeplinks Blog, Jan. 6, 2010, http://www.eff.org/deeplinks/2010/01/updated-and-cor- rected-e-book-buyers-guide-privacy (accessed Dec. 22, 2010). 42. Pattuelli and Rabina, “LIS Students’ Attitudes.” New Media Consortium, 2010), http://wp.nmc.org/hori- zon2010/chapters/electronic-books/ (accessed July 11, 2011). 39. Aaron Smith, “E-Book Readers and Tablet Computers,” h t t p : / / w w w. p e w i n t e r n e t . o rg / R e p o r t s / 2 0 1 0 / G a d g e t s / Report/eBook-Readers-and-Tablet-Computers.aspx (accessed July 11, 2011). 40. This question was located in a portion of the survey not focused on e-book readers and thus does not appear in the appendix. The question derives from Smith and Caruso, 105, 122 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 Appendix. Queens College Student Technology Survey ADoptioN oF e-Book reADers AMoNG colleGe stuDeNts: A surveY | FoAsBerG 123 124 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 ADoptioN oF e-Book reADers AMoNG colleGe stuDeNts: A surveY | FoAsBerG 125 126 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 ADoptioN oF e-Book reADers AMoNG colleGe stuDeNts: A surveY | FoAsBerG 127 128 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 1770 ---- liBrAriANs AND tecHNoloGY skill AcQuisitioN: issues AND perspectives | rileY-HuFF AND rHoles 129 Debra A. Riley-Huff and Julia M. 
Rholes

Librarians and Technology Skill Acquisition: Issues and Perspectives

Libraries are increasingly searching for and employing librarians with significant technology skill sets. This article reports on a study conducted to determine how well prepared librarians are for their positions in academic libraries, how they acquired their skills, and how difficult they are to hire and retain. The examination entails a close look at ALA-accredited LIS program technology course offerings and dovetails a dual survey designed to capture experiences and perspectives from practitioners, both library administrators and librarians who have significant technology roles.

qualified individuals to fill these technology-driven librarian roles in our libraries, and if so, why? How are qualifications acquired, and what are they, besides a moving target? There appear to be two major convergent trends influencing this uncertain phenomenon. The first is what is perceived as a "lack of awareness" and consensus about what the core of LIS needs to be or to become in order to offer real value in a constantly changing and competitive information landscape.5 The other trend centers on the role of LIS education and the continuing questions regarding its direction, efficacy, and ability to prepare future librarians for the modern information professions of now and the future. While changes are apparent, it appears many LIS programs are still operating on a two-track model of "traditional librarians and information managers," and there are enough questions in this area to warrant further investigation and inquiry.6

■■ Literature Review

Most of the literature pertaining to the readiness of librarians to work in increasingly technical environments centers on LIS education. This certainly makes sense given the assumed qualifications the degree confers. Scant literature focuses solely on the core of the librarians' professional identity, workplace culture, and institutional historical perspectives related to qualifications; however, allusions to "redefining" LIS are often found in LIS education literature. There is limited research on preprofessional or even professional in-service training, although calls for such research have been made repeatedly.

A key study on LIS education is the 2000 Kaliper report, issued when the impact of technology in libraries was clearly reaching saturation.7 The report is the product of an analysis project with a goal of examining new trends in LIS education. The report lists six trends, three of which are pertinent to the investigation of technology inclusion in LIS programs. These trends note that in 2000, LIS programs were beginning to address a broader range of information problems and environments, programs were increasing IT content in the curriculum, and several programs were beginning to offer specializations within the curriculum, though not ones with a heavy technology focus. In a widely cited curriculum study in 2004, Markey completed a comprehensive examination of 55

A recent OCLC report on research libraries, risk, and systemic change discusses what ARL directors perceive as the highest risks to their libraries.1 The administrators reported on several high risks in the area of human resources, including high-risk conditions in recruitment, training, and job pools. The OCLC report notes that recruitment and retention are difficult due to the competitive environment and the reduction in the pool of qualified candidates.
Why precisely do administrators perceive that there is a scarcity of qualified candidates? Changes in libraries, most of which have been brought on by the digi- tal age, are reflected in the need for a stronger technological type of librarianship—not simply because technology is there to be taken advantage of, but because “information” by nature has found its dominion as the supreme commod- ity perfectly transported on bits. It follows, if information is your profession, you are no longer on paper. That LIS is becoming an increasingly technology-driven profession is both recognized and documented. A noted trend particularly in academic libraries is a move away from simply redefining traditional or existing library roles altogether in favor of new and com- pletely redesigned job profiles.2 This trend verifies actions by library administrators who are increasingly seeking librarians with a wider range of Information Technology (IT) skills to meet the demands of users who are access- ing information through technology.3 Johnson states the need well as We need an integrated understanding of human needs and their relationships to information systems and social structures. We need unifying principles that illuminate the role of information in both computation and cognition, in both communication and community. We need information professionals who can apply these principles to synthesize human-centered and technological perspectives.4 The questions then become, is there a scarcity of Debra A. riley-Huff (rileyhuf@olemiss.edu) is Web services li- brarian, university of Mississippi libraries, university, Miss. Julia M. rholes (jrholes@olemiss.edu) is dean of libraries, university of Mississippi libraries, university, Mississippi. 130 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 academic libraries had embarked on an unprecedented increase in filling librarian positions with professionals who do not have a master’s degree in library science.13 Citing the Association of Research Libraries annual sal- ary statistics, among a variety of positions being filled by other professionals a substantial number are going to those in technology fields such as systems and instruc- tional technology. In the mid 2000s, suggestions that library schools needed to work more closely with com- puter science departments began coming up more often. Obstacles to these types of partnerships were noted as computer science departments failed to see the advantage offered by library science faculty as well as being wary of taking on a “softening” by the inclusion of what is perceived as a “soft science.”14 In response, most library schools have added courses in computing, but many still question the adequacy. More recently there have been increasing calls from within LIS for more research into LIS education and professional practice. 
In 2006, a study by McKinney comparing proposed “ALA Core Competencies” to what was actually being taught in ALA-accredited curricula, shed some light on what is currently offered in the core of LIS education.15 The study found that the core compe- tency required most often in ALA-accredited programs were “Knowledge Organization” or cataloging (94.6 per- cent), “Professional Ethics” (80.4 percent), “Knowledge Dissemination” or reference (73.2 percent), “Knowledge Inquiry” or research (66.1 percent), and “Technical Knowledge” or technology foundations (66.1 percent).16 These courses map well to ALA Core Competencies but the question in the digital age, is one, not even universally required, technology-related course adequate for a career in LIS? The literature would seem to reflect that it is not. 2007 saw many calls for studies of LIS education using methods that not only examined course curricula but that also sought evidence of outcomes by those working in the field.17 An interest in studies reporting on employers’ views, graduates’ workplace experiences, and if possible longitudinal studies have been outwardly requested.18 Indications are that those in library work environments can play a vital role in shaping the future course of LIS education and preprofessional training by providing tar- geted research, data, and evidence of where weaknesses are currently being experienced and what changes are driving new scenarios. The most current literature points out both areas of technology deficiencies and emerging opportunities in libraries. Areas with an apparent need for immediate improvement are the continuing integration of third-party Web 2.0 application programming inter- faces (APIs) and social networking platforms.19 Debates about job titles and labels continue but the actuality is that the number of adequately trained digital librarians has not kept up with the demand.20 Modern libraries require those in technology-related roles to have broad or ALA-accredited LIS programs looking for change between the years 2000 and 2002.8 Markey’s study revealed that while there were improvements in the number of IT-related courses offered and required throughout programs, they were still limited overall with the emphasis continuing to be on the core curriculum consisting of foundations, reference, organization, and management. One of the important points Markey makes is the considerable chal- lenge involved in retraining or acquiring knowledgeable faculty to teach relevant IT courses. The focus on LIS education issues came to the fore in 2004 when Michael Gorman released a pair of arti- cles asserting that there was a crisis in LIS education, namely an assault on LIS by what Gorman referred to as “Information Science,” “Information Studies” and “Information Technology.”9 Gorman’s papers sought to establish that there is a de facto competition between Information Science courses, which he characterized as courses with a computational focus and LIS courses, which composed core librarianship courses, those tend- ing to be the more user focused and organizational. Gorman claimed LIS faculty were being marginalized in favor of Information Science and made further claims regarding gender roles within the profession along the alleged LIS/IS split. Gorman also noted that there was no consensus about how “librarianship” should be defined coming from either ALA or the LIS graduate programs. 
The articles were not without controversy, spurring a flurry of discussion in the library community, which spawned several follow up articles. Dillon and Norris rallied against the library vs. information science argu- ment as a premise, which has no bearing on the reality of what is happening in LIS and does nothing but create yet another distracting disagreement over labels.10 Others argued for the increasing inclusion of technology courses in LIS education, as Estabrook put it, Librarianship without a strong linkage to technology (and it’s capacity to extend our work) will become a mastodon. Technology without reference to the core library principles of information organization and access is deracinated.11 As the future of LIS was being hotly debated, voices in the field were issuing warnings that obstacles were being encountered finding qualified librarians with the requi- site technology skills necessary to take on new roles in the library. In 2007, Johnson made the case for the increas- ing need for new areas of emphasis in LIS, including specializations such as Geographic Information Systems by pointing out that it is not so much the granular train- ing that is expected of LIS education but a higher level technology skill set that allows for the ability to move into these specializations, identify what is needed, assess problems, and make decisions.12 In 2006, Neal noted that liBrAriANs AND tecHNoloGY skill AcQuisitioN: issues AND perspectives | rileY-HuFF AND rHoles 131 by examination of course catalogs and surveys of both library administrators and technology librarians. The LIS educational data was obtained by inspecting course catalogs. Course catalogs and website curricu- lum pages from all ALA-accredited LIS programs in the United States, Canada, and Puerto Rico were examined in December 2009 for the inclusion of technology-related courses. The catalogs examined were for the 2009–10 aca- demic year. Spanish and French catalogs were translated. Each available course description was reviewed and those courses with a primary technology component were iden- tified. In a secondary examination the selected courses were closely inspected for the exact technology focus and the primary subject content was noted for each course. Courses were then separated into categories by areas of focus and tabulated. A targeted survey identified practicing technology librarians’ perspectives on their level of preparation and continuing skill level needs based on actual job demands. In this survey, librarians with significant technology roles was defined as “for the purposes of this survey a librarian with a significant technology role would be any librarian whose job would very likely be considered “IT” if they were not in a library and whose job titles contain words like “systems, digital, web, electronic, network, data- base, automation, and whose job involves maintaining and/or building various IT infrastructures.” The survey was posted on various library and library technology electronic discussion lists in December 2009 and was available for two weeks. Library administrative perspec- tives were also gained through a targeted survey aimed at those with an administrative role of department head or higher. The survey was designed to capture the reported experience library administrators have had with librar- ians in significant technology roles, primarily as it relates to skill levels, availability, hiring, and retention. 
This sur- vey was posted on to various library administrative and technology discussion lists in December 2009 and was also available for two weeks. Both surveys included many similar questions to compare and contrast viewpoints. Results were tabulated to form an overarching picture and some relevant comparisons were made. There are limitations and inherent issues with this type of research. Catalog examinations when completed by qualified librarians can hold great accuracy; how- ever, the introduction of bias or misinterpretation is always possible.26 When categorizing courses, the authors reviewed course descriptions three separate times to ensure accuracy. Courses in doubt were reviewed again with knowledgeable colleagues to obtain a consensus. Surveys designed to capture perspectives, views, and experiences are by nature highly subjective and provide data that is both qualitative and quantitative. Tabulated data was given strictly simple numerical representa- tion to provide a factual picture of what was reported. specialized competencies in areas such as web develop- ment, database design, and management paired with a good working knowledge of classification formats such as XML, MARC, EAD, RDF and Dublin Core. Educational technology (ET) has been identified as an area of expected growth opportunity for libraries and there have been sug- gestions that more LIS programs should partner with ET programs to improve LIS technology offerings, skills and preprofessional training.21 LIS program change, including the apparent coalesc- ing of information technology focused education would appear to be demonstrated by the iSchool or iField Caucus of ALA accredited programs, however the literature is not clear on if that is actually being evidenced. The iSchools organization started in as collective in 2005 with a goal of advancing information science. iSchools incorporate a multidisciplinary approach and those with a library science focus are ALA accredited.22 A 2009 study interest- ingly applied Abbott’s theoretical framework used in the Chaos of Disciplines to the iField.23 Resulting in abstract yet relevant conclusions, Abbott looks at change in a field through a sociological lens looking for patterns of fractal distinction over time. The study concluded that tradi- tional LIS education remained at the heart of the iField movement and that the real change has been in locale, from libraries to location independent.24 Hall’s 2009 study exploring the core of required courses across almost all ALA accredited programs reveals that the core curricu- lum is still principle-centered, but it is focusing less on reference and intermediary activities with a definite shift toward research methods and information technology.25 ■■ Method This research study was designed to capture a broad view of technology skill needs, skill availability, and skill acquisition in libraries, while still allowing for some areas of sharper focus on stakeholder perspectives. The four primary stakeholder groups in this study were identified as LIS educators, LIS students, working librarians, and library administrators. The research questions cover three main areas of technology skill acquisition and employ- ment. One area is LIS education and whether the status of all technology course offerings has changed in recent years in response to market demands. The second area is the experience of librarians with significant technology roles with regards to job availability, readiness, and tech- nology skill acquisition. 
The third area is the perception of library administrators regarding the availability and readiness of librarians with technology roles. To cover the research questions and provide a broad situational view, the research was triangulated and aimed at the three question areas. Data collection was accomplished

Additionally, although the librarian survey was targeted to "those with significant technology roles," it would appear that the definition of "significant" varied in interpretation by the respondents. This is discussed in further detail in the findings. Given the limitations of this type of research, the authors did not attempt to find definite correlations; however, trends and patterns are clearly revealed.

■■ Catalog Findings

Course catalogs from all 57 ALA-accredited programs in the United States, Canada, and Puerto Rico were examined for the inclusion of technology-related courses. A total of 439 technology-related courses were offered across the 57 LIS programs, including certificate program course offerings. The total number of technology-related courses offered by program ranged from 2 to 20. The mean number of courses offered per program was 7.7, the median was 10, and the mode was 4. Table 1 shows the total number of technology courses being offered per program by matching them with the number of courses they offer.

Catalog course content descriptions were analyzed looking for a technology focus. The fifteen categories noted in table 2 were selected as representative of the technology-related courses offered. It is acknowledged that some course content may be overlapping, but each course was placed in only one category based on its primary content. Note also the inclusion of "metadata markup," which may arguably be considered description or cataloging. Metadata was included because it is an integral part of many new digital services. The categories are presented in column 1; the total number of courses offered is presented in column 2. The number of advanced courses available within each category total is further broken out in parentheses. Some programs offered more than one course in a given category; hence the percentage of programs offering at least one course is given in column 3.

Table 1. Number of technology-related courses being offered per program

  # of programs offering   # of courses offered
  1                        2
  6                        3
  8                        4
  6                        5
  7                        6
  5                        7
  5                        8
  1                        9
  6                        10
  1                        11
  3                        12
  2                        13
  2                        14
  1                        15
  1                        17
  1                        18
  1                        20
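As an informal cross-check of the distribution in table 1, the figures can be tallied directly. The snippet below is illustrative only (the variable names are mine); it reproduces the totals given in the text—57 programs and 439 courses—and the mean of roughly 7.7 courses per program.

    # Tally of table 1: {courses offered per program: number of programs}.
    distribution = {
        2: 1, 3: 6, 4: 8, 5: 6, 6: 7, 7: 5, 8: 5, 9: 1, 10: 6,
        11: 1, 12: 3, 13: 2, 14: 2, 15: 1, 17: 1, 18: 1, 20: 1,
    }
    programs = sum(distribution.values())                           # 57 programs
    courses = sum(n * count for n, count in distribution.items())   # 439 courses
    print(programs, courses, round(courses / programs, 1))          # 57 439 7.7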
Course Type as Categorized by the Course Content Description in the LIS Program Catalog | # of Courses Offered | % of Programs Offering at Least 1 Course
Database design, development, and maintenance | 47 (7) | 70
Web architecture (web design, development, usability) | 52 (11) | 68
Broad technology survey courses (basics of library technologies and overviews) | 50 | 65
Digital libraries | 43 (4) | 61
Systems analysis, server management | 49 (6) | 60
Metadata markup (DC, EAD, XML, RDF) | 43 (10) | 50
Digital imaging, audio, and video production | 33 (5) | 47
Automation and integrated library systems | 21 | 37
Networks | 32 (3) | 35
Human-computer interaction | 21 (4) | 29
Instructional technology | 12 | 21
Computer programming languages, open source technologies | 12 (2) | 17
Web 2.0 (social networking, virtual reality, third-party APIs) | 11 | 17
User IT management (microcomputers in libraries) | 6 | 10
Geographic information systems | 6 (1) | 8

An assessment of the course catalog data reveals that there have been increases in the number of technology courses offered in LIS programs, but is it enough? Longitudinal data show a significantly increased emphasis in the area of metadata. A 2008 study of the number of LIS courses covering Internet or electronic resources and metadata schemas found that only ten programs (17.5 percent) offered such courses, with only twelve metadata courses offered in total.27 Current results show 43 metadata courses offered, with 50 percent of LIS programs offering at least one course. The lack of a solid basis in Web 2.0 applications and integration reported by Aharony is confirmed by the current catalog data, with only 17 percent of programs offering a course.28 While at first glance it looks like many technology-related courses are currently being offered in LIS programs, a closer inspection reveals cause for concern. Many of these courses should be offered by 100 percent of LIS programs, and advanced courses in many areas should be offered as well. While there may be some overlap of content in some of these course descriptions, the percentages are still too low to conclude that LIS graduates, without preprofessional technology experience or education, are really prepared to take on serious technology roles in academic libraries.

■■ Perspectives on Job Availability, Readiness and Skill Acquisition

As previously noted in the method, two surveys were administered to collect participant viewpoint data pertinent to the study. Responses were carefully checked to determine whether they met the criteria for inclusion in the study. No attempt was made to disqualify respondents based solely on job title. It did appear that a significant number of non-target subjects initially replied to the librarian survey but quit the survey at the technology-related questions. Final inclusion was based on either an IT-related job title or on whether the respondent answered the technology questions, regardless of job title. Tables 3–5 report demographic response data.

■■ Perspectives on Job and Candidate Availability

A 2009 study by Mathews and Pardue asked the question "What skills do librarians need in today's world?"29 They sought to answer this question by performing a content analysis, spread over five months, of randomly selected jobs from ALA's JobList. What they found in the area of technology was a significant need for web development,
project management, systems development, and systems applications. Further, they suggest that some librarians are using a substantial professional IT skill subset.

Table 3. Response data

Responses | Administrative Survey | Librarian Survey
Total responses | 185 | 382
Total usable (qualified) | 146 | 227

Table 4. Respondents' institution by size

Institution size | Administrative Survey | Librarian Survey
Under 5,000 | 37 | 72
5,000–10,000 | 25 | 31
10,000–15,000 | 18 | 28
15,000–20,000 | 11 | 20
20,000–25,000 | 13 | 21
25,000–30,000 | 16 | 13
30,000–35,000 | 4 | 11
35,000–40,000 | 5 | 9
More than 40,000 | 12 | 21
Unknown | 5 | 1

Table 5. Respondent type

Administrative Survey: Position | # of Responses
Dean, Director, University Librarian | 46
Department Head | 71
Manager or other leadership role | 29

Librarian Survey: General area of work | # of Responses
Public Services | 48
Systems | 42
Web Services | 32
Reporting dual roles | 31
Digital Librarian | 29
Electronic Resources Librarian | 28
Emerging/Instructional Technologies | 18
Administrative | 10
Metadata/Cataloger | 9
Technical Services | 7
Distance Education Librarian | 4

based on the difficulty rating, and the classifications were then averaged by difficulty. Some respondents were unsure of difficulty ratings because the searches occurred before they arrived at their current library, and those searches were excluded. Position classifications with fewer than five searches were excluded from averaging and are marked "na" in table 6. The difficulty rubric is as follows: 1 = easy; 2 = not too bad, a pretty straightforward search; 3 = a bit tough, the search was protracted; 4 = very difficult, required more than one search; 5 = unable to fill the position. Almost all levels of difficulty were reported for many classifications, but the overall average hiring difficulty rating was 2.48.

A comparable set of questions was posed in the librarian survey. We asked librarians to report professional-level technology positions they had held in the past five years along with any current job searches. A total of 164 responses were received from people indicating that they had held such a position or were searching for one; the total number of positions/searches reported was 316, with some respondents reporting multiple positions. Respondents reported having held between one and five different positions, with an average of 1.92 jobs per respondent (see table 7).

The respondents were also asked to give the position title for each position held or applied for, as well as the difficulty encountered in obtaining the position. As in the administrative report, job titles were

This article's literature review points out that there are assertions being made that some technology-related librarian positions are difficult to fill and may in fact be filled by non-MLS professionals. In the associated surveys the authors sought to capture data related to actual job availability, search experiences, and perspectives from both library administrators and librarians. Note that both MLS librarians and a few professional library IT staff completed the survey. The distinction is made where appropriate. The survey asked library administrators if they had hired for a professional-level technology position in the past five years. A total of 146 responses were received, and 100 respondents indicated that they had conducted such a search, with the total number of searches reported at 167.
Of these searches, 22 did not meet the criteria for inclusion because of other missing data, such as job title. The total reported number of librarian/professional-level technology positions posted for hire by these respondents was 145, with some respondents reporting multiple searches for the same or different positions. Respondents conducting searches reported having between one and five searches in total, with an average of 1.45 per respondent.

The respondents were also asked to provide the position title for each search, the difficulty encountered in conducting the search, and the success rate. Job titles were divided into categories to ascertain how many positions in each category had a relevant search conducted. Each search was then assigned a point value

Table 6. Administrative report on positions open, searches, and difficulty of search (n = 145)

Position Classification | Searches | Search Difficulty
Systems/Automation Librarian | 40 | 2.78
Digital Librarian | 32 | 2.6
Emerging & Instructional Technology Librarian | 15 | 2.53
Web Services/Development Librarian | 33 | 2.51
Electronic Resources Librarian | 22 | 1.95
Database Manager | 1 | na
Network Librarian/Professional | 1 | na

Table 7. Librarian report on positions held or current searches and difficulty (n = 316)

Position Classification | # of Positions/Searches | Search Difficulty
Administrative | 8 | 3
Technical Services | 17 | 2.11
Public Services | 57 | 2.1
Systems/Automation Librarian | 76 | 1.89
Web Services/Development Librarian | 38 | 1.89
Electronic Resources Librarian | 39 | 1.87
Digital Librarian | 41 | 1.8
Metadata/Cataloger | 13 | 1.77
Distance Education Librarian | 6 | 1.66
Emerging & Instructional Technology Librarian | 21 | 1.61
Reporting dual roles | 30 | na

employment status for "newly minted" MLS graduates who had just entered the profession asked, "Did specific information technology or computer skills lead to you getting a job?" The answer was a "resounding yes" from 66 percent of the respondents.33 Experience is

divided into categories to ascertain how many positions fell into each classification category. Each position classification was then assigned a point value based on how the respondents rated the difficulty of those particular searches, and the classifications were then averaged by difficulty using the same scale that was applied in the administrative survey. Again, almost all levels of difficulty were reported for many classifications, but the overall average hiring difficulty rating was 1.9.

To provide as accurate a picture as possible, the surveys asked both groups to indicate whether any well-known mitigating factors contributed to complications with the job searches. These factors are shown in table 8, which stacks both groups for comparison.

This particular dataset reveals some interesting patterns. The roles that were in the most demand were also the most difficult to hire for, while also being the easier positions for candidates to find. Librarians also listed more job categories as having a significant technology component than the administrators did. Perhaps most notable is the discrepancy between how administrators perceive the qualifications of candidates and how candidates view themselves. While both groups acknowledge a lack of IT skills and qualifications as the number one mitigating factor, library administrators perceive the problem as significantly more serious.
These data back up other recent findings that important new job categories are being defined in LIS.30 The data further support the view that these roles, while centering on core librarianship principles, require a different skill set.31

■■ Job Readiness Perspectives

Issues of job readiness for academic librarians need to be looked at from a number of different perspectives. Job readiness can be understood one way by a candidate and quite differently by an employer. Job readiness is not only of critical concern at the beginning of a librarian's career; clearly this attribute continues to be significant throughout an individual's length of service, in one or more roles and for one or more employers. Job readiness is composed of several factors, the most important being education, experience, and ongoing skill acquisition. While this is certainly true for all librarians, it is of even more concern to those librarians with significant technology roles because of rapid changes in technology.

A concern has been established in the literature and in this study that LIS education, in the areas of technology, may be inadequate and may lack the intensity necessary for modern libraries. This perception has been backed up by entrants to the profession.32 That technology skills are extremely important to library employers has been evident for at least a decade. In 2001, a case study on

Table 8. Mitigating factors in hiring and job search

Administrative Survey: Mitigating factors in hiring, as a percentage of respondents to the question (n = 93) | % of Responses
We had difficulty getting an applicant pool with adequate skills. | 54
We are unable to offer qualified candidates what we feel is a competitive salary. | 38
We are located in what may reasonably be perceived as an undesirable area to live. | 23
We are located in an area with a very high cost of living. | 23
We have an IT infrastructure or environment that we and/or a candidate may have perceived as unacceptable. | 20
The current economic climate has made hiring for these types of positions easier. | 18
A successful candidate did not accept an offer of employment. | 13

Librarian Survey: Mitigating factors in job search, as a percentage of respondents to the question (n = 198) | % of Responses
I suspect I may not have/had adequate skills or experience, or I was otherwise unqualified. | 25
I have not been able to find a position for what I consider to be a fair salary. | 11
Many jobs are located in what may reasonably be perceived as an undesirable area to live. | 10
Many jobs are located in an area with a very high cost of living. | 15
Some jobs have an IT infrastructure or environment that I have perceived as unacceptable. | 10
The current economic climate has now made finding these types of positions tougher. | 22
I was a successful candidate but I could not or did not accept an offer of employment. | 3

library technology experience they preferred from a candidate. There were 97 responses; the range of preferred experience was 0–7 years, the mean was 3.06, and the mode was 3. Librarians were also asked how much experience they had in a technology-related library role. There were 187 responses; the range of experience was 0–39 years, the mean was 8.7, and the mode was 5. When participating administrators were asked if they felt it was necessary to have an MLIS librarian fill a technology-related role that is heavily user-centric, 110 administrators responded.
also a very important factor, with one study of academic library search committees reporting committee members mentioning that “experience trumps education.”34 This study sought to gather data on possible patterns in the job readiness area. The authors wanted to know how job candidates and employers felt about the viability of new MLS graduates, how experience factored into job readiness, how much experience is out there and how long term experience impacted expectations. The survey asked administrators how many years of Table 9. Question sets related to experience factors by group Administrative Survey Strongly Disagree Disagree Can’t say Agree Strongly Agree New librarians right out of graduate school seem to be adequately prepared (n = 111) 7% 40% 24% 28% 1% Librarians with undergraduate or 2nd graduate degrees in a technology/computer fields seem adequately prepared (n = 109) 1% 9% 48% 39% 4% Librarians with pre-professional technology- related experience seem adequately prepared (n = 109) 1% 6% 47% 41% 8% Librarians with some (up to 3 years) post MLS technology experience seem adequately prepared (n = 111) 1% 10% 17% 62% 10% Librarians with more than 3 years post MLS technology experience seem adequately prepared (n = 111) 1% 3% 24% 55% 16% Librarians never seem adequately prepared for technology roles (n = 111) 19% 55% 12% 7% 6% Librarian Survey Strongly Disagree Disagree Other Agree Strongly Agree As a new librarian right out of graduate school I was adequately prepared (n = 187) 12% 19% No grad degree 3% 42% 8% I have an undergraduate or 2nd graduate degree in a technology/computer field that has helped me be adequately prepared (n = 187) 13% 7% No tech degree 60% 13% 6% I had pre-professional technology-related experience that helped me be adequately prepared (n = 187) 3% 7% No such experience 20% 43% 27% I have less than 3 years of post MLS technology experience and I am adequately prepared (n = 180) 6% 13% na 63% 16% 1% I have more than 3 years of post MLS technology experience and I am adequately prepared (n = 184) 2% 12% na 17% 48% 20% I have never felt like I am adequately prepared for technology roles (n = 186) 19% 43% Neutral 23% 12% 2% liBrAriANs AND tecHNoloGY skill AcQuisitioN: issues AND perspectives | rileY-HuFF AND rHoles 137 readiness of new librarians and the value of related technology degrees. Areas of agreement are noted in the importance of preprofessional experience, three or more years of experience, and the generally positive attitude regarding librarians’ ability to successfully take on signifi- cant technology roles in libraries. ■■ Ongoing Skill Acquisition and Retention How librarians with significant technology roles acquire the skills needed to do their jobs and how they keep those skills current was of great interest in this study. The importance of preprofessional experience has been noted but we should also include the value of service learning in LIS education as an important starting point. Successful service learning experiences include practicum and part- nerships with libraries in need of technology-related services. 
Successful projects such as online exhibits, wireless policies, taxonomy creation, and crosswalking for CONTENTdm are just a few of the service projects that have given LIS students real-world experience.35 This

Responses were 50 percent "Yes," 38 percent "No," and 12 percent "Unsure." To the same question, 195 practicing technology librarians responded with 58 percent "Yes," 23 percent "No," and 20 percent "Unsure." The administrator participants were asked if they had ever had to fill a technology-related librarian role with a non-MLS hire simply because they were unable to find a qualified librarian to fill the job. Of 106 responses, 22 percent reported that they had hired a non-MLS candidate. The librarian participants were also asked to report on MLS status; of 194 responses, 93 percent reported holding an MLS or equivalent. The survey also asked the librarian participants to report what year they graduated from their MLS program, because the authors felt these data were important to the longitudinal perspectives reported in the study. Of 162 responses, participants reported graduating between 1972 and 2009. The mean was 1999, the median was 2002, and the mode was 2004.

Table 9 shows a question set related to experience factors, which stacks both groups for comparison. There are a few notable points in this particular dataset, including what appears to be an area of disagreement between administrators and librarians about the

Table 10. Education and skill supplementation for librarians with technology roles

Administrative Survey: In what ways have you supplemented training for your librarians or professional staff with technology-related roles? (Does not include ALA conferences.) | %
We have paid for technology-related conferences and preconferences. | 79
We have paid for or allowed time off for classes. | 72
We have paid for or allowed time off for online workshops and/or tutorials. | 87
We have paid for books or other learning materials. | 55
We have paid for some or all of a 1st or 2nd graduate degree. | 12
We would like to supplement but it is not in our budget. | 5
We feel that keeping up with technology is essential for librarians with technology-related roles. | 73

Librarian Survey: In what ways have you supplemented your own education related to technology skill development in terms of your time and/or money? (Not including ALA conferences.) | %
I have attended technology-related conferences and preconferences. | 73
I have taken classes. | 60
I have taken online workshops and/or tutorials. | 87
I have bought books or other learning materials. | 77
I am getting a 1st or 2nd graduate degree. | 9
I would like to supplement my own education but I cannot afford it. | 13
I would like to supplement my own education but I do not have time. | 13
I have not had to supplement in any way. | 1
I feel that keeping up with technology is essential for librarians with technology-related roles. | 84
I feel that keeping up with technology is somewhat futile. | 11

librarians who have transitioned successfully into technology-centric roles. This supports the perception that experience and on-the-job learning play a leading role in the development of technology skills for librarians. Open-ended survey comments also revealed a number of staff who initially were hired in an IT role and then went on to acquire an MLS while continuing in their technology-focused role.
Retention is sometimes problematic for librarians with IT roles, primarily because many of them are also employable in settings outside libraries. The survey asked administrators, "Do you know any librarians with technology roles that have taken IT positions outside the library field?" and of 111 respondents, 33 percent answered "yes." In open-ended responses, the most common reasons administrators felt retention may be a problem were salary, lack of challenges and opportunities, and risk-averse cultures. The survey also asked the librarian group, "Do you think you would ever consider taking an IT position outside the library field?" Of 190 respondents, 34 percent answered "yes," 23 percent "yes, but only if it was education related," and 42 percent "no." Additionally, 38 percent of these librarian respondents knew a librarian who had taken an IT position outside the library field. In an open response field, librarian participants named work environment and lack of support for technology as the most common reasons for leaving such a position.

research study asked administrators and librarians in what formal ways they supplement their ongoing education and skill acquisition. Table 10 shows these results in a stacked format for comparison.

Also of interest in this dataset is the higher level of importance librarians place on continuing skill development in the area of technology. In open-ended text responses, a number of librarians reported that less formal methods, such as monitoring electronic discussion lists and articles, were also a very important part of keeping up with technology in their area. The priority of staying educated, active, and current for librarians with significant technology roles cannot be overstated; it is what Tennant defines as technology agility:

The capacity to learn constantly and quickly. I cannot make this point strongly enough. It does not matter what they know now. Can they assess a new technology and what it may do (or not do) for your library? Can they stay up to date? Can they learn a new technology without formal training? If they can't they will find it difficult to do the job.36

Not all librarians with technology roles start out in those positions, and thus role transformation must be examined. In some cases, librarians with more traditional roles, such as reference and collection development, have transformed their skill sets and taken on technology-centric roles. Table 11 shows the results of the survey questions related to role transformation in a stacked format for comparison.

Table 11. Role transformation from traditional library roles to technology-centric roles, and the reverse

Administrative Survey (n = 104) | %
We have had one or more librarians make this transformation successfully. | 53
We have had one or more librarians attempt this transformation with some success. | 35
We have had one or more librarians attempt this transformation without success. | 17
Some have been interested in doing this but have not done so. | 14
We do not seem to have had anyone interested in this. | 11
We have had one or more librarians who started out in a technology-related librarian role but have left it for a more traditional librarian role. | 5

Librarian Survey (n = 184) | %
I started out in a technology-related librarian role and I am still in it. | 45
I have made a complete technology role transformation successfully from another type of librarian role. | 30
I have attempted to make a technology role transformation but with only some success. | 12
I have made a technology role transformation but sometimes I wish I had not. | 9
I have made a technology role transformation but I wish I had not and I am interested in returning to a more traditional librarian role. | 9
I am not a librarian. | 4

To be noted in this dataset is the large number of

The surveys used in this research study covered several complicated issues. Those who responded to the surveys were encouraged to leave open text comments in several key areas. A large number of comments were received, and many of them were of considerable length. Many individuals clearly wanted to be heard, others were concerned their story would not be captured in the data, and many expressed a genuine overall interest in the topic. A few salient comments from a variety of the areas covered are given in table 12.

■■ Conclusion

This study seeks to provide an overview of the current issues related to IT staffing in academic libraries by reporting on three areas dealing with library skill acquisition and employment. With regard to the status of technology course offerings in LIS programs, there has been a significant increase in the number of technology-related courses, but the numbers of technology courses vary considerably from program to program, and the content of individual courses appears to vary considerably as well. There appears to be a clear need for additional courses at a more advanced level. This need is evidenced by the experiences of both information technology job candidates and the administrators involved in the hiring decisions. There are clearly still difficulties both in the acquisition of needed skill sets for certain positions and in actual hiring for some information technology positions. There are also some discrepancies between how administrators perceive candidates' qualifications and how the candidates view themselves. Administrators perceive the problem of a lack of IT skills and qualifications as more serious than do candidates. The two groups also differ on the question of "readiness" of new professionals. The two groups do agree on the importance of preprofessional experience, and they both exhibit generally positive attitudes toward librarians' ability to successfully take on significant technology roles in libraries.

Table 12. A sample of open-ended responses from the two surveys

Administrative Survey

"There is a huge need for more and adequate technology training for librarians. It is essential for libraries to remain viable in the future."

"Only one library technology position (coordinator) is a professional librarian. Others are professional positions without MLS."

"There is a lot of competition for few jobs, especially in the current economic climate."

"We finally hired at the level of technician as none of the MLS candidates had the necessary qualifications."

"If I wanted a position that would develop strategy for the library's tools on the web or create a digitization program for special collections, I probably would want an MLS with library experience simply because they understand the expectations and the environment."

"Number of years of experience in technology is not as important as a willingness to learn and keep current. Sometimes old dogs won't move on to new tricks.
Sometimes new dogs aren’t interested in learning tricks.” Librarian Survey “I believe that because technology is constantly changing and evolving, librarians in technology-oriented positions must do the same.” “My problem with being a systems librarian in a small institution is that the job was 24/7/365. Way too much stress with no down time.” “I have left the library field for a few years but came back. My motivation was a higher salary, but that didn’t really happen.” “I’m considering leaving my current position because the technology role (which I do love) was added to my position without much training or support. Now that part of my job is growing so that I can’t keep up with all my duties.” “I don’t think that library school alone prepared me for my job. I had to do a lot of external study and work to learn what I did, and worked as a part-time Systems Library Assistant while in school, where I learned the majority of what prepared me for my current job.” “Library Schools need to be more rigorous about teaching students how to innovate with technology, not just use tools others have built. You can’t convert “traditional” librarians into technology roles without rigorous study. Otherwise, you will get mediocre and even dangerous results.” 140 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 16. Ibid., 53–54. 17. Thomas W. Leonhardt, “Thoughts on Library Education,” Technicalities 27, no. 3 (2007): 4–7. 18. Thomas W. Leonhardt, “Library and Information Science Education” Technicalities 27, no. 2 (2007): 3–6. 19. Noa Aharony, “Web 2.0 in U.S. LIS Schools: Are They Missing the Boat?” Ariande 30, no. 54 (2008): 1. 20. Chuck Thomas and Salwa Ismail Patel, “Competency- Based Training for Digital Librarians: A Viable Strategy for an Evolving Workforce?” Journal of Education for Library & Information Science, 49, no. 4 (2008): 298–309. 21. Michael J. Miller, “Information Communication Technology Infusion in 21st Century Librarianship: A Proposal for a Blended Core Course,” Journal of Education for Library & Information Science 48, no. 3 (2007): 202–17. 22. “About the iSchools.” (2010); http://www.ischools.org/ site/about/ (accessed 9/1/2010). 23. Laurie J. Bonnici, Manimegalai M. Subramaniam, and Kathleen Burnett, “Everything Old is New Again: The Evolution of Library and Information Science Education from LIS to iField,” Journal of Education for Library & Information Science 50, no. 4 (2009): 263–74; Andrew Abbott, The Chaos of Disciplines (Chicago: Chicago Univ. Pr., 2001). 24. Bonnici, “Everything Old is New Again,” 263–74. 25. Russell A. Hall, “Exploring the Core: An Examination of Required Courses in ALA-Accredited,” Education for Information 27, no. 1 (2009): 57–67. 26. Ibid., 62. 27. Jane M. Davis, “A Survey of Cataloging Education: Are Library Schools Listening?” Cataloging & Cataloging Quarterly 46, no. 2 (2008): 182–200. 28. Aharony, “Web 2.0 in U.S. LIS,” 1. 29. Janie M. Mathews and Harold Pardue, “The Presence of IT Skill Sets in Librarian Position Announcements,” College & Research Libraries 70, no. 3 (2009): 250–57. 30. “Redefining LIS Jobs,” Library Technology Reports 45, no. 3, (2007): 40. 31. Youngok Choi and Edie Rasmussen, “What Qualifications and Skill are Important for Digital Librarian Positions in Academic Libraries? A Job Advertisement Analysis,” The Journal of Academic Librarianship 35, no. 5 (2009): 457–67. 32. Carla J. 
Soffle and Kim Leeder, “Practitioners and Library Education: A Crisis of Understanding,” Journal of Education for Library & Information Science 46, no. 4 (2005): 312–19. 33. Marta Mestrovic Deyrup and Alan Delozier, “A Case Study on the Current Employment Status of New M.L.S. Graduates,” Current Studies in Librarianship 25, no. 1/2, (2001): 21–38. 34. Mary A. Ball and Katherine Schilling, “Service Learning, Technology and LIS Education,” Journal of Education for Library & Information Science 47, no. 4 (2006): 277–90. 35. Marta Mestrovic Deyrup and Alan Delozier, “A Case Study on the Current Employment Status of New M.L.S. Graduates,” Current Studies in Librarianship 25, no. 1/2 (2001): 21–38. 36. Roy Tennant, “The Most Important Management Decision: Hiring Staff for the New Millennium,” Library Journal 123, no. 3 (1998): 102. More research is still needed to identify the key tech- nology skills needed. Case studies of successful library technology teams and individuals may reveal more about the process of skill acquisition. Questions regarding how much can be taught in LIS courses or practicum, and how much must be expected through on-the-job experience are good areas for more research. References 1. James Michalko, Constance Malpas and Arnold Arcolio, “Research Libraries, Risk and Systematic Change,” OCLC Research (Mar. 2010), http://www.oclc.org/research/publications/ library/2010/2010-03.pdf. 2. Lori A. Goetsch, Reinventing Our Work, “New and Emerging Roles for Academic Librarians,” Journal of Library Administration 48, no. 2 (2008): 157–72. 3. Janie M. Mathews and Harold Pardue, “The Presence of IT Skill Sets in Librarian Position Announcements,” College and Research Libraries 70, no. 3 (2009): 250–57. 4. Peggy Johnson, “From the Editor’s Desk,” Technicalities 27, no. 3 (2007): 2–4. 5. Ton deBruyn, “Questioning the Focus of LIS Education,” Journal of Education for Library & Information Science 48, no. 2 (2007): 108–15. 6. Jacquelyn Erdman, “Education for a New Breed of Librarian,” Reference Librarian 47, no. 98 (2007): 93–94. 7. “Educating Library and Information Science Professionals for a New Century: The KALIPER Report,” Executive Summary. ALIPER Advisory Committee, ALISE. (Reston, Virginia, July 2000), http://www.si.umich.edu/~durrance/TextDocs/ KaliperFinalR.pdf (accessed June 1, 2010). 8. Karen Markey, “Current Educational Trends in Library and Information Science Curricula,” Journal of Education for Library and Information Science 45, no. 4 (2004): 317–39. 9. Michael Gorman, “Whither library education?” New Library World 105, no. 9/10 (2004): 376–80; Michael Gorman, “What Ails Library Education?” Journal of Academic Librarianship 30, no. 2 (2004): 99–101. 10. Andrew Dillon and April Norris, “Crying Wolf: An Examination and Reconsideration of the Perception of Crisis in LIS Education,” Journal of Education for Library & Information Science 46, no. 4, (2005): 208–98. 11. Leigh S. Estabrook, “Crying Wolf: A Response,” Journal of Education for Library & Information Science 46, no. 4 (2005):299–303. 12. Ian M. Johnson, “Education for Librarianship and Information Studies: fit for purpose?” Information Development 23, no.1 (2007): 13–14. 13. James G. Neal, “Raised by Wolves,” Library Journal 131, no. 3 (2006): 42–44. 14. Sheila S. Intner, “Library Education for the Third Millennium,” Technicalities 24, no. 6 (2004): 10–12 15. Renee D. 
McKinney, “Draft Proposed ALA Core Competencies Compared to ALA-Accredited, Candidate, and Precandidate Program Curricula: A Preliminary Analysis,” Journal of Education for Library & Information Science 47 no.1 (2006): 52–77. 1771 ---- liBrAriANs AND tecHNoloGY skill AcQuisitioN: issues AND perspectives | FArNeY 141click ANAlYtics: visuAliziNG WeBsite use DAtA | FArNeY 141 Tutorial Tabatha A. Farney librarians who create website content should have access to website usage statistics to measure their webpages’ effectiveness and refine the pages as necessary.3 With web analytics libraries can increase the effectiveness of their websites, and as Marshall Breeding has observed, libraries can regularly use website statistics to determine how new webpage content is actu- ally being used and make revisions to the content based on this infor- mation.4 Several recent studies used Google Analytics to collect and report website usage statistics to measure website effectiveness and improve their usability.5 While web analytics are useful in a website redesign pro- cess, several studies concluded that web usage statistics should not be the sole source of information used to evaluate a website. These studies recommend using click data in con- junction with other website usability testing methods.6 Background A lack of research on the use of click analytics in libraries motivated the web services librarian to explore their potential by directly implementing them on the Library’s website. She found that there are several click analytics products available and each has its own unique functional- ity. However, many are commercially produced and expensive. With limited funding, the web services librarian selected Google Analytics’ In-Page Analytics, ClickHeat, and Crazy Egg because they are either free or inexpensive. Each tool was evaluated on the Library’s website for over a six month period. because Google Analytics cannot dis- cern between the same link repeated in multiple places on a webpage. Furthermore, she wanted to use web- site use data to determine the areas of high and low usage on the Library’s homepage, and use this information to justify her webpage reorganization decisions. Although this data can be found in a Google Analytics report, the web services librarian found it difficult to easily identify the neces- sary information within the massive amount of data the reports contain. The web services librarian opted to use click analytics, also known as click density analysis or site overlay, a subset of web analytics that reveals where users click on a webpage.1 A click analytics report produces a visual representation of what and where visitors are clicking on an indi- vidual webpage by overlaying the click data on top of the webpage that is being tested. Rather than wad- ing through the data, libraries can quickly identify what content users are clicking by using a click analyt- ics report. The web services librarian tested several click analytics prod- ucts while reassessing the Library’s homepage. During this process she discovered that each click analyt- ics tool had different functionalities that impacted their usefulness to the Library. This paper introduces and evaluates three click analytics tools, Google Analytics’ In-Page Analytics, ClickHeat, and Crazy Egg, in the context of redesigning the Library’s homepage and discusses the benefits and drawbacks of each. 
Literature Review Library literature indicates that libraries are actively engaged in interpreting website usage data for a variety of purposes. Laura B. Cohen’s study encourages libraries to use their website usage data to enhance their understanding of how visitors access and use library websites.2 Jeanie M. Welch further recommends that all Click Analytics: Visualizing Website Use Data Editor’s Note: This paper is adapted from a presentation given at the 2010 LITA Forum Click analytics is a powerful tech- nique that displays what and where users are clicking on a webpage help- ing libraries to easily identify areas of high and low usage on a page with- out having to decipher website use data sets. Click analytics is a subset of web analytics, but there is little research that discusses its poten- tial uses for libraries. This paper introduces three click analytics tools, Google Analytics’ In-Page Analytics, ClickHeat, and Crazy Egg, and eval- uates their usefulness in the context of redesigning a library’s homepage. W eb analytics tools, such as Google Analytics, assist libraries in interpreting their website usage statistics by formatting that data into reports and charts. The web services librarian at the Kraemer Family Library at the University of Colorado, Colorado Springs wanted to use website use data to reassess the Library’s homepage that was crowded with redundant links. For example, all the links in the site’s dropdown navi- gation were repeated at the bottom of the homepage to make the links more noticeable to the user, but it unintentionally made the page long. To determine which links the web services librarian would recommend for removal, she needed to compare the use or clicks the repetitive links received. At the time, the Library relied solely on Google Analytics to interpret website use data. However, this practice proved insufficient tabatha A. Farney (tfarney@uccs.edu) is Web services librarian, Kraemer Family library, university of Colorado, Colorado springs, Colorado. 142 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 libraries, outbound links include library catalogs or subscription databases. Additional javascript tags must be added to each outbound link for Google Analytics to track that data.9 Once Google Analytics recognizes the outbound links, their click data will be available in the In-Page Analytics report. visitors to that page, and Outbound Destinations, links that navigate visitors away from that webpage. The Inbound Sources and Outbound Destinations reports can track out- bound links, which are links that have a different domain or URL address from the website tracked within Google Analytics. For In-Page Analytics Google Analytics is a popular, com- prehensive web analytics tool that contains a click analytics feature called In-Page Analytics (formerly Site Overlay) that visually displays click data by overlaying that infor- mation on the current webpage (see figure 1). Site Overlay was used dur- ing the Library’s redesign process, however, it was replaced by In-Page Analytics in October 2010.7 The web services librarian reassessed the Library’s homepage using In-Page Analytics, and found that the current tool resolved some of Site Overlay’s shortcomings. Site Overlay is no lon- ger accessible in Google Analytics, so this paper will discuss In-Page Analytics. Essentially, In-Page Analytics is an updated version of the Site Overlay (see figure 2). 
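As a concrete illustration of the outbound-link tagging mentioned above, the sketch below shows the asynchronous ga.js tracking code Google Analytics supplied in this period, followed by one outbound catalog link tagged with an event. The account ID, URLs, and the "Outbound"/"Catalog" names are placeholders rather than the Library's actual configuration, and tagging outbound clicks as virtual pageviews (for example, _trackPageview with an "/outbound/..." path) was an equally common approach.

   <!-- Standard asynchronous Google Analytics (ga.js) snippet of the period;
        UA-XXXXX-X is a placeholder for the site's own account ID. -->
   <script type="text/javascript">
     var _gaq = _gaq || [];
     _gaq.push(['_setAccount', 'UA-XXXXX-X']);
     _gaq.push(['_trackPageview']);
     (function() {
       var ga = document.createElement('script');
       ga.type = 'text/javascript';
       ga.async = true;
       ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') +
                '.google-analytics.com/ga.js';
       var s = document.getElementsByTagName('script')[0];
       s.parentNode.insertBefore(ga, s);
     })();
   </script>

   <!-- An outbound link tagged with an event so that clicks leaving the site
        appear in reports; the URL and names are illustrative only. -->
   <a href="http://catalog.example.edu/"
      onclick="_gaq.push(['_trackEvent', 'Outbound', 'Catalog', this.href]);">
     Library Catalog</a>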
In addition to visually representing click data on a webpage, In-Page Analytics contains new features including the ability to easily segment data. Web analytics expert, Avinash Kaushik, stresses the importance of segmenting website use data because it breaks down the aggregated data into specific data sets that represents more defined groups of users.8 Rather than studying the total number of clicks a link received, an In-Page Analytics report can seg- ment the data into specific groups of users, such as mobile device users. In-Page Analytics provides several default segments, but custom seg- ments can also be applied allowing libraries to further filter the data that is constructive to them. In-Page Analytics also displays a complementing overview report of statistics located in a side panel next to the typical site overlay view. This overview report extracts useful data from other reports generated in Google Analytics without hav- ing to leave the In-Page Analytics report screen. The report includes the webpage’s Inbound Sources, also called top referrals, which are links from other webpages leading Figure 1. Screenshot of Google Analytics’ Defunct Site Overlay Figure 2. Screenshot of Google Analytic’s In-Page Analytic liBrAriANs AND tecHNoloGY skill AcQuisitioN: issues AND perspectives | FArNeY 143click ANAlYtics: visuAliziNG WeBsite use DAtA | FArNeY 143 services librarian uses a screen cap- ture tool, such as the Firefox add-on Screengrab13, to collect and archive the In-Page Analytics reports, but the process is clunky and results in the loss of the ability to segment the data. ClickHeat Labsmedia’s ClickHeat is an open source heat mapping tool that visu- ally displays the clicks on a webpage using color to indicate the amount of clicks an area receives. Similar to In-Page Analytics, a ClickHeat heat map displays the current webpage and overlays that page with click data (see figure 3). Instead of list- ing percentages or actual numbers of clicks, the heat map represents clicks using color. The warmer the color, such as yellows, oranges, or reds, the more clicks that area receives; the absence of color implies little to no click activity. Each heat map has an indicator that outlines the number of clicks a color represents. A heat map clearly displays the heavily used and underused sections on a web- page making it easy for people with little experience interpreting website usage statistics to interpret the data. However, a heat map is not about exact numbers, but rather general areas of usage. For exact numbers, a traditional, comprehensive web ana- lytics tool is required. ClickHeat can stand alone or be integrated into other web analytic tools.14 To have a more comprehensive web analytics product, the web services librarian opted to use the ClickHeat plugin for Piwik, a free, open source web analytics tool that seeks to be an alternative to Google Analytics.15 By itself Piwik has no click analytics feature, therefore ClickHeat is a use- ful plugin. Both Piwik and ClickHeat require access to a web server for instal- lation and knowledge of PHP and MySQL to configure them. Because the Kraemer Family Library does not maintain its own web servers, the pages, but it is time consuming and may not be worth the effort since the data are indirectly available.11 A major drawback to In-Page Analytics is that it does not discern between the same links listed in mul- tiple places on a webpage. 
Instead it tracks redundant links as one link, making it impossible to distinguish which repeated link received more use on the Library’s homepage. Similarly, the Library’s homepage uses icons to help draw attention to certain links. These icons are linked images next to their counterpart text link. Since the icon and text link share the same URL, In-Page Analytics can- not reveal which is receiving more clicks. In-Page Analytics is useless for comparing repetitive links on a web- page, but Google reports that they are working on adding this capability.12 As stated earlier, In-Page Analytics lays the click data over the current webpage in real-time, which can be both useful and limiting. Using the current webpage allows libraries to navigate through their site while staying within the In-Page Analytics report. Libraries can fol- low in the tracks of website users to learn how they interact with the site’s content and navigation. The downside is that it is difficult to compare a new version of a webpage with an older version since it only displays the current webpage. For example, the web services librarian could not accurately compare the use data between the old homepage and the revised homepage within the In-Page Analytics report because the newly redesigned homepage replaced the old page. Comparing different versions of a webpage could help determine whether the new revisions improved the page or not. An archive or export feature would remedy this problem, but In-Page Analytics does not have this capac- ity. Additionally, an export function would improve the ability to share this report with other librarians with- out having them login to the Google Analytics website. Currently, the web Evaluation of In-Page Analytics In-Page Analytics’ advanced seg- menting ability far exceeds the old Site Overlay functionality. Segmenting click data at the link level helps web managers to see how groups of users are navigating through a website. For example, In-Page Analytics can monitor the links mobile users are clicking, allowing web managers to track how that group of users are navigating through a website. This data could be used in designing a mobile version of a site. In-Page Analytics integrates a site overlay report and an overview report that contains selected web use statistics for an individual webpage. Although the overview report is not in visual context with the site over- lay view, it combines the necessary data to determine how a webpage is being accessed and used. This assists in identifying possible flaws in a website’s navigation, layout, or content. It also has the potential to clarify misleading website statis- tics. For instance, Google Analytics Top Exit Pages report indicates the Library’s homepage is the top exit page for the site. Exit pages are the last page a visitor views before leav- ing the site.10 Having a high exit rate could imply visitors were leaving the Library’s site from the homepage and potentially missing a majority of the Library’s online resources. Using In-Page Analytics, it was apparent the Library’s homepage had a high number of exits because many visi- tors clicked on outbound links, such as the library catalog, that navigated visitors away from the Library’s web- site. Rather than finding a potential problem, In-Page Analytics indicated that the homepage’s layout success- fully led visitors to a desired point of information. 
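One generic workaround for the duplicate-link limitation described above, not the approach taken in this redesign, is to make each copy of a repeated link distinguishable to the analytics tool, either by appending a query parameter that names the link's location or by firing a location-specific event. The parameter and names below are illustrative; note that extra query parameters also create separate page entries in content reports unless they are filtered out.

   <!-- Two copies of the same link, each carrying a parameter naming its location. -->
   <a href="/databases.html?src=nav">Databases</a>      <!-- dropdown navigation copy -->
   <a href="/databases.html?src=footer">Databases</a>   <!-- bottom-of-page copy -->

   <!-- Alternatively, fire a location-specific event on each copy. -->
   <a href="/databases.html"
      onclick="_gaq.push(['_trackEvent', 'Homepage links', 'Databases', 'footer']);">
     Databases</a>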
While the data from the outbound links is available in the data overview report, it is not dis- played within the site overlay view. It is possible to work around this problem by creating internal redirect 144 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 The precise number of clicks is avail- able in traditional web analytics reports. Installing and configuring ClickHeat is a potential drawback for some libraries that do not have access to the necessary technology or staff to maintain it. Even with access to a web server and knowl- edgeable staff, the web services librarian still experienced glitches implementing ClickHeat. She could not add ClickHeat to any high traf- ficked webpage because it created a slight, but noticeable, lag in response time to any page it was added. The cause was an out-of-box configura- tion setting that had to be fixed by the campus’ Information Technology Department.17 Another concern for libraries is that ClickHeat is con- tinuously being developed with new versions or patches released peri- odically.18 Like any locally installed software, libraries must plan for con- tinuing maintenance of ClickHeat to keep it current. Just as with In-Page Analytics, ClickHeat has no export or archive function. This impedes the web main navigation on the homepage and opted to use links prominently displayed within the homepage’s con- tent. This indicated that either the users did not notice the main naviga- tion dropdown menus or that they chose to ignore them. Further usabil- ity testing of the main navigation is necessary to better understand why users do not utilize it. ClickHeat is most useful when combined with a comprehensive web analytics tool, such as Piwik. Since ClickHeat only collects data where visitors are clicking, it does not track other web analytics metrics, which limits its ability to segment the click data. Currently, ClickHeat only segments clicks by browser type or screen resolution. Additional seg- menting ability would enhance this tool’s usefulness. For example, the ability to segment clicks from new visitors and returning visitors may reveal how visitors learn to use the Library’s homepage. Furthermore, the heat map report does not provide the actual number of clicks on indi- vidual links or content areas since heat maps generalize click patterns. web services librarian worked with the campus’ Information Technology Department to install Piwik with the ClickHeat plugin on a campus web server. Once installed, Piwik and ClickHeat generate javascript tags that must be added to every page that website use data will be tracked. Although Piwik and ClickHeat can be integrated, the tools work separately so two javascript tags must be added to a webpage to track click data in Piwik as well as in ClickHeat. Only the pages that contain the ClickHeat tracking script will generate heat maps that are then stored within the local Piwik interface. Evaluation of ClickHeat In-Page Analytics only tracks links or items that perform some sort of action, such as playing a flash video,16 but ClickHeat tracks clicks on internal links, outbound links, and even non- linked objects, such as images. Hence, ClickHeat is able to track clicks on the entire webpage. Tracking non-linked objects was unexpectedly useful in identifying potential flaws in a web- page’s design. For instance, within a week of beta testing the Library’s redesigned homepage, it was evident that users clicked on the graphics that were positioned closely to text links. 
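The two tracking tags mentioned above might look roughly like the sketch below. The first block is the standard page tag Piwik generated at the time, with a placeholder server path and a site ID of 1; the second block only approximates the separate ClickHeat tag, whose exact variable names depend on the installed version, so treat it as an assumption to be checked against the local ClickHeat installation.

   <!-- Piwik page tag (placeholder analytics server; site ID 1). -->
   <script type="text/javascript">
     var pkBaseURL = (("https:" == document.location.protocol)
         ? "https://analytics.example.edu/piwik/"
         : "http://analytics.example.edu/piwik/");
     document.write(unescape("%3Cscript src='" + pkBaseURL +
         "piwik.js' type='text/javascript'%3E%3C/script%3E"));
   </script>
   <script type="text/javascript">
     try {
       var piwikTracker = Piwik.getTracker(pkBaseURL + "piwik.php", 1);
       piwikTracker.trackPageView();       // records the page view
       piwikTracker.enableLinkTracking();  // records outbound link and download clicks
     } catch (err) {}
   </script>

   <!-- Separate ClickHeat tag (approximate): loads clickheat.js and logs clicks
        for one named page group so a heat map can be generated for that page. -->
   <script type="text/javascript"
           src="http://analytics.example.edu/clickheat/js/clickheat.js"></script>
   <script type="text/javascript">
     clickHeatSite = 'library';       // site label (illustrative)
     clickHeatGroup = 'homepage';     // one heat map per tracked page/group
     clickHeatServer = 'http://analytics.example.edu/clickheat/click.php';
     initClickHeat();
   </script>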
The images were intended to draw the user’s attention to the text link, but instead users clicked on the graphic itself expecting it to be a link. To alleviate possible user frustration, the web services librarian added links to the graphics to take visitors to the same destinations as their companion text links. ClickHeat treats every link or image as its own separate component, so it has the ability to compare the same link listed in multiple places on the same page. Unlike In-Page Analytics, ClickHeat was particularly helpful in analyzing which redundant links received more use on the homep- age. In addition, the heat map also revealed that users ignored the site’s Figure 3. Screenshot of ClickHeat’s heat map report liBrAriANs AND tecHNoloGY skill AcQuisitioN: issues AND perspectives | FArNeY 145click ANAlYtics: visuAliziNG WeBsite use DAtA | FArNeY 145 clicks that area has received with the brighter colors representing the higher percentage of clicks. The plus signs can be expanded to show the total number of clicks an item has received, and this number can be easily filtered into eleven predefined allowing Crazy Egg to differentiate between the same link or image listed multiple times on a webpage. Crazy Egg displays this data in color-coded plus signs which are located next to the link or graphic it represents. The color is based on the percentage of services librarian’s ability to share the heat maps and compare different ver- sions of a webpage. Again, the web services librarian manually archives the heat maps using a screen capture tool, but the process is not the perfect solution. Crazy Egg Crazy Egg is a commercial, hosted click analytics tool selected for this project primarily for its advanced click tracking functionality. It is a fee-based service that requires a monthly subscription. There are several subscription packages based on the number of visits and “snap- shots.” Snapshots are webpages that are tracked by Crazy Egg. The Kraemer Family Library subscribes to the standard package that allows up to twenty snapshots at one time with a combined total of 25,000 visits a month. To help manage how those visits are distributed, each tracked page can be assigned a specific num- ber of visits or time period so that one webpage does not use all the visits early in the month. Once a snapshot reaches its target number of visits or its allocated time period, it automatically stops tracking clicks and archives that snapshot within the Crazy Egg website.19 The snapshots convert the click data into three different click ana- lytic reports: heat map, site overlay, and something called “confetti view.” Crazy Egg’s heat map report is comparable to ClickHeat’s heat map; they both use intensity of col- ors to show high areas of clicks on a webpage (see figure 4). Crazy Egg’s site overlay is similar to In-Page Analytics in that they both display the number of clicks a link receives (see figure 5). Unlike In-Page Analytics, Crazy Egg tracks all clicks including outbound links as well as nonlinked content, such as graph- ics, if it has received multiple clicks. Every clicked link and graphic is treated as its own separate entity, Figure 4. Screenshot of Crazy Egg’s heat map report Figure 5. Screenshot of Crazy Egg’s site overlay report 146 iNForMAtioN tecHNoloGY AND liBrAries | septeMBer 2011 to decide which redundant links to remove from the homepage. The confetti view report was useful for studying clicks on the entire web- page. 
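The fix described above for icons that users mistook for links amounts to wrapping each decorative graphic in the same link as its companion text, so a click on either element reaches the same destination and is counted. The file names and URL here are illustrative.

   <!-- Before: only the text is linked; the icon beside it is not clickable. -->
   <img src="/images/catalog-icon.png" alt="" />
   <a href="http://catalog.example.edu/">Library Catalog</a>

   <!-- After: the icon links to the same destination as its companion text link. -->
   <a href="http://catalog.example.edu/"><img src="/images/catalog-icon.png"
       alt="Library Catalog" /></a>
   <a href="http://catalog.example.edu/">Library Catalog</a>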
Segmenting this data allowed the web services librarian to identify click patterns on the webpage from a specific group. For example, the report revealed that mobile device users would scroll horizontally on the homepage to click on content, but rarely vertically. She also focused on the time to click segment, which reports how long it took a visitor to click on something, in the confetti view to identify links or areas that took users over half a minute to click. Both segments provided interesting information, but further usability testing is necessary to better under- stand why mobile users preferred not to scroll vertically or why it took users longer to click on certain links. Crazy Egg also has the ability to archive its snapshots within its profile. This is useful for comparing different versions of a webpage to discover if the modifications were an improve- ment or not. One goal for the Library’s homepage redesign was to shorten the page so users did not have to scroll Evaluation of Crazy Egg Crazy Egg combines the capabilities of In-Page Analytcis and ClickHeat in one tool and expands on their abili- ties. It is not a comprehensive web analytics tool like Google Analytics or Piwik, but rather is designed to specifically track where users are clicking. Crazy Egg’s heat map report is comparable to the one freely avail- able in ClickHeat, however, its site overlay and confetti view reports are more sophisticated than what is currently available for free. The web services librarian found Crazy Egg to be a worthwhile investment during the Library’s homepage redesign because it provided addi- tional context to show how users were interacting with the Library’s website. The site overlay facilitated the ability to compare the same link listed in multiple locations on the Library’s homepage. Not only could the web services librarian see how many clicks the links received, but she could also segment and com- pare that data to learn which links users were finding faster and which links new visitors or returning visi- tors preferred. This data helped her segments that include day of week, browser type, and top referring websites. Custom segments may be applied if they are set up within the Crazy Egg profile. The confetti view report displays every click the snapshot recorded and overlays those clicks as colored dots on the snapshot as shown in figure 6. The color of the dot corre- sponds to specific segment value. The confetti view report uses the same default segmented values used in the site overlay report but here they can be further filtered into defined values for that segment. For example, the confetti view can segment the clicks by window width and then further filter the data to display only the clicks from visitors with window widths under 1000 pixels to see if users with smaller screen resolutions are scrolling down long webpages to click on content. This information is hard to glean from Crazy Egg’s site overlay report because it focuses on the individual link or graphic. The confetti view report focuses on clicks at the webpage level, allowing libraries to view usage trends on a webpage. Crazy Egg is a hosted service like Google Analytics, which means all the data are stored on Crazy Egg’s web servers and accessed through its website. Implementing Crazy Egg on a webpage is a two-step process requiring the web manager to first set up the snapshot within the Crazy Egg profile and then add the tracking javascript tags to the webpage it will track. 
Crazy Egg is a hosted service like Google Analytics, which means all the data are stored on Crazy Egg's web servers and accessed through its website. Implementing Crazy Egg on a webpage is a two-step process requiring the web manager to first set up the snapshot within the Crazy Egg profile and then add the tracking javascript tags to the webpage it will track. Once the javascript tags are in place, Crazy Egg takes a picture of the current webpage and stores that as the snapshot on which to overlay the click data reports. Since it uses a "snapshot" of the webpage, the website manager needs to retake a snapshot of the webpage if there are any changes to it. Retaking the snapshot requires only a click of a button to automatically stop the old snapshot and regenerate a new one based on the current webpage without having to change the javascript tags.

Figure 6. Screenshot of Crazy Egg's confetti view report

Evaluation of Crazy Egg

Crazy Egg combines the capabilities of In-Page Analytics and ClickHeat in one tool and expands on their abilities. It is not a comprehensive web analytics tool like Google Analytics or Piwik, but rather is designed to specifically track where users are clicking. Crazy Egg's heat map report is comparable to the one freely available in ClickHeat; however, its site overlay and confetti view reports are more sophisticated than what is currently available for free. The web services librarian found Crazy Egg to be a worthwhile investment during the Library's homepage redesign because it provided additional context to show how users were interacting with the Library's website. The site overlay facilitated the ability to compare the same link listed in multiple locations on the Library's homepage. Not only could the web services librarian see how many clicks the links received, but she could also segment and compare that data to learn which links users were finding faster and which links new visitors or returning visitors preferred. This data helped her to decide which redundant links to remove from the homepage.

The confetti view report was useful for studying clicks on the entire webpage. Segmenting this data allowed the web services librarian to identify click patterns on the webpage from a specific group. For example, the report revealed that mobile device users would scroll horizontally on the homepage to click on content, but rarely vertically. She also focused on the time to click segment, which reports how long it took a visitor to click on something, in the confetti view to identify links or areas that took users over half a minute to click. Both segments provided interesting information, but further usability testing is necessary to better understand why mobile users preferred not to scroll vertically or why it took users longer to click on certain links.

Crazy Egg also has the ability to archive its snapshots within its profile. This is useful for comparing different versions of a webpage to discover if the modifications were an improvement or not. One goal for the Library's homepage redesign was to shorten the page so users did not have to scroll
down too much to get to needed links. By comparing the old homepage and the new homepage confetti reports in Crazy Egg, it was instantly apparent that the new homepage had significantly fewer clicks on its bottom half than the old version. Furthermore, comparing the different versions using the time to click segment in the site overlay showed that placing the link more prominently on the webpage decreased the overall time it took users to click on it.

Crazy Egg's main drawback is that archived pages that are no longer tracking click data count toward the overall number of snapshots that can be tracked at one time. If libraries regularly retest a webpage, they will easily reach the maximum number of snapshots their subscription permits in a relatively short period. Once a Crazy Egg subscription is cancelled, data stored in the account is no longer accessible. This increases the importance of regularly exporting data. Crazy Egg is designed to export the heat map and confetti view reports. The direct export function takes a snapshot of the current report as it is displayed, and automatically converts that image into a PDF. Exporting the heat map is fairly simple because the report is a single image, but exporting all the content in the confetti view report is more difficult because the report is based on segments of click data. Each segment type would have to be exported in a separate PDF report to retain all of the content. In addition, there is no export option for the site overlay report, so there is not an easy method to manage that information outside of Crazy Egg. Even if libraries are actively exporting reports from Crazy Egg, data loss is inevitable.

Summary and Conclusions

Closely examining In-Page Analytics, ClickHeat, and Crazy Egg reveals that each tool has different levels of click tracking abilities; however, all provide a distinct picture of how visitors use a webpage. By using all of them, the web services librarian was able to clearly identify and recommend the links for removal. In addition, she identified other potential usability concerns, such as visitors clicking on nonlinked graphics rather than the link itself. A major bonus of using click analytics tools is their ability to create easy to understand reports that instantly display where visitors are clicking on a webpage. No previous knowledge of web analytics is required to understand these reports. The web services librarian found it simple to present and discuss click analytics reports with other librarians with little to no background in web analytics. This helped increase the transparency of why links were targeted for removal from the homepage.

As useful as click analytics tools are, they cannot determine why users click on a link, only where they have clicked. Click analytics tools simply visualize website usage statistics. As Elizabeth Black reports, these "statistics are a trail left by the user, but they do not explain the motivations behind the behavior."20 She concludes that additional usability studies are required to better understand users and their interactions on a website.21 Libraries can use the click analytics reports to identify a problem on a webpage, but further usability testing will explain why there is a problem and help library web managers fix the issue and prevent repeating the mistake in the future.

The web services librarian incorporated the use of In-Page Analytics, ClickHeat, and Crazy Egg in her web analytics practices since these tools continue to be useful to test the usage of new content added to a webpage. Furthermore, she finds that click analytics' straightforward reports prompted her to share website use data more often with fellow librarians to assist in other decision-making processes for the Library's website. Next, she will explore ways to automate the process of sharing website use data to make this information more accessible to other interested librarians. By sharing this information, the web services librarian hopes to promote informed decision making for the Library's web content and design.

References

1. Avinash Kaushik, Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity (Indianapolis: Wiley, 2010): 81–83.
2. Laura B. Cohen, "A Two-Tiered Model for Analyzing Library Website Usage Statistics, Part 2: Log File Analysis," portal: Libraries & the Academy 3, no. 3 (2003): 523–24.
3. Jeanie M. Welch, "Who Says We're Not Busy? Library Web Page Usage as a Measure of Public Service Activity," Reference Services Review 33, no. 4 (2005): 377–78.
4. Marshall Breeding, "An Analytical Approach to Assessing the Effectiveness of Web-Based Resources," Computers in Libraries 28, no. 1 (2008): 20–22.
5. Julie Arendt and Cassie Wagner, "Beyond Description: Converting Web Site Statistics into Concrete Site Improvement Ideas," Journal of Web Librarianship 4, no. 1 (January 2010): 37–54; Steven J. Turner, "Websites Statistics 2.0: Using Google Analytics to Measure Library Website Effectiveness," Technical Services Quarterly 27, no. 3 (2010): 261–278; Wei Fang and Marjorie E. Crawford, "Measuring Law Library Catalog Web Site Usability: A Web Analytic Approach," Journal of Web Librarianship 2, no. 2–3 (2008): 287–306.
6. Arendt and Wagner, "Beyond Description," 51–52; Andrea Wiggins, "Data-Driven Design: Using Web Analytics to Validate Heuristics," Bulletin of the American Society for Information Science and Technology 33, no. 5 (2007): 20–21; Elizabeth L. Black, "Web Analytics: A Picture of the Academic Library Web Site User," Journal of Web Librarianship 3, no. 1 (2009): 12–13.
7. Trevor Claiborne, "Introducing In-Page Analytics: Visual Context for your Analytics Data," Google Analytics Blog, Oct. 15, 2010, http://analytics.blogspot.com/2010/10/introducing-in-page-analytics-visual.html (accessed Feb. 7, 2011).
8. Kaushik, Web Analytics 2.0, 88.
9. Turner, "Websites Statistics 2.0," 272–73.
10. Kaushik, Web Analytics 2.0, 53–55.
11. Site Overlay not Displaying Outbound Links, Google Analytics Help Forum, http://www.google.com/support/forum/p/Google+Analytics/thread?tid=39dc323262740612&hl=en (accessed Feb. 7, 2011).
12. Claiborne, "Introducing In-Page Analytics."
13. Screengrab, Firefox Add-Ons, https://addons.mozilla.org/en-US/firefox/addon/1146/ (accessed Feb. 7, 2011).
14. ClickHeat, Labsmedia, http://www.labsmedia.com/clickheat/index.html (accessed Feb. 7, 2011).
15. Piwik, http://piwik.org/ (accessed Feb. 7, 2011).
16. Paul Betty, "Assessing Homegrown Library Collections: Using Google Analytics to Track Use of Screencasts and Flash-Based Learning Objects," Journal of Electronic Resources Librarianship 21, no. 1 (2009): 81–84.
17. ClickHeat Performance and Optimization, Labsmedia, http://www.labsmedia.com/clickheat/156894.html (accessed Feb. 7, 2011).
18. ClickHeat, Sourceforge, http://sourceforge.net/projects/clickheat/files/ (accessed Feb. 7, 2011).
19. Crazy Egg, http://www.crazyegg.com/ (accessed Mar. 25, 2011).
20. Black, "Web Analytics," 12.
21. Ibid., 12–13.
1844 ---- A File Storage Service on a Cloud Computing Environment for Digital Libraries

Victor Jesús Sosa-Sosa and Emigdio M. Hernandez-Ramirez

Victor Jesús Sosa-Sosa (vjsosa@tamps.cinvestav.mx) is Professor and Researcher at the Information Technology Laboratory at CINVESTAV, Campus Tamaulipas, Mexico. Emigdio M. Hernandez-Ramirez (emhr1983@gmail.com) is Software Developer, SVAM International, Ciudad Victoria, Mexico.

ABSTRACT

The growing need for digital libraries to manage large amounts of data requires storage infrastructure that libraries can deploy quickly and economically. Cloud computing is a new model that allows the provision of information technology (IT) resources on demand, lowering management complexity. This paper introduces a file-storage service that is implemented on a private/hybrid cloud-computing environment and is based on open-source software. The authors evaluated performance and resource consumption using several levels of data availability and fault tolerance. This service can be taken as a reference guide for IT staff wanting to build a modest cloud storage infrastructure.

INTRODUCTION

The information technology (IT) revolution has led to the digitization of every kind of information.1 Digital libraries are appearing as one more step toward easy access to information spread throughout a variety of media. The digital storage of data facilitates information retrieval, allowing a new wave of services and web applications that take advantage of the huge amount of data available.2 The challenges of preserving and sharing data stored on digital media are significant compared to the print world, in which data "stored" on paper can still be read centuries or millennia later. In contrast, only ten years ago, floppy disks were a major storage medium for digital data, but now the vast majority of computers no longer support this type of device. In today's environment, selecting a good data repository is important to ensure that data are preserved and accessible. Likewise, defining the storage requirements for digital libraries has become a big challenge. In this context, IT staff—those responsible for predicting what storage resources will be needed in the medium term—often face the following scenarios:

• Predictions of storage requirements turn out to be below real needs, resulting in resource deficits.
• Predictions of storage requirements turn out to be above real needs, resulting in expenditure and administration overhead for resources that end up not being used.

In these situations, considering only an efficient strategy to store documents is not enough.3 The acquisition of storage services that implement an elastic concept (i.e., storage capacity that can be
increased or reduced on demand, with a cost of acquisition and management relatively low) becomes attractive. Cloud computing is a current trend that considers the Internet as a platform providing on-demand computing and software as a service to anyone, anywhere, and at any time. Digital libraries naturally should be connected to cloud computing to obtain mutual benefits and enhance both perspectives.4 In this model, storage resources are provisioned on demand and are paid for according to consumption.

Services deployment in a cloud-computing environment can be implemented in three ways: private, public, or hybrid. In the private option, infrastructure is operated solely for a single organization; most of the time, it requires a strong initial investment because the organization must purchase a large amount of storage resources and pay for the administration costs. The public cloud is the most traditional version of cloud computing. In this model, infrastructure belongs to an external organization where costs are a function of the resources used. These costs include administration. Finally, the hybrid model contains a mixture of private and public.

A cloud-computing environment is mainly supported by technologies such as virtualization and service-oriented architectures. A cloud environment provides omnipresence and facilitates deployment of file-storage services. It means that users can access their files via the Internet from anywhere and without requiring the installation of a special application. The user only needs a web browser. Data availability, scalability, elastic service, and pay-per-use are attractive characteristics found in the cloud service model.

Virtualization plays an important role in cloud computing. With this technology, it is possible to have facilities such as multiple execution environments, sandboxing, server consolidation, use of multiple operating systems, and software migration, among others. Besides virtualization technologies, emerging tools that allow the creation of cloud-computing environments also support this type of computing model, providing dynamic instantiation and release of virtual machines and software migration.

Currently, it is possible to find several examples of public cloud storage, such as Amazon S3 (http://aws.amazon.com/en/s3), RackSpace (http://www.rackspace.com/cloud/public/files), and Google Storage (https://developers.google.com/storage), each of which provides high availability, fault tolerance, and services and administration at low cost. For organizations that do not want to use a third-party environment to store their data, private cloud services may offer a better option, although the cost is higher. In this case, a hybrid cloud model could be an affordable solution. Organizations or individual users can store sensitive or frequently used information in the private infrastructure and less sensitive data in the public cloud.

The development of a prototype of a file-storage service implemented on a private and hybrid cloud environment using mainly free and open-source software (FOSS) helped us to analyze the behavior of different replication techniques. We paid special attention to the cost of the system implementation, system efficiency, resource consumption, and different levels of data privacy and availability that can be achieved by each type of system.
INFRASTRUCTURE DESCRIPTION

The aim of this prototyping project was to design and implement scalable and elastic distributed storage architecture in a cloud-computing environment using free, well-known, open-source tools. This architecture represents a feasible option that digital libraries can adopt to solve financial and technical challenges when building a cloud-computing environment. The architecture combines private and public clouds by creating a hybrid cloud environment. For this purpose, we evaluated tools such as KVM and XEN, which are useful for creating virtual machines (VM).5 Open Nebula (http://opennebula.org), Eucalyptus (http://www.eucalyptus.com), and OpenStack (http://www.openstack.org) are good, free options for managing a cloud environment. We selected Open Nebula for this prototype. Commodity hard drives have a relatively high failure rate, hence our main motivation to evaluate different replication mechanisms, providing several levels of data availability and fault tolerance. Figure 1(a) shows the core components of our storage architecture (the private cloud), and figure 1(b) shows a distributed storage web application named Distributed Storage On the Cloud (DISOC), used as a proof of concept. The private cloud also has an interface to access a public cloud, thus creating a hybrid environment.

Figure 1. Main Components of the Cloud Storage Architecture

The core components and modules of the architecture are the following:

• Virtual Machine (VM). We evaluated different open-source options, such as KVM and XEN, for the creation of virtual machines.6 Some performance tests were done, and KVM showed a slightly higher performance than XEN. We selected KVM as the main Virtual Machine Manager (VMM) for the proposed architecture. VMMs are also called hypervisors. Each VM has a Linux operating system that is optimized to work in virtual environments and requires a minimum consumption of disk space. The VM also includes an Apache web server, a PHP module, and some basic tools that were used to build the DISOC web application. Every VM is able to transparently access a pool of disks through a special data access module, which we called DAM. More details about DAM follow.
• Virtual Machine Manager Module (VMMM). This has the function of dynamic instantiation and de-instantiation of virtual machines depending on the current load on the infrastructure.
• Data Access Module (DAM). All of the virtual disk space required by every VM was obtained through the Data Access Module Interface (DAM-I). DAM-I allows VMs to access disk space by calling DAM, which provides transparent access to the different disks that are part of the storage infrastructure. DAM allocates and retrieves files stored throughout multiple file servers (a simplified allocation sketch follows this list).
• Load Balancer Module (LBM). This distributes the load among different VMs instantiated on the physical servers that make up the private cloud.
• Load Manager (LM). This monitors the load that can occur in the private cloud.
• Distributed Storage on the Cloud (DISOC). This is a web-based file-storage system that is used as a proof of concept and was implemented based on the proposed architecture.
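The article does not include source code for DAM, so the following is only a minimal sketch, with illustrative class and method names, of how such a module might assign incoming files to a pool of file servers using a round-robin rotation with an optional mirror copy; the actual policies are described in the next section.

```python
import itertools

class DataAccessModule:
    """Illustrative sketch of a DAM-style allocator (names are hypothetical).

    Files are assigned to servers in round-robin order; when mirroring is
    enabled, a second copy goes to the next server in the rotation so that
    a single server failure does not make the file unavailable.
    """

    def __init__(self, file_servers, mirror=False):
        self.file_servers = list(file_servers)   # e.g., ["fs1", "fs2", "fs3"]
        self.mirror = mirror
        self._rotation = itertools.cycle(range(len(self.file_servers)))
        self.catalog = {}                         # file name -> list of servers

    def store(self, name, data):
        """Pick the target server(s) for a file and record the placement."""
        first = next(self._rotation)
        targets = [self.file_servers[first]]
        if self.mirror:
            # Place the copy on a different server than the original.
            second = (first + 1) % len(self.file_servers)
            targets.append(self.file_servers[second])
        for server in targets:
            self._write(server, name, data)
        self.catalog[name] = targets
        return targets

    def retrieve(self, name):
        """Return the servers holding a file so any available copy can be read."""
        return self.catalog.get(name, [])

    def _write(self, server, name, data):
        # Placeholder: a real DAM would write over NFS, SSH, HTTP, etc.
        print(f"writing {name} ({len(data)} bytes) to {server}")


# Example: three file servers with mirroring enabled.
dam = DataAccessModule(["fs1", "fs2", "fs3"], mirror=True)
dam.store("thesis.pdf", b"...file contents...")
```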
REPLICATION TECHNIQUES

High availability is one of the important features offered in a storage service deployed in the cloud. The use of replication techniques has been the most useful proposal to achieve this feature. DAM is the component that provides different levels of data availability. It currently includes the following replication policies: no-replication, total-replication, mirroring, and IDA-based replication.

• No-replication. This replication policy represents the data availability method with the lowest level of fault tolerance. In this method, only the original version of a file is stored in the disk pool. It follows a round-robin allocation policy whereby load assignation is made based on a circularly linked list, taking into account disk availability. This policy prevents all files from being allocated to the same server, providing a minimal fault tolerance in case of a server failure.
• Mirroring. This replication technique is a simple way to ensure higher availability without high resource consumption. In this replication, every time a file is stored in a disk, the DAM creates a copy and places it on a different disk.
• Total-replication. This represents the highest data availability approach. In this technique, a copy of the file is stored on all of the file servers available. Total-replication also requires the highest consumption of resources.
• IDA-based replication. To provide higher data availability with less impact on the consumption of resources, an alternative approach based on information-dispersal techniques can be used. The Information Dispersal Algorithm (IDA) is an example of this strategy.7 When a file (of size |F|) is required to be stored using the IDA, the file is partitioned into n fragments of size |F|/m, where m is less than n, and any m of the n fragments are sufficient to reconstruct the original file.

Content Management System | Joomla | Linux, Apache, PHP, MySQL | Afghanistan Digital Libraries | AWS, Linode
Website | HTML | HTML | Sonoran Desert Knowledge Exchange | AWS, Linode
Integrated Library System | Koha | Linux, Apache, Perl, MySQL | Afghanistan Higher Education Union Catalog | AWS, Linode
Web applications | Home-grown J2EE web application | J2EE, Java, Tomcat | Japanese GIF (Global Interlibrary-loan) Holding Finder (at Linode, at Google App Engine) | AWS, Linode, Google App Engine
Computing Services Monitoring | Nagios | Linux, Perl | Internal application | AWS, Linode
Networked Devices Administration | SSH, SFTP | Linux | N/A | AWS, Linode

meet users' needs at will. Rebuilding nodes and creating imaging are also easier on the cloud. Server failure resulting from hardware error can result in significant downtime. The UAL has had a few server failures in the past few years. Last year a server's RAID hard drives failed. The time spent on ordering new hard disks, waiting for the server company technician's arrival, and finally rebuilding the software environment (e.g., OS, web servers, application servers, user and group privileges) took six or more hours, not to mention the stress rising among customers due to unavailability of services. Mirroring servers could minimize service downtime, but the cost would be almost doubled. In comparison, in the cloud computing model, the author took a few snapshots using the AWS web management interface. If a node fails, the author can launch an instance using the snapshot within a minute or two.
Factors such as software and hardware failure, natural disasters, network failure, and human errors are the main causes for system down- time. The cloud computing providers generally have multiple data cen- ters in different regions. For instance, Amazon S3 and Google AppEngine are claimed to be highly available and highly reliable. Both AWS and Google App Engine offer automatic scaling and load balancing. The cloud computing providers have huge advantages in offering high avail- ability to minimize hardware failure, natural disasters, network failure, and human errors, while the locally man- aged server and storage approach has to be invested a lot to reduce these risks. In 2009 and 2010 the University of Arizona has experienced at least two network and server outages each lasting a few hours; one failure was because of human error and the other was because of a power failure from Tucson Electric Power. When a power line was cut by accident, what can you do? In comparison, over the past two years minimal downtime from includes 12TB hard disks (about 10TB usable space after RAID 5 configuration) with 5-year support, assuming 5-year life expectancy. ■❏ Operation expense: $1,438– $2,138 per year. ■● System administrator cost: $700–$1,200. See above. ■● Space cost: $300. See above. ■● Electricity costs: $438 per year. See above. ■● Network cost ignored. technology Analysis There is no need to purchase a server; no need to initial a cloud node; no need to setup security policies; no need to install Tomcat, Java and J2EE environment; and no need to update software. Compared to the traditional approach, PaaS eliminates upfront hardware and software investment, reduces time and work for setting up running environment, and removes hardware and software upgrade and maintenance tasks. IaaS eliminates upfront hardware investment along with other technical advantages dis- cussed below. The cloud computing model offers much better scalability over the traditional model due to its flexibility and lower cost. In our repository, the initial storage requirement is not significant, but can grow over time if more digital collections are added. In addition, the number of visits is not high, but can increase significantly later. An accurate estimate of both factors can be difficult. In the tra- ditional model, a purchased server has preconfigured hardware with limited storage. Upgrading storage and processing power can be costly and problematic. Downtime will be certain during the upgrade process. In comparison, the cloud comput- ing model provides an easy way to upgrade storage and processing power with no downtime if han- dling well. Bigger storage and larger instances with high-memory or high- CPU can be added or removed to ■■ Electricity cost: $2,190 = 5 years x 365 days/year x 24 hours/day x 0.5 kilowatt / hour x $0.10/kilowatt. Most libraries running digital library programs require big storage for preserving digitization files. The analysis below just illustrates a com- parison of the TCO of 10TB space. It shows that the TCO of locally man- aged storage has lower costs than Amazon S3’s storage TCO. Though the cloud computing model still have the advantage of on-demand, avoid big initial investment on equipment, the author believes that locally man- aged storage may be a better solution if planned well. Since Amazon S6 storage pricing decreases from $0.14/GB to $0.095/GB over 500TB, Amazon S3’s TCO might be lower if an organization has huge amounts of data. 
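The arithmetic behind these cost comparisons is simple enough to script. The sketch below is illustrative only: it reuses the unit prices quoted in this section ($0.14/GB per month for Amazon S3, electricity at $0.10 per kilowatt-hour, a 0.5 kW server over a five-year life) and treats 10 TB as 10,000 GB, as the quoted figures do; the function and variable names are not from the article.

```python
def s3_storage_cost(gigabytes, price_per_gb_month=0.14, years=1):
    """Annual Amazon S3 storage cost at a flat per-GB monthly rate."""
    return gigabytes * price_per_gb_month * 12 * years

def server_electricity_cost(kilowatts=0.5, price_per_kwh=0.10, years=5):
    """Electricity cost of running a server continuously for `years` years."""
    hours = years * 365 * 24
    return hours * kilowatts * price_per_kwh

# 10 TB treated as 10,000 GB, matching the figures quoted in this section.
print(s3_storage_cost(10_000))    # 16800.0 per year, i.e., $1,400/month x 12 months
print(server_electricity_cost())  # 2190.0 over five years
```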
The author suggests readers should do their own analysis. ■■ The TCO of 10TB in Amazon S3 per year: $16,800. Note: Amazon S3 replicate data at least 3 times, assuming these preservation files do not need constant changes. Otherwise, data transfer fees could be high. ■❏ Operation expense: $16,800 per year. ■● $16,800 = $1,400/month x 12 months. (based on Amazon S3 pricing of $0.14/GB per month) ■● Network cost ignored. ■■ The TCO of a 10TB physical stor- age per year: $11,212–$12,612. ■❏ To match reliability of Amazon S3, local managed storage needs three copies of data: two in hard disk and one in tape. Note: Dell AX4–5I SAN storage: quoted on October 26, 2010. Replicate data 3 times, including 2 copies in hard disks, one copy in tape. Ignoring time value of money, 3 percent inflation per year based on CPI statistic data. ■❏ Hardware: $4,168 per year. ■● $20,840 a SAN storage selectiNG A weB coNteNt MANAGeMeNt sYsteM For AN AcADeMic liBrArY weBsite | HAN 205clouD coMPutiNG: cAse stuDies AND totAl costs oF owNersHiP | HAN 205 ’06), Nov. 6–8, 2006, Seattle, Wash., h t t p s : / / w w w. u s e n i x . o r g / e v e n t s / o s d i 0 6 / t e c h / c h a n g / c h a n g _ h t m l / (accessed Apr. 21, 2010). 16. Google, “GQL Reference, 2010, http://code.google.com/appengine/ docs/python/datastore/gqlreference .html (accessed Apr. 21, 2010); Google Developers, “Campfire One: Introducing Google App Engine (pt. 3),” 2010, http:// www.youtube.com/watch?v=oG6Ac7d- Nx8 (accessed Apr. 21, 2010). 17. David Chappell, “Introducing Windows Azure,” 2009, http://down- load.microsoft.com/download/e/4/3/ e43bb484–3b52–4fa8-a9f9-ec60a32954bc/ Azure_Services_Platform.pdf (accessed Apr. 2, 2010). 18. Linode, “Linode—Xen VPS Hosting,” 2010, http://www.linode.com/ (accessed Apr. 7, 2010). 19. Google, “Quotas—Google App Engine,” 2010, http://code.google.com/ appengine/docs/quotas.html (accessed Oct. 21, 2010). 20. Jay Jordan, “Climbing Out of the Box and Into the Cloud: Building Web- Scale for Libraries,” Journal of Library Administration 51, no. 1 (2011): 3–17. 21. Nurmi Daniel et al., “The Eucalyptus Open-Source Cloud-Computing System,” in 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009, doi: 10.1109/CCGRID.2009.93. 22. Google, “The JRE White List— Google App Engine—Google Code,” 2010, http://code.google.com/appengine/ docs/java/jrewhitelist.html (accessed Apr. 9, 2010); Google, “The Java Servelet Environment,” 2010, http://code.google .com/appengine/docs/java/runtime .html (accessed Apr. 9, 2010). 23. Google, “Changing Quotas To Keep Most Apps Serving Free,” 2009, http:// googleappengine.blogspot.com/2009/ 06/changing-quotas-to-keep-most-apps .html (access Oct. 21, 2010). 24. Michael Armbust et al., Above the Clouds: A Berkeley View of Cloud Computing (EECS Department, University of California, Berkeley: Reliable Adaptive Distributed Systems Laboratory, 2009), http://www.eecs.berkeley.edu/Pubs/ Te c h R p t s / 2 0 0 9 / E E C S - 2 0 0 9 - 2 8 . h t m l (accessed July 1, 2009). 25. Amazon, “Amazon EC2 Pricing,” 2010, http://aws.amazon.com/ec2/pric- ing/ (accessed Feb. 20, 2010). 26. Michael Healy, “Beyond CYA as a service,” Information Week 1288 (2011): 24–26. case of 10TB storage. Since Amazon offers lower storage pricing for huge amounts of data, readers are recom- mended to do their own analysis on the TCOs. References 1. Roger C. 
Schonfeld and Ross Housewright, Faculty Survey 2009: Key Strategic Insights for Libraries, Publishers, and Societies, 2010, http://www.ithaka .org/ithaka-s-r/research/faculty-surveys -2000–2009/faculty-survey-2009 (accessed Apr. 20, 2010). 2. Daniel Chudnov, “A View From the Clouds,” Computers in Libraries 30, no. 3 (2010): 33–35. 3. Jay Jordan, “Climbing Out of the Box and Into the Cloud: Building Web- Scale for Libraries,” Journal of Library Administration 51, no. 1 (2011): 3–17. 4. Erik Mitchell, “Cloud Computing and Your Library,” Journal of Web Librarianship 4, no. 1 (2010): 83–86. 5. Erik Mitchell, “Using Cloud Services For Library IT Infrastructure,” Code4Lib Journal 9 (2010), http://journal .code4lib.org/articles/2510 (accessed Feb 10, 2011). 6. Subhas C. Misra and Arka Mondal, “Identification of a Company’s Suitability for the Adoption of Cloud Computing and Modelling its Corresponding Return on Investment,” Mathematical & Computer Modelling 53 (2011): 504–21, doi: 10.1016/j. mcm.2010.03.037. 7. Michael Healy, “Beyond CYA as a service,” Information Week 1288 (2011): 24–26. 8. Yan Han, “On the Clouds: A New Way of Computing,” Information Technology & Libraries 29, no. 2 (2010): 88–93. 9. Ibid. 10. Peter Mell and Tim Grance, The NIST Definition of Cloud Computing, NIST, http://csrc.nist.gov/groups/SNS/cloud -computing/ (accessed Oct. 21, 2010). 11. Ibid. 12. Ibid. 13. Ibid. 14. Amazon, Amazon Elastic Compute Cloud (Amazon EC2), 2010, http://aws .amazon.com/ec2/ (accessed Oct. 21, 2010). 15. Fay Chang et al., “Bigtable: A Distributed Storage System for Structure Data,” in 7th Symposium on Operating Systems Design and Implementation (OSDI the cloud computing providers was reported. There are some issues when implementing cloud computing. Above the Clouds: A Berkeley View of cloud computing discusses ten obsta- cles and related opportunities for cloud computing.27 All of these obstacles and opportunities are tech- nical. The author’s first paper on this topic also discusses legal jurisdiction issues when considering cloud com- puting.28 Users should be aware of these potential issues when making a decision of adopting the cloud. Summary This paper starts with literature review of articles in cloud computing, some of them describing how librar- ies are incorporating and evaluating the cloud. The author introduces cloud computing definition, identi- fies three-level of services (SaaS, PaaS, and IaaS), and provides an overview of major players such as Amazon, Microsoft, and Google. Open source cloud software and how private cloud helps are discussed. Then he presents case studies using different cloud computing providers: case 1 of using an IaaS provider Amazon and case 2 of using a PaaS provider Google. In case 1, the author justifies the imple- mentation of DSpace on AWS. In case 2, the author discusses advantages and pitfalls of PaaS and demonstrates a small web application hosted in Google AppEngine. Detailed analysis of the TCOs comparing AWS with local managed storage and servers are presented. The analysis shows that the cloud computing has techni- cal advantages and offers significant cost savings when serving web appli- cations. Shifting web applications to the cloud provides several techni- cal advantages over locally managed servers. High availability, flexibility, and cost-effectiveness are some of the most important benefits. 
However, the locally managed storage is still an attractive solution in a typical 206 iNForMAtioN tecHNoloGY AND liBrAries | DeceMBer 2011 (accessed July 1, 2009). 29. Yan Han, “On the Clouds: A New Way of Computing,” Information Technology & Libraries 29, no. 2 (2010): 88–93. (EECS Department, University of California, Berkeley: Reliable Adaptive Distributed Systems Laboratory, 2009), http://www.eecs.berkeley.edu/Pubs/ Te c h R p t s / 2 0 0 9 / E E C S - 2 0 0 9 – 2 8 . h t m l 27. Erik Mitchell, “Cloud Computing and Your Library,” Journal of Web Librarianship 4, no. 1 (2010): 83–86. 28. Michael Armbust et al., Above the Clouds: A Berkeley View of Cloud Computing, Appendix. Running Instances on Amazon EC2 task 1: Building a New Dspace instance ■■ Build a clean OS: select an Amazon Machine image (AMI) such as Ubuntu 9.2 to get up and running in a minute or two. ■■ Install required modules and packages: install Java, Tomcat, PostgreSQL, and mail servers. ■■ Configure security and network access on the node. ■■ Install and configure DSpace: install system and configure configuration files. task 2: reloading a New Dspace instance ■■ Create a snapshot of current node with the EBS if desired: use AWS’s management tools to create a snapshot. ■■ Register the snapshot using AWS’s management tools and write down the snapshot id, specify the kernel and ramdisk. command: ec2-register: registers the AMI specified in the manifest file and generate a new AMI ID (see Amazon EC2 Documentation) (example: ec2-register -s snap-12345 -a i386 -d “Description of AMI” -n “name-of-image” —kernel aki-12345 — ramdisk ari-12345 ■■ In the future, a new instance can be started from this snapshot image in less than a minute. command: ec2-run-instances: launches one or more instances of the specified AMI (see Amazon EC2 Documentation) (example: ec2-run-instance ami-a553bfcc -k keypair2 -b /dev/sda1=snap-c3fcd5aa: 100:false) task 3: increasing storage size of current instance ■■ To create an instance with desired persistent storage (e.g., 100 GB) command: ec2-run-instances: launches one or more instances of the specified AMI (see Amazon EC2 Documentation) (example: ec2-run-instances ami-54321 -k ec2-key1 -b /dev/sda1=snap-12345:100:false) ■■ If you boot up an instance based on one of these AMIs with the default volume size, once it’s started up you can do an online resize of the file system: Command: resize2fs: ext2 file system resizer (example: resize2fs /dev/sda1) task 4: Backup ■■ Go to AWS web interface and navigate to the “Instances” panel. ■■ Select our instance and then choose “Create Image (EBS AMI).” ■■ This newly created AMI will be a snapshot of our system in its current state. 1880 ---- Improving Independent Student Navigation of Complex Educational Web Sites: An Analysis of Two Navigation Design Changes in LibGuides Kate A. Pittsley and Sara Memmott INFORMATION TECHNOLOGY AND LIBRARIES | SE PTEMBER 2012 52 ABSTRACT Can the navigation of complex research websites be improved so that users more often find their way without intermediation or instruction? Librarians at Eastern Michigan University discovered both anecdotally and by looking at patterns in usage statistics that some students were not recognizing navigational elements on web-based research guides, and so were not always accessing secondary pages of the guides. In this study, two types of navigation improvements were applied to separate sets of online guides. Usage patterns from before and after the changes were analyzed. 
Both sets of experimental guides showed an increase in use of secondary guide pages after the changes were applied, whereas a comparison group with no navigation changes showed no significant change in usage patterns. In this case, both duplicate menu links and improvements to tab design appeared to improve independent student navigation of complex research sites.

Kate A. Pittsley (kpittsle@emich.edu) is an Assistant Professor and Business Information Librarian and Sara Memmott (smemmott@emich.edu) is an Instructor and Emerging Technologies Librarian at Eastern Michigan University, Ypsilanti, Michigan.

INTRODUCTION

Anecdotal evidence led librarians at Eastern Michigan University (EMU) to investigate possible navigation issues related to the LibGuides platform. Anecdotal evidence included (1) incidents of EMU librarians not immediately recognizing the tab navigation when looking at implementations of the LibGuides platform on other university sites during the initial purchase evaluation, (2) multiple encounters with students at the reference desk who did not notice the tab navigation, and (3) a specific case involving use of a guide with an online course.

The case investigation started with a complaint from a professor that graduate students in her online course were suddenly using far fewer resources than students in the same course during previous semesters. The students in that semester's section relied heavily—often solely—on one database, while most students during previous semesters had used multiple research sources. This course has always relied on a research guide prepared by the liaison librarian, the selection of resources provided had not changed significantly between the semesters, and the assignment had not changed. Furthermore, the same professor taught the course and did not alter her recommendation to the students to use the resources on the research guide. What had changed between the semesters was the platform used to present research guides. The library had just migrated from a simple one-page format for research guides to the more flexible multipage format offered by the LibGuides platform. Only a few resources were listed on the first LibGuides page of the guide used for the course. Only one of these resources was a subscription database, and that database was the one that current students were using to the exclusion of many other useful sources. After speaking with the professor, the liaison librarian also worked one-on-one with a student in the course. The student confirmed that she had not noticed the tab navigation and so was unaware of the numerous resources offered on subsequent pages. The professor then sent a message to all students in the course explaining the tab navigation. Subsequently the professor reported that students in the course used a much wider range of sources in assignments.

Statistical Evidence of the Problem

A look at statistics on guide use for fall 2010 showed that on almost all guides the first pages of guides were the most heavily used. As the usual entry point, it wasn't surprising that the first pages would receive the most use; however, on many multipage guides, the difference in use between the first page and all secondary pages was dramatic. That users missed the tab navigation and so did not realize additional guide pages existed seemed like a possible explanation for this usage pattern.
Librarians felt strongly that most users should be able to navigate guides without direct instruction in their use, and they were concerned by the evidence that indicated problems with the guide navigation. Was there something that could be done to improve independent student navigation in LibGuides? Two types of design changes to navigation were considered. To test the changes, each navigation change was applied to separate sets of guides. Usage patterns were then compared for those guides before and after changes were made. The investigators also looked at usage patterns over the same period for a comparison group to which no navigation changes had been made. LITERATURE REVIEW Navigation in LibGuides and Pathfinders The authors reviewed numerous articles related to LibGuides or pathfinders generally, but found few that mention navigation issues. They then turned to studies of website navigation in general. In an early article on the transition to web-based library guides, Cooper noted that “computer screens do not allow viewers to visualize as much information simultaneously as do print guides, and consequently the need for uncomplicated, easily understood design is even greater.”1 Four university libraries’ usability studies of the LibGuides platform specifically address navigation issues. University of Michigan librarians Dubicki et al. found that “tabs are recognizable and meaningful—users understood the function of the tabs.”2 The Michigan study then focused on the use of meaningful language for tab labels. However, at the LaTrobe University Library (Australia), Corbin and Karasmanis found a consistent pattern of students not recognizing the navigation tabs, and so recommended providing additional navigation links elsewhere on the page.3 At the University of Washington, Hungerford et al. found students did not immediately recognize the tab navigation: INFORMATION TECHNOLOGY AND LIBRARIES | SE PTEMBER 2012 54 During testing it was observed that users frequently did not notice a guide’s tabs right away as a navigational option. Users’ eyes were drawn to the top middle of the page first and would focus on content there, especially if there was actionable content, such as links to other pages or resources.4 The solution at the University of Washington was to require that all guides have a main page navigation area (LibGuides “box”) with a menu of links to the tabbed pages. After a usability study, MIT Libraries also recommended use of a duplicate navigation menu on the first page, stating in MIT Libraries staff guidelines for creating LibGuides to “make sure to link to the tabs somewhere on the main page” as “users don’t always see the tabs, so providing alternate navigation helps.”5 Navigation Palmer mentions navigation as one of the factors most significantly associated with website success as measured by user satisfaction, likelihood to use a site again, and use frequency.6 However, effective navigation may be difficult to achieve. Nielsen found in numerous studies that “users look straight at the content and ignore the navigation areas when they scan a new page.”7 In a presentation on the top ten mistakes in web design, human–computer interaction scholar Tullis included “awkward or confusing navigation.”8 The following review of the literature on website navigation design is limited to studies of navigation models that use browsing via menus, tabs, and menu bars. The navigation problem seen in LibGuides is far from unique. 
Usability studies for other information-rich websites demonstrate similar problems with users not recognizing navigation tabs or menu bars similar to those used in LibGuides. In 2001, McGillis and Toms investigated the usability of a library website with a horizontal navigation bar at the top of the page, a design similar to the single row of LibGuides tabs. This study found that users either did not see the navigation bar or did not realize it could be clicked.9 In multiple usability studies, U.S. Census Bureau researchers found similar problems with navigation bars on government websites. In 2009, Olmsted-Hawala et al. reported that study participants did not use the top-navigation bar on the Census Bureau’s Business and Industry website.10 The next year, Chen et al. again reported problems with top-navigation bar use on the Governments Division public website, explaining that the “top-navigation bar blends into the header, leading participants to skip over the tabs and move directly to the main content. This is a recurring issue the Usability Laboratory has identified with many Web sites.”11 One possible explanation for user neglect of tabs and navigation bars may be a phenomenon termed “banner blindness.” As early as 1999, Benway provided in-depth analysis of this problem. In his thesis, he uses the word “banner” not just for banner ads, but also for banners that consist of horizontal graphic buttons similar to the LibGuides tab design. Benway’s experiments show that an attempt to make important items visually prominent may have the opposite effect— that “the visual distinctiveness may actually make important items seem unimportant.” Benway follows with two recommendations: (1) that “any method that is created to make something stand out should be carefully tested with users who are specifically looking for that content to ensure that it does not cause banner blindness,” and (2) that “any item visually distinguished on a page should be duplicated within a collection of links or other navigation areas of the page. That way, if searchers ignore the large salient item, they can still find what they need through basic navigation.”12 IMPROVING INDEPENDENT STUDENT NAVIGATION OF COMPLEX EDUCATIONAL WEBSITES | PITTSLEY AND MEMMOTT 55 In 2005, Tullis cited multiple studies that showed that users found information faster or more effectively by using a simple table of contents than by using other navigation forms, including tab- based navigation.13 Yet in 2011, Nicolson et al. found that “participants rarely used table of contents; and often appeared not to notice them.”14 Yelinek et al. pointed to a practical problem in using content menus on LibGuides pages: since LibGuides pages can be copied or mirrored on other guides, guide authors must be cognizant that such menus could cause problems with incorrect or confusing navigational links on copied or mirrored pages.15 Success can also depend on the location of navigational elements, although researchers disagree on effects of location. In addition, user expectations of where to look for navigation elements may change over time along with changes in web conventions. In 2001, Bernard studied user expectations as to where common web functions would be located on the screen layout. He found that “most participants expected the links to web pages within a website to be almost exclusively located in the upper-left side of a web page, which conforms to the current convention of placing links on [the] left side.”16 In 2004, Pratt et al. 
found that users were equally effective using horizontal or vertical navigation menus, but when given a choice more users chose to use vertical navigation.17 Also in 2004, McCarthy et al. performed an eye-tracking study, which showed faster search times when sites conformed to the expected left navigation menu and a user bias toward searching the middle of the screen; but it also found that the initial effect of menu position diminished with repeated use of a site.18 Nonetheless, Jones found that by 2006 most corporate webpages used "horizontally aligned primary navigation using buttons, tabs, or other formatted text."19 In 2008, Cooke found that users looked equally at left, top, and center menus; however, when "a visually prominent navigation menu populated the center of the Web page, participants were more likely to direct their search in this location."20

Wroblewski describes how tab navigation was first popularized by Amazon.21 Burrell and Sodan investigated user preferences for six navigation styles and found that users clearly preferred tab navigation "because it is most easily understood and learned."22 In the often-cited web design manual Don't Make Me Think, Krug also recommends tabs: "Tabs are one of the very few cases where using a physical metaphor in a user interface actually works."23 Krug recommends that tabs be carefully designed to resemble file folder tabs. They should "create the visual illusion that the active tab is in front of the other tabs . . . the active tab needs to be a different color or contrasting shade [than the other tabs] and it has to physically connect with the space below it. This is what makes the active tab 'pop' to the front."24 An often-cited U.S. Department of Health and Human Services manual on research-based web design addresses principles of good tab design, stating that tabs should be located near the top of the page and should "look like clickable versions of real-world tabs. Real-world tabs are those that resemble the ones found in a file drawer."25 Nielsen provides similar guidelines for tab design, which include that the selected tab should be highlighted, the current tab should be connected to the content area (just like a physical tab), and that one should use only one row of tabs.26 More recently, Cronin highlighted examples of good tab design that effectively use elements such as rounded tab corners, space between tabs, and an obvious design for the active tab that visually connects the tab to the area beneath it.27 Christie also provides best practices for tab design that include consistent use of only one row of tabs, use of a prominent color for the active tab and a single
However, navigation design can be improved by considering the purpose of the site, user expectations, common conventions, best practices, the possibility that intuitive ideas for design may not perform as expected (e.g., banner blindness), the site’s complexity, and more. RESEARCH QUESTION AND METHOD Could design changes improve independent student use of LibGuides tab navigation? The literature reviewed above suggested two likely design changes to test: adding additional navigation links in the body of the page and improving the tab design. Testing these design changes on selected guides would allow the EMU library to assess the impact before implement changes on all library research guides. For this experiment, each type of navigation change was applied to separate subsets of guides; a subset of similar guides was selected as a comparison group; and usage patterns were analyzed for similar periods before and after changes were made. Navigation design changes were made to fourteen subject guides related to business. The business subject guides were divided into two experimental groups of seven guides. In group A, a table of contents box with navigation links was added to the front page of each guide, and in group B, the navigation tabs were altered in appearance. No navigation changes were made to comparison group C. Class specific guides were excluded from the experiment, as in many cases the business librarian would have instructed students in the use of tabs on class guides. Changes were made at the beginning of the winter 2011 semester so that an entire semester’s data could be collected and compared to the previous semester’s usage patterns. The design for group A was similar to the University of Washington implementation of a “What’s in the Guide” box on guide homepages that repeated the tab navigation links.31 For guides in group A, a table of contents box was placed on the guide homepages. It contained a simple list of links to the secondary pages of the guides, using the same labels as on the navigation tabs. The table of contents box used a larger font size than other body text and was given an outline color that contrasted with the outline color used on other boxes and matched the navigation tab color to create visual cues that this box had a different function from the other boxes on the page (navigation). The table of contents box was placed alongside other content on the guide homepages so users could still see the most relevant resources immediately. Figure 1 shows a guide containing a table of contents box. IMPROVING INDEPENDENT STUDENT NAVIGATION OF COMPLEX EDUCATIONAL WEBSITES | PITTSLEY AND MEMMOTT 57 Figure 1. Group A Guide with Content Menu Box Labeled “Guide Sections” The design change for group B focused on the navigation tabs. LibGuides tabs exhibit some of the properties of good tab design, such as allowing for rounded corners and contrasting colors for the selected tabs. Other aspects are not ideal, such as the line that separates the active tab from the page body.32 In the EMU Library’s initial LibGuides implementation, the option for tabs with rounded corners was used to resemble the design of manila file folders and increase the association with the file-folder metaphor. Possibilities for further design adaptation on the experimental guides were somewhat limited because these changes needed to be applied to the tabs of just a selected set of guides. 
The investigators theorized that increasing the height of the tabs might make them more closely resemble paper file folder tabs. Increasing the height would also increase the area of the tabs, and the larger size might also make the tabs more noticeable. This option was simple to implement on the guides in group B by adding html break tags (<br>) to the tab text. Taller tabs also provided more room for text on the tabs. Tabs in LibGuides will expand in width to fit the text label used, and if the tabs on a guide require more space on the page, they will be displayed in multiple rows. Multiple rows of tabs are visually confusing and break the tabs metaphor, decreasing their usefulness for navigation.33 The EMU Library's best practices for research guides already encouraged limiting tabs to one row. Adding height to tabs allowed for clearer text labels on some guides without expanding the tab display beyond a single row. Figure 2 shows a guide containing the altered taller tabs.

Figure 2. Group B Guide with Tabs Redesigned to Look More Like File Folder Tabs

While variations in content and usage of library guides did not allow for a true control group, other social science subject guides were selected as a comparison group. Social science subject guides were excluded from the comparison group if they had very low guide usage during the fall 2010 semester (fewer than thirty uses), or if they had fewer than three tabs, making them structurally dissimilar to the business guides. This left a group of sixteen comparison guides. No changes were made to the navigation design of these guides during the test period. The business guides—which the authors had permission to experiment with—tend to be longer and have more pages than other guides. On average, the experimental guides had more pages per guide than the comparison guides; guides in groups A and B averaged nine pages per guide, and comparison guides averaged five pages per guide. Guides with more pages will tend to have a higher percentage of hits on secondary pages because there are more pages available to users. However, the authors intended to measure the change in usage patterns with each guide measured against itself in different periods, and the number of pages in each guide did not change from semester to semester.

DATA COLLECTION AND RESULTS

LibGuides provides monthly usage statistics that include the total hits on each guide and the number of hits on each page of a guide. Use of secondary pages of the guides was measured by calculating the proportion of hits to each guide that occurred on secondary pages. Data for the fall 2010 semester (September through December 2010) was used to measure usage patterns before navigation changes were made to the experimental guides. Data for the winter 2011 semester (January through April 2011) was used to measure usage patterns after navigation changes were made. Each would represent a full semester's use at similar enrollment levels with many of the same courses and assignments. Usage patterns for the comparison guides were also examined for these periods.
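Because LibGuides reports per-page hit totals, the proportion just described is straightforward to compute. The snippet below is a minimal sketch rather than code from the study: the dictionary of page hits is invented for illustration, and the key "home" standing for a guide's first page is an assumption of the sketch; only the arithmetic mirrors the measure described here (secondary-page hits divided by total hits).

```python
def secondary_page_share(page_hits):
    """Proportion of a guide's hits that occurred on pages other than the first.

    `page_hits` maps page names to hit counts, with the guide's first page
    stored under the key "home" (an assumption made for this sketch).
    """
    total = sum(page_hits.values())
    secondary = total - page_hits.get("home", 0)
    return secondary / total if total else 0.0

# Hypothetical monthly statistics for one guide.
fall_2010 = {"home": 310, "articles": 95, "books": 40, "web sites": 25}
winter_2011 = {"home": 290, "articles": 150, "books": 70, "web sites": 45}

print(round(secondary_page_share(fall_2010), 3))    # about 0.34
print(round(secondary_page_share(winter_2011), 3))  # about 0.477
```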
Group B, with redesigned tabs, showed an increase of 10.4 points in the median percentage of guide hits on secondary pages. Within the comparison guides, the proportion of hits on secondary pages did not change significantly from fall 2010 to winter 2011. Table 1 shows the median percentage of guide hits on secondary pages before and after navigation design changes.

            | Group A: Menu Links Added | Group B: Tabs Redesigned | Group C: Comparison Group
Fall 2010   | 39.1%                     | 50.5%                    | 37.7%
Winter 2011 | 49.4%                     | 60.9%                    | 37.4%

Table 1. Median Percentage of Guide Hits on Secondary Pages

The box plot in figure 5 graphically illustrates the range of the usage of secondary pages in each group of guides and the changes from fall 2010 to winter 2011, showing the minimum, maximum, and median scores, as well as the range of each quartile.

Figure 5. Distribution of Percentage of Guide Hits on Secondary Pages. This figure demonstrates the change in usage pattern for groups A and B and the lack of change in usage pattern for comparison group C.

Averages for the percentage change in secondary tab use were also computed for the combined experimental groups and the comparison group.

             | N  | Mean    | Std. Deviation | Std. Error Mean
Experimental | 14 | .07871  | .097840        | .026149
Comparison   | 16 | -.02550 | .145977        | .036494

Table 2. Average Change in Secondary Tab Use from Fall 2010 to Winter 2011, Comparing All Experimental Guides (Groups A & B) With All Comparison (Group C) Guides.

When comparing all experimental guides and all comparison guides, the change in use of secondary pages was found to be statistically significant. The average change in use of secondary pages for all experimental guides (groups A and B) was .07871, and the average for all comparison guides (group C) was -.02550. A t test showed that this difference was significant at the p < .05 level (p = .032).
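The group comparison reported above is a standard two-sample t test on per-guide change scores. The sketch below shows the shape of that calculation only; the two lists are placeholder values (matching the group sizes but not the study's data), and scipy is assumed to be available.

```python
from scipy import stats

# Per-guide change in the share of hits on secondary pages (winter 2011 minus
# fall 2010). These numbers are placeholders for illustration; the study's own
# values are summarized in figures 3 and 4 and table 2.
experimental_changes = [0.12, 0.05, 0.21, -0.03, 0.09, 0.15, 0.02,
                        0.11, 0.04, 0.18, -0.06, 0.08, 0.10, 0.04]   # groups A and B
comparison_changes = [0.01, -0.05, 0.03, -0.12, 0.00, 0.06, -0.02, 0.04,
                      -0.08, 0.02, -0.01, 0.05, -0.10, 0.03, -0.04, 0.01]  # group C

t_stat, p_value = stats.ttest_ind(experimental_changes, comparison_changes)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")  # difference is significant if p < .05
```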
The change to tab design—which is easiest to implement—has been made to all subject guides. Some librarians also chose to add content menus to selected guides. Since the complexity of research guides is also a factor in successful navigation,35 a recent LibGuides enhancement was used to move elements from the header area to the bottom of the guides. The elements moved out of the header included the date of last update, guide URL, print option, and RSS updates. The investigators hypothesize that the reduced complexity of the header may help in recognizing the tab navigation. Although convinced that the experimental changes made a difference to independent student navigation in research guides, the authors hope to find further ways to strengthen independent navigation. Vendor design changes to enhance the tab metaphor, such as creating a more visible connection between the active tab and page, might also improve navigation.36 INFORMATION TECHNOLOGY AND LIBRARIES | SE PTEMBER 2012 62 CONCLUSION Designing navigation for complex sites, such as library research guides, is likely to be an ongoing challenge. This study suggests that thoughtful design changes can improve navigation. In this case, both duplicate menu links and improvements to tab design improved independent student navigation of complex research sites. REFERENCES AND NOTES 1. Eric A. Cooper, “Library Guides on the Web: Traditional Tenets and Internal Issues,” Computers in Libraries 17, no. 9 (1997): 52. 2. Barbara Dubicki Beaton et al., LibGuides Usability Task Force Guerrilla Testing (Ann Arbor: University of Michigan, 2009), http://www.lib.umich.edu/content/libguides-guerilla- testing. 3. Jenny Corbin and Sharon Karasmanis, Health Sciences Information Literacy Modules Usability Testing Report (Bundoora, Australia: La Trobe University Library, 2009), http://arrow.latrobe.edu.au:8080/vital/access/HandleResolver/1959.9/80852. 4. Rachel Hungerford, Lauren Ray, Christine Tawatao, and Jennifer Ward, LibGuides Usability Testing: Customizing a Product to Work for Your Users (Seattle: University of Washington Libraries, 2010), 6, http://hdl.handle.net/1773/17101. 5. MIT Libraries, Research Guides (LibGuides) Usability Results (Cambridge, MA: MIT Libraries, 2008), http://libstaff.mit.edu/usability/2008/libguides-summary.html; MIT Libraries, Guidelines for Staff LibGuides (Cambridge, MA: MIT Libraries, 2011), http://libguides.mit.edu/staff-guidelines. 6. Jonathan W. Palmer, “Web Site Usability, Design, and Performance Metrics,” Information Systems Research 13, no. 2 (2002): 151-67, doi:10.1287/isre.13.2.151.88. 7. Jakob Nielsen, “Is Navigation Useful?,” Jakob Nielsen’s Alertbox, http://www.useit.com/alertbox/20000109.html. 8. Thomas S. Tullis, “Web-Based Presentation of Information: The Top Ten Mistakes and Why They are Mistakes,” in HCI International 2005 Conference: 11th International Conference on Human-Computer Interaction, 22–27, July 2005, Caesars Palace, Las Vegas, Nevada USA (Mahwah NJ: Lawrence Erlbaum Associates, 2005), doi:10.1.1.107.9769. 9. Louise McGillis and Elaine G. Toms, “Usability of the Academic Library Web Site: Implications for Design,” College & Research Libraries 62, no. 4 (2001): 355–67, http://crl.acrl.org/content/62/4/355.short. 10. Erica Olmsted-Hawala et al., Usability Evaluation of the Business and Industry Web Site, Survey Methodology #2009–15, (Washington, DC: Statistical Research Division, U.S. Census Bureau, 2009), http://www.census.gov/srd/papers/pdf/ssm2009–15.pdf. 11. 
Jennifer Chen et al., Usability Evaluation of the Governments Division Public Web Site, Survey http://www.lib.umich.edu/content/libguides-guerilla-testing http://www.lib.umich.edu/content/libguides-guerilla-testing http://arrow.latrobe.edu.au:8080/vital/access/HandleResolver/1959.9/80852 http://hdl.handle.net/1773/17101 http://crl.acrl.org/content/62/4/355.short http://www.census.gov/srd/papers/pdf/ssm2009–15.pdf IMPROVING INDEPENDENT STUDENT NAVIGATION OF COMPLEX EDUCATIONAL WEBSITES | PITTSLEY AND MEMMOTT 63 Methodology #2010–02, (Washington, DC: U.S. Census Bureau, Usability Laboratory, 2010), 19, http://www.census.gov/srd/papers/pdf/ssm2010-02.pdf. 12. Jan Panero Benway, “Banner Blindness: What Searching Users Notice and Do Not Notice on the World Wide Web” (PhD diss., Rice University, 1999), 75, http://hdl.handle.net/1911/19353. 13. Tullis, “Web-Based Presentation of Information.” 14. Donald J. Nicolson et al., “Combining Concurrent and Sequential Methods to Examine the Usability and Readability of Websites With Information About Medicines,” Journal of Mixed Methods Research 5, no. 1 (2011): 25–51, doi:10.1177/1558689810385694. 15. Kathryn Yelinek et al., “Using LibGuides for an Information Literacy Tutorial 2.0,” College & Research Libraries News 71, no. 7 (July): 352–55, http://crln.acrl.org/content/71/7/352.short 16. Michael L. Bernard, “Developing Schemas for the Location of Common Web Objects,” Proceedings of the Human Factors and Ergonomics Society Annual Meeting 45, no. 15 (October 1, 2001): 1162, doi:10.1177/154193120104501502. 17. Jean A. Pratt, Robert J. Mills, and Yongseog Kim, “The Effects of Navigational Orientation and User Experience on User Task Efficiency and Frustration Levels,” Journal of Computer Information Systems 44, no. 4 (2004): 93–100. 18. John D. McCarthy, M. Angela Sasse, and Jens Riegelsberger, “Could I Have the Menu Please? An Eye Tracking Study of Design Conventions,” People and Computers 17, no. 1 (2004): 401–14. 19. Scott L. Jones, “Evolution of Corporate Homepages: 1996 to 2006,” Journal of Business Communication 44, no. 3 (2007): 236–57, doi:10.1177/0021943607301348. 20. Lynne Cooke, “How Do Users Search Web Home Pages?” Technical Communication 55, no. 2 (2008): 185. 21. Luke Wroblewski, “The History of Amazon’s Tab Navigation,” LUKEW IDEATION + DESIGN, May 7, 2007, http://www.lukew.com/ff/entry.asp?178. After addition of numerous product categories made tabs impractical, Amazon now relies on a left-side navigation menu. 22. A. Burrell and A. C. Sodan, “Web Interface Navigation Design: Which Style of Navigation- Link Menus Do Users Prefer?” in 22nd International Conference on Data Engineering Workshops, April 2006. Proceedings (Washington, D.C.: IEEE Computer Society, 2006), 42– 42, doi:10.1109/ICDEW. 2006.163. 23. Steve Krug, Don’t Make Me Think! A Common Sense Approach to Web Usability, 2nd ed. (Berkeley: New Riders, 2006), 79. 24. Ibid., 82. http://www.census.gov/srd/papers/pdf/ssm2010-02.pdf http://hdl.handle.net/1911/19353 http://crln.acrl.org/content/71/7/352.short http://www.lukew.com/ff/entry.asp?178 INFORMATION TECHNOLOGY AND LIBRARIES | SE PTEMBER 2012 64 25. U.S. Department of Health and Human Services, “Navigation,” in Research-Based Web Design & Usability Guidelines (Washington, DC: U.S. Department of Health and Human Services, 2006), 8, http://www.usability.gov/pdfs/chapter7.pdf. 26. Jakob Nielsen, “Tabs, Used Right,” Jakob Nielsen’s Alertbox, http://www.useit.com/alertbox/tabs.html. 27. 
Matt Cronin, “Showcase of Well-Designed Tabbed Navigation,” Smashing Magazine, April 6, 2009, http://www.smashingmagazine.com/2009/04/06/showcase-of-well-designed- tabbed-navigation. 28. Alex Christie, “Usability Best Practice, Part 1—Tab Navigation,” Tamar, January 13, 2010, http://blog.tamar.com/2010/01/usability-best-practice-part-1-tab-navigation. 29. McCarthy, Sasse, and Riegelsberger, “Could I have the Menu please?” 30. Jennifer J. Little, “Cognitive Load Theory and Library Research Guides,” Internet Reference Services Quarterly 15, no. 1 (2010): 52–63, doi:10.1080/10875300903530199. 31. Hungerford et al., LibGuides Usability Testing. 32. Christie, “Usability Best Practice”; Nielsen, “Tabs, Used Right”; Krug, Don’t Make Me Think; Cronin, “Showcase of Well-designed Tabbed Navigation.” 33. Christie, “Usability Best Practice”; Nielsen. “Tabs, Used Right.” 34. Eva D. Vaughan, Statistics: Tools for Understanding Data in the Behavioral Sciences (Upper Saddle River, NJ: Prentice Hall, 1998), 66. 35. McCarthy, Sasse, and Riegelsberger, “Could I Have the Menu Please?” 36. Springshare, the LibGuides vendor, has been amenable to customer feedback and open to suggestions for platform improvements. http://www.usability.gov/pdfs/chapter7.pdf http://www.smashingmagazine.com/2009/04/06/showcase-of-well-designed-tabbed-navigation http://www.smashingmagazine.com/2009/04/06/showcase-of-well-designed-tabbed-navigation http://blog.tamar.com/2010/01/usability-best-practice-part-1-tab-navigation 1913 ---- Usability Study of a Library’s M obile Website: An Example from Portland State University Kimberly D. Pendell and Michael S. Bowman USABILITY STUDY OF A LIBRARY’S MOBILE WEBSITE | PENDELL AND BOWMAN 45 ABSTRACT To discover how a newly developed library mobile website performed across a variety of devices, the authors used a hybrid field and laboratory methodology to conduct a usability test of the website. Twelve student participants were recruited and selected according to phone type. Results revealed a wide array of errors attributed to site design, wireless network connections, as well as phone hardware and software. This study provides an example methodology for testing library mobile websites, identifies issues associated with mobile websites, and provides recommendations for improving the user experience. INTRODUCTION Mobile websites are swiftly becoming a new access point for library services and resources. These websites are significantly different from full websites, particularly in terms of the user interface and available mobile-friendly functions. In addition, users interact with a mobile website on a variety of smartphones or other Internet-capable mobile devices, all with differing hardware and software. It is commonly considered a best practice to perform usability tests prior to the launch of a new website in order to assess its user friendliness, yet examples of applying this practice to new library mobile websites are rare. Considering the variability of user experiences in the mobile environment, usability testing of mobile websites is an important step in the development process. This study is an example of how usability testing may be performed on a library mobile website. The results provided us with new insights on the experience of our target users. In the fall of 2010, with the rapid growth of smartphones nationwide especially among college students, Portland State University (PSU) Library decided to develop a mobile library website for its campus community. 
The library’s lead programmer and a student employee developed a test version of the website. This version of the website included library hours, location information, a local catalog search, library account access for viewing and renewing checked out items, and access to reference services. It also included a “Find a Computer” feature displaying the availability of work stations in the library’s two computer labs. Kimberly D. Pendell (kpendell@pdx.edu) is Social Sciences Librarian, Assistant Professor, and Michael S. Bowman (bowman@pdx.edu) is Interim Assistant University Librarian for Public Services, Associate Professor, Portland State University Library, Portland, Oregon. mailto:kpendell@pdx.edu mailto:bowman@pdx.edu INFORMATION TECHNOLOGY & LIBRARIES | JUNE 2012 46 The basic architecture and design of the site was modeled on other existing academic library mobile websites that were appealing to the development team. The top-level navigation of the mobile website largely mirrored the full library website, utilizing the same language as the website when possible. The mobile website was built to be compatible with WebKit, the dominant smartphone layout engine. Use of JavaScript on the website was minimized due to the varying levels of support for it on different smartphones, and Flash was avoided entirely. Figure 1. Home Page of Library Mobile Website, Test Version We formed a mobile website team to further evaluate the test website and prepare it for launch. Three out of four team members owned smartphones, either an iPhone 3GS or an iPhone 4. We soon began questioning how the mobile website would work on other types of phones, recognizing that hardware and software differences would likely impact user experience of the mobile website. Performing a formal usability test using a variety of Internet-capable phones quickly became a priority. We decided to conduct a usability test for the new mobile website in order to answer the question: How user-friendly and effective is the new library mobile website on students’ various mobile devices? LITERATURE REVIEW Smartphones, mobile websites, and mobile applications have dominated the technology landscape in the last few years. Smartphone ownership has steadily increased, and a large percentage of USABILITY STUDY OF A LIBRARY’S MOBILE WEBSITE | PENDELL AND BOWMAN 47 smartphone owners regularly use their phone to access the Internet. The Pew Research Center reports that 52 percent of Americans aged 18–29 own smartphones, and 81 percent of this population use their smartphone to access the Internet or e-mail on a typical day. Additionally, 42 percent of this population uses a smartphone as their primary online access point.1 The 2010 ECAR Study of Undergraduate Students and Information Technology found that 62.7 percent of undergraduate students own Internet-capable handheld devices, an increase of 11.5 percent from 2009. The 2010 survey also showed that an additional 11.3 percent of students intended to purchase an Internet-capable handheld device within the next year.2 In this environment academic libraries have been scrambling to address the proliferation of student owned mobile devices, thus the number of mobile library websites is growing. 
The Library Success wiki, which tracks libraries with mobile websites, shows an 66 percent increase in the number of academic libraries in the United States and Canada with mobile websites from August 2010 to August 2011.3 We reviewed articles about mobile websites in the professional library science literature and found that mobile website usability testing is only briefly mentioned. In their summary of current mobile technologies and mobile library website development, Bridges, Rempel, and Griggs state that “user testing should be part of any web application development plan. You can apply the same types of evaluation techniques used in non-mobile applications to ensure a usable interface.”4 In a previous article, the same authors also note that not accounting for other types of mobile users is easy to do but leaves a potentially large audience for a mobile website “out in the cold.”5 More recently, Seeholzer and Salem found the usability aspect of mobile website development to be in need of further research.6 Usability evaluation techniques for a mobile website are similar to those for a full website, but the variety of smartphones and Internet-capable feature phones immediately complicates standard usability testing practices. The mobile device landscape is fraught with variables that can have a significant impact on the user experience of a mobile website. Factors like small screen size, processing power, wireless or data plan connection, and on-screen keyboards or other data entry methods contribute to user experience and impact usability testing. Zhang and Adipat note that, Mobile devices themselves, due to their unique, heterogeneous characteristics and physical constraints, may play a much more influential role in usability testing of mobile applications than desktop computers do in usability testing of desktop applications. Therefore real mobile devices should be used whenever possible.7 One strategy for usability testing on mobile devices is to identify device “families” by similar operating systems or other characteristics, then perform a test of the website. For example, Griggs, Bridges, and Rempel found representative models of device families at a local retailer, where they tested the site on the display phones. The authors also recommend “hallway usability testing,” an impromptu test with a volunteer.8 Zhang and Adipat go on to outline two methodologies for formal mobile application usability testing: field studies and laboratory experiments. The benefit of a mobile usability field study is INFORMATION TECHNOLOGY & LIBRARIES | JUNE 2012 48 the preservation of the mobile environment in which tasks are normally performed. However, data collection is challenging in field studies, requiring the participant to reliably and consistently self- report data. In contrast, the benefit of a laboratory study is that researchers have more control over the test session and data collection method. Laboratory usability tests lend themselves to screen capture or video recording, allowing researchers more comprehensive data regarding the participant’s performance on predetermined tasks.9 However, Billi and others point out that there is no general agreement in the literature about the significance or usefulness of the difference between laboratory and field testing of mobile applications.10 One compromise between field studies and laboratory experiments is the use of a smartphone emulator: an emulator mimics the smartphone interface on a desktop computer and is recordable via screen capture. 
However, desktop emulators mask some usability problems that impact smartphones, such as an unstable wireless connection or limited bandwidth.11 In order to record test sessions of users working directly with mobile devices, Jakob Nielsen, the well-known usability expert, briefly mentions the use of a document camera.12 In another usability test of a mobile application, Loizides and Buchanan also used a document camera with recording capabilities to effectively record users working with a mobile device.13 Usability attributes are metrics that help assess the user-friendliness of a website. In their review of empirical mobile usability studies, Coursaris and Kim present the three most commonly used measures in mobile usability testing: Efficiency: degree to which the product is enabling the tasks to be performed in a quick, effective and economical manner or is hindering performance; Effectiveness: accuracy and completeness with which specified users achieved specified goals in particular environment; Satisfaction: the degree to which a product is giving contentment or making the user satisfied.14 The authors present these measures in an overall framework of “contextual usability” constructed with the four variables of user, task, environment, and technology. An important note is the authors’ use of technology rather than focusing solely on the product; this subtle difference acknowledges that the user interacts not only with a product, but also other factors closely associated with the product, such as wireless connectivity.15 A participant proceeding through a predetermined task scenario is helpful in assessing site efficiency and effectiveness by measuring the error rate and time spent on a task. User satisfaction may be gauged by the participant’s expression of satisfaction, confusion, or frustration while performing the tasks. Measurement of user satisfaction may also be supplemented by a post-test survey. Returning to general evaluation techniques, mobile website usability employs the use of task scenarios, post-test surveys, and data analysis methods, similar to full site testing. General guides such as The Handbook of Usability Testing by Rubin and Chisnell and George’s User-Centered Library Websites: Usability Evaluation Methods provide helpful information on designing task scenarios, how to facilitate a test, post-test survey ideas, and methods of analysis.16 Another USABILITY STUDY OF A LIBRARY’S MOBILE WEBSITE | PENDELL AND BOWMAN 49 common data collection method in usability testing is the think aloud protocol as it allows researchers to more fully understand the user experience. Participants are instructed to talk about what they are thinking as they use the site; for example, expressing uncertainty of what option to select, frustration with poorly designed data entry fields, or satisfaction with easily understood navigation. Examples of the think aloud protocol can also be found in mobile website usability testing.17 METHOD While effective usability testing normally relies on five to eight participants, we decided a larger number of participants would be needed in order to capture the behavior of the site on a variety of devices. Therefore, we recruited twelve participants to accommodate a balanced variety of smartphone brands and models. Based on average market share, we aimed to test the website on four iPhones, four Android phones, and four other types of smartphones or Internet-capable mobile devices (e.g., BlackBerry, Windows phones). 
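To make the three usability measures discussed above concrete, the sketch below shows one way a per-task observation could be recorded for each device in a test of this kind. The class and field names are illustrative assumptions and do not correspond to the actual scoring forms used in this study.

```python
# A minimal sketch of a per-task scoring record built around efficiency
# (time on task), effectiveness (completion and error severity), and
# satisfaction (participant comments). Names are illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ObservedError:
    source: str    # "user" or "site"
    severity: str  # "minor", "major", or "fatal"
    note: str = ""

@dataclass
class TaskScore:
    task: str                   # e.g., "find Sunday hours"
    device: str                 # e.g., "iPhone 3GS"
    seconds_on_task: float      # efficiency
    completed: bool             # effectiveness
    errors: List[ObservedError] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)  # satisfaction evidence

    def fatal(self) -> bool:
        # True if any observed error prevented task completion
        return any(e.severity == "fatal" for e in self.errors)

# Example record: a catalog task ending in a fatal site error
score = TaskScore(
    task="known-title catalog search",
    device="BlackBerry Curve",
    seconds_on_task=182.0,
    completed=False,
    errors=[ObservedError("site", "fatal", "full item record would not load")],
)
print(score.fatal())  # True
```

Aggregating such records by device family yields the kinds of efficiency, effectiveness, and satisfaction summaries reported in the findings below.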
All study participants were university students, the primary target audience of the mobile website. We used three methods to recruit participants: a post to the library’s Facebook page, a news item on the library’s home page, and two dozen flyers posted around campus. Each form of recruitment described an opportunity for students to spend less than thirty minutes helping the library test its new mobile website. Also, participants would receive a $10 coffee shop gift card as an incentive. A project-specific email address served as the initial contact point for students to volunteer. We instructed volunteers to indicate their phone type in their e-mail; this information was used to select and contact the students with the desired variety of mobile devices. If a scheduled participant did not come to the test appointment, another student with the same or similar type of phone was contacted and scheduled. No other demographic data or screening was used to select participants, aside from a minimum age requirement of eighteen years old. We employed a hybrid field and laboratory test protocol, which allowed us to test the mobile website on students’ native devices while in a laboratory setting that we could efficiently manage and schedule. Participants used their own phone for the test without any adjustment to their existing operating preferences, similar to field testing methodology. However, we used a controlled environment in order to facilitate the test session and create recordings for data analysis. A library conference room served as our laboratory, and a document camera with video recording capability was used to record the session. The document camera was placed on an audio/visual cart and the participants chose to either stand or sit while holding their phones under the camera. The document camera recorded the phone screen, the participant’s hands, and the audio of the session. The video feed was available through the room projector as well, which helped us monitor image quality of the recordings. INFORMATION TECHNOLOGY & LIBRARIES | JUNE 2012 50 Figure 2. Video Still from Test Session Recording The test session consisted of two parts: the completion of five tasks using participants’ phones on our test website recorded under the document camera, and a post-test survey. Participants were read an introduction and instructions from a script in order to decrease variation in test protocol and our influence as the facilitators. We also performed a walk-through of the testing session prior to administering it to ensure the script was clearly worded and easy to understand. We developed our test scenarios and tasks according to five functional objectives for the library mobile website: 1. Participants can find library hours for a given day in the week. 2. Participants can perform a known title search in catalog and check for item status. 3. Participants can use My Account to view checked out books.18 4. Participants can use chat reference. 5. Participants can effectively search for a scholarly article using the mobile version of EBSCOhost Academic Search Complete. Prior to beginning the test, we encouraged participants to use the “think aloud” protocol while performing tasks. We also instructed them to move between tasks however they would naturally in order to capture user behavior when navigating from one part of the site to another. The post-test survey provided us with additional data and user reactions to the site. 
Users were asked to rate the site's appearance, ease of use, and how frequently they might use the different website features (e.g., renewing a checked-out item). The survey was administered directly after the task scenario portion of the test in order to take advantage of the users' recent experience with the website.

We evaluated the test sessions using the measures of efficiency, effectiveness, and satisfaction. In this study, we assessed efficiency as time spent performing the task and effectiveness as success or failure in completing the task. We observed errors and categorized them as either a user error or a site error. Each error was also categorized as minor, major, or fatal: minor errors were easily identified and corrected by the user; major errors caused a notable delay, but the user was able to correct the problem and complete the task; fatal errors prevented the user from completing the task. To assess user satisfaction, we took note of user comments as they performed tasks, and we also referred to their ratings and comments on the post-test survey.

Before analyzing the test recordings, we normalized our scoring behavior by performing a sample test session with a library staff member unfamiliar with the mobile website. We scored the sample recording separately and then met to discuss, clarify, and agree upon each error category. Each of the twelve test sessions was viewed and scored independently. Once this process was completed, we discussed our scoring of each test session video, combining our data and observations. We analyzed the combined data by looking for both common and unique errors on each usability task across the variety of smartphones tested.

To protect participants' confidentiality, each video file and post-test survey was labeled only with the test number and device type. Prior to beginning the study, all recruitment methods, informed consent materials, methodology, tasks, and the post-test survey were approved by the Portland State University Human Subjects Research and Review Committee.

FINDINGS

Our recruitment efforts were successful, with even a few same-day responses to the announcement posted on the library's Facebook page. Some students also indicated that they had seen the recruitment flyers on campus. A total of fifty-two students volunteered to participate; twelve students were successfully contacted, scheduled, and tested. The distribution of the twelve participants and their types of phones is shown in table 1.

Number of participants    Operating system     Phone model
4                         Android              HTC Droid Incredible 2; Motorola Droid; HTC MyTouch 3G Slide; Motorola Cliq 2
3                         iOS                  iPhone 3GS
2                         BlackBerry           BlackBerry 9630; BlackBerry Curve
1                         Windows Phone 7      Windows Phone 7
1                         webOS                Palm Pixi
1                         Other                Windows Kin 2 feature phone (a phone with Internet capability, running KinOS)

Table 1. Test Participants by Smartphone Operating System and Model

Usability Task Scenarios

All test participants quickly and successfully completed the first task, finding the library hours for Sunday. The second task was to find a book in the library catalog and report whether the book was available for checkout. Nine participants completed this task; the Windows Phone 7 and the two BlackBerry phones presented a fatal system error when working with our mobile catalog software, MobileCat.
These participants were able to perform a search but were not able to view a full item record, blocking them from seeing the item’s availability and completing the task. This task also revealed one minor error for iPhone users: the iPhone displayed the item’s ten digit ISBN as a phone number, complete with touch-to-call button. Many users took more time than anticipated when asked to search for a book. The video recordings captured participants slowly scrolling through the menu before choosing “Search PSU- only Catalog.” A few participants expressed their hesitation verbally: ● “Maybe not the catalog? I don't know. Yeah I guess that would be the one.” ● “I don't look for books on this site anyway...my lack of knowledge more than anything else.” ● “Search PSU library catalog I'm assuming?” The Blackberry Curve participant did not recognize the catalog option and selected “Databases & Articles” to search for a book. She was guided back to the catalog after her unsuccessful search in EBSCOhost. We observed an additional delay in searching for a book when using the catalog interface. The catalog search included a pull down menu of collections options. The collections menu was included by the site developers because it is present in the full website version of the local catalog. Users tended to explore the menu looking for a selection that would be helpful in performing the task; however, they abandoned the menu, occasionally expressing additional confusion. USABILITY STUDY OF A LIBRARY’S MOBILE WEBSITE | PENDELL AND BOWMAN 53 Figure 3. Catalog Search with Additional “Collections” Menu The next task was to log into a library account and view checked out items. All participants were successful with this task, but frequent minor user errors were observed, all misspelling or numerical entry errors. Most participants self-corrected before submitting the login; however, one participant submitted a misspelled user name and promptly received an error message from the site. Participants were also instructed to log out of the account. After clicking “logout” one participant made the observation; “Huh, it goes to the login screen. I assume I'm logged out, though it doesn't say so.” The fourth task scenario involved using the library’s chat reference service via the mobile website. The chat reference service is provided via open source software in cooperation with L-net, the Oregon statewide service. Usability testing demonstrated that the chat reference service did not perform well on a variety of phones. Also, a significant problem arose when participants attempted to access chat reference via the university’s unsecured wireless network. Because the chat reference service is managed by a third-party host, three participants were confronted with a non-mobile friendly authentication screen (see discussion of the local wireless environment below). As this was an unexpected event in testing, participants were given the option to authenticate or abandon the task. All three participants who arrived at this point chose to move ahead with authentication during the test session. INFORMATION TECHNOLOGY & LIBRARIES | JUNE 2012 54 Once the chat interface was available to participants, other system errors were discovered. Only three out of twelve participants successfully sent and received a chat message. Only one participant (HTC Droid Incredible) experienced an error-free chat transaction. 
Various problems encountered included: · unresponsive or slow to respond buttons, · text fields unresponsive to data entry, · unusually long page loading time, · non-mobile-friendly error message upon attempting to exit, and · non-mobile-friendly “leave a message” webpage. Another finding from this task is that participants expressed concern regarding communication delays during the chat reference task. If the librarians staffing the chat service are busy with other users, a new incoming user is placed in a queue. After waiting in the chat queue for forty seconds, one participant commented, “Probably if I was on the bus and it took this long, I would leave a message.” Being in a controlled environment, participants looked to the facilitator as a guide for how long to remain in the chat queue, distorting the indication of how long users would wait for a chat reference transaction in the field environment. Figure 4. Chat Reference Queue USABILITY STUDY OF A LIBRARY’S MOBILE WEBSITE | PENDELL AND BOWMAN 55 The last task scenario asked participants to use the mobile version of EBSCOhost’s Academic Search Complete. Our test instance of this database generally performed well with Android phones and less well with webOS phones or iPhones. Android participants successfully accessed, searched, and viewed results in the database. iPhone users experienced delays in initiating text entry, three consecutive touches being consistently necessary to activate typing in the search field. Our feature phone participant with a Windows Kin 2 was unable to use EBSCOhost because the phone’s browser, Internet Explorer 6, is not supported by the EBSCOhost mobile website. The Palm Pixi participant also experienced difficulty with very long page loading times, two security certificate notifications (not present on other tests), and our EZproxy authentication page. With all these obstacles, the Palm Pixi participant abandoned the task. Another participant, Blackberry 9630, also abandoned the task due to slow page loading. A secondary objective of our EBSCOhost search task was to observe if participants explored EBSCOhost’s “Search Options” in order to limit results to scholarly articles. Our task scenario asked participants to find a scholarly article on global warming. Only one participant explored the EBSCOhost interface, successfully identified the “Search Options” menu, and limited the results to “scholarly (peer reviewed) articles.” Another participant included the words “peer reviewed” with “global warming” in the search field in an attempt to add the limit. A third expressed the need to limit to scholarly articles but was unable to discover how to do so. Of the remaining seven participants who searched Academic Search Complete for the topic “global warming” none expressed concern or awareness of the scholarly limit in Academic Search Complete. It is unclear whether this was a product of the interface design, users’ lack of knowledge regarding limiting their search to scholarly sources, or if our task scenario was simply too vague. Though participants’ wireless configurations, or lack thereof, was not formally part of the usability test, we quickly discovered that this variable had a significant impact on the user’s experience of the mobile website. In the introductory script and informed consent we recommended to participants that they connect to the university’s wireless network to avoid data charges. However, we did not explicitly instruct users to connect to the secure network. 
Most participants chose to connect to the unencrypted wireless network and appeared to be unaware of the encrypted network (PSU and PSU Secure, respectively). Using the unencrypted network led to authentication requirements at two different points in the test: using the chat service and searching Academic Search Complete. Other users, who were unfamiliar with adding a wireless network to their phone, used their cellular network connection; these participants were asked to authenticate only when accessing EBSCOhost's Academic Search Complete (see table 2). Participants expressed surprise at the appearance of an authentication request when performing different tasks, particularly while connected to the on-campus university wireless network. The data entry required by a non-mobile-friendly authentication screen, along with the added page-loading time, created an obstacle participants had to overcome in order to complete the task. Notably, three participants also admitted that they did not know how to find and add a wireless network on their phones.

Internet connection                Library mobile website        Chat reference                EBSCOhost
On campus, unencrypted wireless    No authentication required    Authentication required       Authentication required
On campus, encrypted wireless      No authentication required    No authentication required    No authentication required
On campus, cellular network        No authentication required    No authentication required    Authentication required
Off campus, any mode               No authentication required    No authentication required    Authentication required

Table 2. Authentication Requirements Based on Type of Internet Connection and Resource

Post-Test Survey

Each participant completed a post-test survey that asked them to rate the mobile website's appearance and ease of use. The survey also asked participants to rank how frequently they were likely to use specific features of the website, such as searching for books and asking for help, on a rating scale of more than weekly, weekly, monthly, less than monthly, and never. Participants were also invited to add general comments about the website.

The mobile website's overall appearance and ease of use were highly rated by all participants. The straightforward design of the mobile website's homepage also garnered praise in the comment section of the post-test survey. Comments regarding the site's design included "Very simple to navigate" and "The simple homepage is perfect! Also, I love that the site rotates sideways with my phone."

For many of the features listed on the survey, participants selected an almost even distribution across the frequency-of-use rating scale. However, two features were ranked as having potential for very high use. Nine out of twelve participants said they would search for articles weekly or more than weekly. Eight out of twelve participants said they would use the "Find a Computer" function weekly or more than weekly. Two participants additionally wrote in comments that "Find a Computer" was "very important" and would be used "every day." At the other end of the scale, our menu option "Directions" was ranked as having a potential frequency of use of never, with the exception of one participant marking less than monthly.

DISCUSSION

Usability testing of the library's mobile website provided the team with valuable information, leading us to implement important changes before the site was launched.
We quickly decided on a few changes, while others involved longer discussion. The collections menu was removed from the catalog search; this menu distracted and confused users with options that were not useful in a general search. "Directions" was moved from a top-level navigation element to a clickable link in the site footer. Also, the need for a mobile version of the library's EZproxy authentication page was clearly documented, and such a page has since been created and implemented. The team was also very pleased with the praise for the overall appearance of the website and its ease of use, especially considering the significant difficulties some participants faced when completing specific tasks.

The "Find a Computer" feature of the mobile website was very popular with test participants. Its potential popularity among users is perhaps a reflection of overcrowded computer labs across campus and the continued need students have for desktop computing. Unfortunately, "Find a Computer" has been temporarily removed from the site due to changes in computer laboratory tracking software at the campus IT level. We hope to regain access to the workstation data for the library's two computer labs soon in order to develop a new version of this feature.

The hesitation participants displayed when selecting the catalog option in order to search for a book was remarkable for its pervasiveness. It is possible that the term "catalog" has declined in use to the point of not being recognizable to some users, and it is not used to describe the search on the homepage of the library's full website. In fact, we had originally planned to name the catalog search option with a more active and descriptive phrase, such as "Find books and more," which is used on the library's full website. However, the full library website employs WorldCat Local, allowing users to make consortial and interlibrary loan requests. In contrast, the mobile website catalog reflects only our local holdings and does not support the request functionality. The team decided not to risk confusing users further about the functionality of the different catalogs by giving them the same descriptive title. If WorldCat Local's beta mobile catalog increases in stability and functionality, we will abandon MobileCat and provide the same request options on the mobile website as on the full website.

We discussed removing the chat service option from the "Ask Us" page. Usability testing demonstrated that users would too frequently have poor experiences using this service due to slow page loads on most phones, the unpredictable responsiveness of text entry fields and buttons, and the wait time for a librarian to begin the chat. Also, it could be that waiting in a virtual queue on a mobile device is particularly unappealing because the user is blocked from completing other tasks simultaneously. The library recently implemented a new text reference service, and this service was added to the mobile website. The text reference service is an asynchronous, non-web-based service that is less likely to pose the usability problems found with the chat service. This reflects the difference between applications developed for desktop computing, such as web-based instant messaging, and a technology that is native to the mobile phone environment, like text messaging.
However, tablet device users complicate matters since they might use the full desktop website or the mobile website; for this reason, chat reference is still part of the mobile website. INFORMATION TECHNOLOGY & LIBRARIES | JUNE 2012 58 Participants’ interest in accessing and searching databases was notable. During the task, many participants expressed positive reactions to the availability of the EBSCOhost database. The post- test survey results demonstrated a strong interest in searching for articles via the mobile website, giving their potential frequency of use as weekly or more than weekly. This evidence supports the previous user focus group results of Seeholzer and Salem.19 Students are interested in accessing research databases on their mobile devices, despite the likely limitations of performing advanced searches and downloading files. Therefore, the team decided to include EBSCOhost’s Academic Search Complete along with eight other mobile-friendly databases in the live version of the website launched after the usability test. Figure 5. Home Page of the Library Mobile Website, Updated USABILITY STUDY OF A LIBRARY’S MOBILE WEBSITE | PENDELL AND BOWMAN 59 The new library mobile website was launched in the first week of fall 2011 quarter classes. In the first full week there were 569 visits to the site. Site analytics for the first week also showed that our distribution of smartphone models in usability testing was fairly well matched with the users of the website, though we underestimated the number of iPhone users: 64 percent of visits were from Apple iOS users, 28 percent from Android users, 0.7percent Blackberry users, and the remaining a mix of users with alternative mobile browsers and desktop browsers. Usability testing with participants’ native smartphones and wireless connectivity revealed issues which would have been absent in a laboratory test that employed a mobile device emulator and a stable network connection. The complications introduced by the encrypted and unencrypted campus wireless networks, and cellular network connections, revealed some of the many variables users might experience outside of a controlled setting. Ultimately, the variety of options for connecting to the Internet from a smartphone, in combination with the authentication requirements of licensed library resources, potentially adds obstacles for users. General recommendations for mobile library websites that emerged from our usability test include: · users appreciate simple, streamlined navigation and clearly worded labels; · error message pages and other supplemental pages linked from the mobile website pages should be identified and mobile-friendly versions created; · recognize that how users connect to the mobile website is related to their experience using the site; · anticipate problems with third-party services (which often cannot be solved locally). Additionally, system responses to user actions are important; for example, provide a “you have successfully logged out” message and an indicator that a catalog search is in progress. It is possible that users are even more likely to abandon tasks in a mobile environment than in a desktop environment if they perceive the site to be unresponsive. As test facilitators, we experienced three primary difficulties in keeping the testing sessions consistent. The unexpectedly poor performance of the mobile website on some devices required us to communicate with participants about when a task could be abandoned. 
For example, after one participant made three unsuccessful attempts at entering text data in the chat service interface, she was directed to move ahead to the next task. Such instances of multiple unsuccessful attempts were considered to be fatal system errors. However, under these circumstances, it is difficult to know whether our test facilitation led participants to spend more or less time than they normally would attempting a task. Secondly, the issue of system authentication led to unexpected variation in testing. Some participants proceeded through these obstacles, while others either opted out or had significant enough technical difficulties that the task was deemed a fatal error. Again, it is unclear how the average user would deal with this situation in the field. Some users INFORMATION TECHNOLOGY & LIBRARIES | JUNE 2012 60 might leave an activity if an obstacle appears too cumbersome, others might proceed. Finally, participants demonstrated a wide range in their willingness to “think aloud.” In retrospect, as facilitators, we should have provided an example of the method before beginning the test; perhaps doing so would have encouraged the participants to speak more freely. The relatively simple nature of most of the test tasks may have also contributed to this problem as participants seemed reluctant to say something that might be considered too obvious. Another limitation of our study is that the participants were a convenience sample of volunteers selected by phone type. Though our selection was based loosely on market share of different smartphone brands, a preliminary investigation into the mobile device market of our target population would have been helpful to establish what devices would be most important to test. Additional usability testing on more complex library related tasks, such as advanced searching in a database, or downloading and viewing files, is recommended for further research. Also of interest would be a study of user willingness to proceed past obstacles like authentication requirements and non-mobile friendly pages in the field. CONCLUSION We began our study questioning whether or not different smartphone hardware and operating systems would impact the user experience of our library’s new mobile website. Usability testing confirmed that the type of smartphone does have an impact on the user experience, occasionally significantly so. By testing the site on a range of devices, we observed a wide variation of successful and unsuccessful experiences with our mobile website. The wide variety of phones and mobile devices in use makes developing a mobile website that perfectly serves all of them difficult; there is likely to always be a segment of users who experience difficulties with any given mobile website. However, usability testing data and developer awareness of potential problems will generate positive changes to mobile websites and alleviate frustration for many users down the road. REFERENCES AND NOTES 1. Aaron Smith, “35% of American Adults Own a Smartphone: One Quarter of Smartphone Owners Use Their Phone for Most of Their Online Browsing,” Pew Research Center, June 15, 2011, http://pewinternet.org/~/media//Files/Reports/2011/PIP_Smartphones.pdf (accessed Oct. 13, 2011). 2. Shannon D. Smith and Judith B. Caruso, The ECAR Study of Undergraduate Students and Information Technology, 2010, EDUCAUSE, 2010, 41, http://net.educause.edu/ir/library/pdf/ERS1006/RS/ERS1006W.pdf (accessed Sept. 12, 2011); Shannon D. Smith, Gail Salaway, and Judith B. 
Caruso, The ECAR Study of Undergraduate Students and Information Technology, 2009, EDUCAUSE, 2009, 49, http://www.educause.edu/Resources/TheECARStudyofUndergraduateStu/187215 (accessed Sept. 12, 2011). http://pewinternet.org/~/media/Files/Reports/2011/PIP_Smartphones.pdf http://net.educause.edu/ir/library/pdf/ERS1006/RS/ERS1006W.pdf http://www.educause.edu/Resources/TheECARStudyofUndergraduateStu/187215 USABILITY STUDY OF A LIBRARY’S MOBILE WEBSITE | PENDELL AND BOWMAN 61 3. A comparison count of U.S. and Canadian academic libraries with active mobile websites, wiki page versions, August 2010 (56 listed) and August 2011 (84 listed). Library Success: A Best Practices Wiki, “M-Libraries: Libraries Offering Mobile Interfaces or Applications,” http://libsuccess.org/index.php?title=M-Libraries (accessed Sept. 7, 2011). 4. Laurie M. Bridges, Hannah Gascho Rempel, and Kim Griggs, “Making the Case for a Fully Mobile Library Web Site: From Floor Maps to the Catalog,” Reference Services Review 38, no. 2 (2010): 317, doi:10.1108/00907321011045061. 5. Kim Griggs, Laurie M. Bridges, and Hannah Gascho Rempel, “Library/Mmobile: Tips on Designing and Developing Mobile Web Sites,” Code4Lib Journal no. 8 (2009), under “Content Adaptation Techniques,” http://journal.code4lib.org/articles/2055 (accessed Sept. 7, 2011). 6. Jamie Seeholzer and Joseph A. Salem Jr., “Library on the Go: A Focus Group Study of the Mobile Web and the Academic Library,” College & Research Libraries 72, no. 1 (2011): 19. 7. Dongsong Zhang and Boonlit Adipat, “Challenges, Methodologies, and Issues in the Usability Testing of Mobile Applications,” International Journal of Human-Computer Interaction 18, no. 3 (2005): 302, doi:10.1207/s15327590ijhc1803_3. 8. Griggs, Bridges, and Rempel, “Library/Mobile.” 9. Zhang and Adipat, “Challenges, Methodologies,” 303–4. 10. Billi et al., “A Unified Methodology for the Evaluation of Accessibility and Usability of Mobile Applications,” Universal Access in the Information Society 9, no. 4 (2010): 340, doi:10.1007/s10209-009-0180-1. 11. Zhang and Adipat, “Challenges, Methodologies,” 302. 12. Jakob Nielsen, “Mobile Usability,” Alertbox, September 26, 2011, www.useit.com/alertbox/mobile-usability.html (accessed Sept. 28, 2011). 13. Fernando Loizides and George Buchanan, “Performing Document Triage on Small Screen Devices. Part 1: Structured Documents,” in IIiX ’10: Proceeding of the Third Symposium on Information Interaction in Context, ed. Nicholas J. Belkin and Diane Kelly (New York: ACM, 2010), 342, doi:10.1145/1840784.1840836. 14. Constantinos K. Coursaris and Dan J. Kim, “A Qualitative Review of Empirical Mobile Usability Studies” (presentation, Twelfth Americas Conference on Information Systems, Acapulco, Mexico, August 4–6, 2006), 4, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.4082&rep=rep1&type=pdf (accessed Sept. 7, 2011) 15. Ibid., 2. http://libsuccess.org/index.php?title=M-Libraries http://journal.code4lib.org/articles/2055 file:///C:/Users/GERRITYR/Desktop/ITAL%2031n2_PROOFREAD/www.useit.com/alertbox/mobile-usability.html http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.4082&rep=rep1&type=pdf INFORMATION TECHNOLOGY & LIBRARIES | JUNE 2012 62 16. Jeffrey Rubin and Dana Chisnell, Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests, 2nd ed. (Indianapolis, IN: Wiley, 2008); Carole A. George, User-Centered Library Web Sites: Usability Evaluation Methods (Cambridge: Chandos, 2008). 17. 
Ronan Hegarty and Judith Wusteman, “Evaluating EBSCOhost Mobile,” Library Hi Tech 29, no. 2 (2011): 323–25, doi:10.1108/07378831111138198; Robert C. Wu et al., “Usability of a Mobile Electronic Medical Record Prototype: A Verbal Protocol Analysis,” Informatics for Health & Social Care 33, no. 2 (2008): 141–42, doi:10.1080/17538150802127223. 18. In order to protect participants’ confidentiality a dummy library user account was created; the user name and password for the account were provided to the participant at the test session. 19. Seeholzer and Salem, “Library on the Go,” 14. 1914 ---- The Next Generation Integrated Library System: A Promise Fulfilled? Yongming Wang and Trevor A. Dawes INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2012 76 ABSTRACT The adoption of integrated library systems (ILS) became prevalent in the 1980s and 1990s as libraries began or continued to automate their processes. These systems enabled library staff to work, in many cases, more efficiently than they had in the past. However, these systems were also restrictive—especially as the nature of the work began to change—largely in response to the growth of electronic and digital resources that they were not designed to manage. New library systems—the second (or next) generation—are needed to effectively manage the processes of acquiring, describing, and making available all library resources. This article examines the state of library systems today and describes the features needed in a next-generation library system. The authors also examine some of the next-generation library systems currently in development that purport to fill the changing needs of libraries. INTRODUCTION Since the late 1980s and early 1990s, the library automation system has gone from inception to rapid implementation to near ubiquitous adoption. But after two decades of changes in information technology, and especially in the last decade, the library has seen itself facing tremendous changes in terms of both resources and services it provides. On the resource side, print material and physical items are no longer dominant collections; electronic resources are fast outpacing physical materials to become the dominant library resources, especially in the academic and special libraries. In addition, many other digital format resources, such as digital collections, institutional repositories, and e-books have taken root. On the service front, library users— accustomed to immediate and instant searching, finding, and accessing information in the Google age—demand more and more instant and easy access to library resources and services. But the library automation system, also called the integrated library system (ILS), has not changed much for the past two decades. It finds itself uneasily handling the ever-changing library environment and workflow. Library staff becomes ever more frustrated with the ILS, noting its inadequacy in dealing with their daily jobs. Library users are confused by the many interfaces and complexity of library applications and systems. It is obvious that we are at the tipping point for a dramatic change in the area of library automation systems. 
The library literature has been referring to these as second-generation library automation systems or next-generation library systems.1 Two pillars of the second-generation library automation system are(1) it will manage the library resources in the comprehensive and unified way regardless of resource format and location; and (2) it will break away from the traditional ILS models and build on the service oriented architecture (SOA) model. Yongming Want (wangyo@tcnj.edu) is Systems Librarian for The College of New Jersey Library, Ewing Township, and Trevor Dawes (tdawes@princeton.edu) is Access Services & Circulation Librarian, Princeton University Libraries, Princeton, New Jersey. THE NEXT GENERATION LIBRARY SYSTEM: A PROMISE FULFILLED? | WANG AND DAWES 77 We are at the beginning of a new era of library automation systems. Some library system vendors have realized the need to change and have started to develop and implement the second- generation library automation system. We believe that the concept and implementation of the new library automation system will catch on quickly among the all types of libraries. It will change how the library conducts its business and will benefit both library staff and users. LITERATURE REVIEW There is not much research literature on the subject to date. After more than a decade of library automation development and implementation, starting in the late 1990s, libraries have been facing the challenges ushered in by rapidly evolving Internet and Web 2.0 technologies in addition to the growing number of savvy web users. Libraries found themselves lagging behind other sources (such as Internet search engines) in meeting users’ information needs, and library staff members are generally frustrated by the lack of flexibility of traditional library systems. As early as 2007, Marshall Breeding pointed out that “as librarians continue to operate with sparse resources, performing ever more services with ever more diverse collections—but with no increases in staff—it’s more important than ever to have automation tools that provide the most effective assistance possible.”2 In his 2009 article, he deliberately says that “dissatisfaction with the current slate of ILS products runs high. The areas of concern lie in their inability to manage electronic content and with their user interfaces that do not fare well against contemporary expectations of the Web.”3 So what are the trends in libraries for the last decade in terms of library resources, collections, services, and resource discoveries? According to Breeding, there are three trends: “1. Increased digital collections; 2. Changed expectations regarding interfaces; 3. Shifted attitudes toward data and software.”4 Andrew Pace notes that “web-based content, licensed resources, born-digital documents, and institutionally significant digital collections emerged rapidly to overtake the effort required to maintain print collections, especially in academic libraries.”5 Another noticeable trend in the library technology field is occurring along with a similar trend in the general information technology field, that is, the open-source software movement. 
As Pace states, “Open Source Software (OSS) efforts such as the Open Archive Initiative (OAI), DSpace, and Koha—just to name a few, as an exhaustive list would overwhelm the reader—challenged commercial proprietary systems, not only for market share but often in terms of sophistication and functionality.”6 As for the infrastructure and features of the second-generation library automation system, both Breeding and Pace have their respective visions. Breeding writes that “the next generation of library automation systems needs to be designed to match the workflows of today’s libraries, which manage both digital and print resources.”7 “One of the fundamental assumptions of the next generation library automation would involve a design to accommodate the hybrid physical and digital existence that libraries face today.”8 Pace specifically requires that the next-generation library automation system should use the web as a platform to fulfill the notion of Software-as-a- Service (SaaS), or further, Platform-as-a-Service (PaaS). The technical advantages of such systems would include the ability to “1. Develop, test, deploy, host, and maintain on the same integrated environment; 2. user experience without compromise; 3. build-in scalability, reliability, and INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2012 78 security; 4. build-in integration with Web services and databases; 5. support collaboration; 6. deep application instrumentation.”9 Also as early as October 2007, Computers in Libraries invited Ellen Bahr to survey a number of library technology experts regarding what features and functionality they want to see built into ILSs soon. The experts included Roy Tennant, Kristin Antelman, Ross Singer, Andrew Pace, John Blyberg, Stephen Abram, and H. Frank Cervone. They identified the following key functionality for future ILSs: • Direct, read-only access to data, preferably through an open source database management system like MySQL. • A standard way to communicate with the ILS, preferably through an application programming interface. • Standards-compliant systems including better security and more complete documentation. • The ability to run the ILS on hardware that the library selects and on servers that the library administers. • Greater interoperability of systems, pertaining to the systems within the library (including components from vendors, open source communities, and homegrown systems) and beyond (enterprise-level systems such as courseware and university portals, and shared library systems such as OCLC). • Greater distinction between the ILS (which needs to efficiently manage a library’s business processes) and the OPAC (which needs to be a sophisticated finding tool). • Better user interfaces, making use of the most current technologies available and providing a single interface to all of the library’s holdings, regardless of format.10 Four Aspects of Next-Generation ILS There are four distinguishing characteristics of the next-generation ILS we believe are critical. They are comprehensive library resources management; a system based on service-oriented architecture; the ability to meet the challenge of new library workflow; and a next-generation discovery layer. Comprehensive Library Resources Management Comprehensive library resources management requires that next-generation ILSs should be able to manage all library materials regardless of format or location. 
Current ILSs are built around the traditional library practice of print collections and services designed around these collections, but the last ten to fifteen years have seen great shifts in both library collections and services. Print and physical materials are no longer the dominant resources. Actually, in many libraries, especially in academic and research libraries, the building of electronic and digital collections have taken a larger role in library collection development. The traditional ILS has not been able to handle ever-growing electronic and digital resources—either in terms of their acquisition or management. Therefore a variety of either commercial or open-source THE NEXT GENERATION LIBRARY SYSTEM: A PROMISE FULFILLED? | WANG AND DAWES 79 electronic resources management systems (ERM systems) have been developed over the years to address this management gap, but two problems exist: First, most ERM systems, whether commercial or open-source, have not been able to truly integrate the acquisition process into the acquisitions workflow of the current ILS systems, causing a messy and redundant workflow for the library staff. In libraries where an ERM is deployed, staff generally track workflows in both the ERM and the ILS. If the library’s workflows have not been revised, miscommunication between the traditional acquisitions staff and the electronic resources staff can cause confusion, delay, and may even lead to disruption of services to library patrons. Second, ERM systems, by design, don’t take current library workflows into account. While it is true that these resources may need to be processed differently, library staff generally are used to traditional processes and want systems that function in familiar ways. Many libraries, particularly academic libraries, still have relatively large serials departments responsible for the management of print journals. Some have only recently begun to develop the personnel and the skills required to manage the influx of electronic and digital resources. Because of these problems with existing ERM systems, it is important that the next-generation ILSs fully integrate the key features of ERM systems, enabling the library to streamline and efficiently manage resources and staff. Full integration of e-resource management would not only include acquisitions functionality but also the ability to manage licenses—a critical component of e-resource management—and the ability to manage the various packages, databases, and vendors. Describing and providing access to e-resources are two aspects of the e-resources management process. These two features of the ERM system should also be integrated with the description and metadata management component of the next-generation ILS. Centrally managing the metadata of e-resources enables easier discovery of resources by library users and has the advantage of shifting some of the management workflow to the metadata (or cataloging) staff. System Based on Service-Oriented Architecture Next-generation ILSs should be designed based on Service-Oriented Architecture (SOA). What is SOA? A service-oriented architecture (SOA) is an architecture for building business applications as a set of loosely coupled distributed components linked together to deliver a well-defined level of service. These services communicate with each other, and the communication involves data exchange or service coordination. SOA is based on Web Services. Broadly, SOA can be classified into two aspects: services and connections, described below. 
Services: A service is a function or some processing logic or business processing that is well- defined, self-contained, and does not depend on the context or state of other services. An example of a service is loan processing services, which can be a self-contained unit for processing loan applications. Another example is weather services, used to get weather information. Any application on the network can use the services of the weather service to get the weather information for a local area or region. In the library field, an example of a well-defined service is a check-in or check-out service. INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2012 80 Connections: Connections are the links connecting these self-contained distributed services with each other. They enable client-to-services communication. In case of web services, Simple Object Access Protocol (SOAP) is frequently used to communicate between services. There are many benefits of SOA in the next-generation ILS. These include the ability to be platform independent, therefore allowing libraries to use the software and hardware of their choice. There is no threat of being locked in to a single vendor, as many libraries are now with their current ILSs. SOA also enables incremental development, deployment, and maintenance. The vendors can use the existing software (investment) and use SOA to build applications without replacing existing applications. As Breeding described, the potential of web services (SOA) for libraries includes • real-time interaction between library-automation systems and business systems of a library’s parent organization; • real-time interaction between library-automation systems and library suppliers or other business partners; • blending of library services into campus or municipal portal environments; • insertion of library services and content into courseware-management systems or other learning environments; • blending of content from external sources into library interfaces; and • delivery of library services and content to library users through nontraditional channels. 11 Meet the Challenge of the New Library Workflow The library systems in use today are, in general, aging—most were developed at least ten to fifteen years ago. They have been updated with software patches and new releases, but they still demand that staff work in the manner in which the systems were originally designed. Although changes in our library operations have been realized in many organizations, these systems have not been able to adequately adapt to how library staff now want to—or need to—operate. The inability to keep pace with the move from largely print to increasingly electronic resources in our libraries is one of the reasons our existing systems fail. Copeland et al. present a stunning visual of the typical workflow involved in acquiring and making available an electronic resource in the print-based library management system.12 Their graphic depicts five possible starting points, nine decision points, and close to twenty steps involved in the process. This process may not be typical, but it is illustrative of the complex nature of our new workflows that simply cannot be accommodated by existing ILSs. 
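To make the "services" idea described earlier in this section more concrete, the sketch below shows how a self-contained check-out service might be called over SOAP from PHP. It is a minimal sketch under stated assumptions: the WSDL location, the operation name, and its parameters are hypothetical and are not drawn from any existing ILS.

<?php
// Minimal sketch of calling a hypothetical SOA "check-out" service over SOAP.
// The WSDL URL, operation name (checkOutItem), and parameters are invented
// for illustration; a real next-generation ILS would publish its own contract.
$client = new SoapClient('https://ils.example.edu/services/circulation?wsdl');

try {
    // The service is self-contained: the caller only needs to know the
    // contract, not how loan rules or patron records are stored internally.
    $response = $client->checkOutItem([
        'patronBarcode' => '29999001234567',
        'itemBarcode'   => '39999007654321',
    ]);
    echo 'Due date: ' . $response->dueDate . PHP_EOL;
} catch (SoapFault $fault) {
    // A fault might indicate a blocked patron, a non-circulating item, etc.
    echo 'Check-out failed: ' . $fault->getMessage() . PHP_EOL;
}

Because the caller depends only on the published service contract, the same check-out service could equally be invoked from a discovery layer, a campus portal, or a courseware system, which is the kind of reuse Breeding envisions above.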
As early as 1997, the Sirsi Corporation recognized the need to modify systems; they introduced Workflows, which is designed to streamline library operations.13 Workflows, which introduced a graphical user interface to the Sirsi Unicorn system, was intended to allow staff a certain amount of flexibility and customization, depending on the tasks they typically perform. The new systems that are being developed and deployed today promise even more flexibility and propose to enable staff to work more efficiently irrespective of the format of the material being processed. But these systems will require staff to think about workflows in entirely different ways. Not only will the method used to perform tasks be different (now web-based, hosted services as THE NEXT GENERATION LIBRARY SYSTEM: A PROMISE FULFILLED? | WANG AND DAWES 81 opposed to client-server-based tools) but the functionality has been enhanced to be more efficient. We cannot say how these new systems will be welcomed or resisted by staff. Nor can we say how much staff savings will be realized because these systems are still too new and have not yet been implemented on a wide enough scale for a thorough assessment. But they are at least starting to address the issue. On the one hand, they will open a new window for further study and exploration of how to shape the next-generation ILSs to suit the new library workflow. On the other hand, the library will benefit by changing some of their out-of-date practices and workflows around the new system. Next-Generation Discovery Layer Current library OPACs, like the ILSs themselves, are more than ten years old and generally have shown no improvement in search capability, navigability, or discovery. Meanwhile, search technology has radically improved in the past decade. Frustrations with the OPACs’ limitations on the part of both librarians and library users eventually motivated many libraries to seek alternatives. Libraries want to take advantage of the advances in search and discovery technology by implementing “NextGen” OPACs or library discovery services. Given the vast range of resources available in libraries—local print holdings, specialized databases, and commercial databases to name only a few—libraries want a service that would make as many of them as discoverable as possible. The ideal system would have a unified search interface with a single search box, but with relevance ranking, faceted search, social tagging of records, persistent links to records, RSS feeds for searches, and the ability to easily save searches or export selected records to standard bibliographic management software programs. The ideal system would also integrate with the library’s OPAC, overlaying its current interface with a more nimble and navigable interface that still allows real-time circulation status and provides as much support as possible for foreign language fonts. It would also be as customizable as possible. Numerous options for discovery currently exist, and these include Summon from Serials Solutions, Primo from Ex Libris, WorldCat Local from OCLC, EBSCO Discovery Service, and Encore from Innovative Interfaces. As these services are not the focus of this article, they will not be discussed in detail, but the next-generation ILSs should have the ability to integrate seamlessly with these discovery services. Analysis of Two Examples 1. 
Alma Development In early 2009, Ex Libris (owner of Aleph and Voyager) began discussions with several institutions (Boston College, Princeton University, and Katholieke Universiteit Leuven; Purdue University joined later) to develop what they then termed the Unified Resource Management system (URM). The URM was to replace the existing ILSs and the subsequent add-ons that provided functionality not inherently available, such as the Electronic Resources Management (ERM) tools. The “back- end” operations would also be de-coupled from the user interface as described elsewhere in this paper. INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2012 82 Through a series of in-person and online meetings with the development partners, Ex Libris staff developed the conceptual framework and functional requirements for the URM (later named Alma) and began development of the product. Alma was delivered to the partners in a series of releases, each with more functionality, and the feedback was used to enhance or further develop the product. Alma uses the concept of a shared metadata repository (the Metadata Management System) to which libraries would contribute, through which records would be shared, and from which records would be downloaded and edited with local information. Selection and acquisitions functions would be integrated not only within Alma, but within the discovery layer to allow patrons, as well as staff, the ability to suggest items for addition to the library’s holdings. With “smart fulfillment,” the workflows for delivering materials to patrons will also be seamless.14 One of the major changes planned for Alma is the ability to manage the types of resources that cannot be effectively managed in current ILSs—specifically electronic and digital resources. These resources are currently managed with the use of add-on products that interact with varying degrees of success with the ILSs. This lack of integration has been a source of frustration for library staff, particularly as library electronic and digital collections continue to steadily grow. The development partners have presented extensively at various conferences about the development process and have been mostly positive about the product. Dawes and Lute described Princeton University’s participation in a presentation at the 2011 ACRL Conference in Philadelphia.15 At Princeton, an executive committee was created to oversee that partner’s process. Other staff members were then involved in testing each of the partner releases as the functionality increased and was made available to them. The Princeton University team then provided feedback to Ex Libris via regular telephone calls, after which they would see changes based on their feedback, or a status update from Ex Libris about the particular issue reported. The staff members at Princeton believe that their participation in the development of Alma has given them an opportunity to closely examine their workflows to see where efficiencies can be made. 2. Kuali OLE Project In 2008 a group of nine libraries formed the Open Library Environment (OLE) project, later called Kuali OLE. Kuali is a community of higher education institutions that came together to build enterprise-level and open-source applications for the higher education community. These systems include some core applications such as Kuali Financial System, Kuali People Management, and other campus-wide applications. The Kuali OLE is its most recent endeavor. 
The purpose of the Kuali OLE project is to build an enterprise-level, open-source, next-generation ILS. The goal of Kuali OLE, taken from its website (http://kuali.org/OLE), is to "develop the first system designed by and for academic and research libraries for managing and delivering intellectual information." There are six principal objectives of the project: • To be built, owned, and governed by the academic and research library community • To support the wide range of resources and formats of scholarly information • To interoperate and integrate with other enterprise and network-based systems • To support federation across projects, partners, consortia, and institutions • To provide workflow design and management capabilities • To provide information management capabilities to nonlibrary efforts The funding is provided by a contribution from the Andrew W. Mellon Foundation and the nine partner institutions. Kuali OLE will be built based on the SOA model, on top of the Kuali middleware application, Kuali Rice, the core component of the Kuali suite of applications. Kuali Rice "provides an enterprise class middleware suite of integrated products that allows for applications to be built in an agile fashion. This enables developers to react to end user business requirements in an efficient and productive manner, so that they can produce high quality business applications."16 Version 1.0 of Kuali OLE is scheduled to be released to the public in December 2012. An early testing version (0.3) was released in November 2011, which covers some core acquisitions features such as "Select" and "Acquire" processes. We believe that the Kuali OLE software will not only provide an alternative ILS solution for academic and research libraries, but will change the way the library conducts its business, and will also have implications for staffing. These changes will result from the comprehensive management of library materials and resources, and the system's interoperability with other college-level enterprise applications. CONCLUSION After about two decades of library automation system history, both libraries and vendors have begun to realize that a revolutionary change is needed in designing and developing the next-generation ILS. The system, built on the model of SOA, should enable the library to comprehensively and effectively manage all library resources and collections, should accommodate a more flexible library workflow, and should enable the library to provide better services to library users. It is encouraging to see that, in both the commercial and open-source arenas, concrete steps are being taken to develop these systems that will manage all library resources. Alma and Kuali OLE are but two of the next-generation ILSs in development. In 2011, Serials Solutions announced their intent to develop a system using the same principles as described. So have Innovative Interfaces and OCLC, the latter of which has already released an early version of their product to some institutions. Since these products are still in development and implementation is not yet widespread, their success in meeting the needs of the library community is still to be seen. REFERENCES 1. Marshall Breeding, "Next Generation Library Automation: Its Impact on the Serials Community," The Serials Librarian 56, no. 1–4 (2009): 55–64. 2. Marshall Breeding, "It's Time to Break the Mold of the Original ILS," Computers in Libraries 27, no. 
10 (2007): 39–41. 3. Breeding, "Next Generation Library Automation." 4. Breeding, "It's Time to Break the Mold of the Original ILS." 5. Andrew Pace, "21st Century Library Systems," Journal of Library Administration 49, no. 6 (2009): 641–50. 6. Ibid. 7. Breeding, "It's Time to Break the Mold of the Original ILS." 8. Breeding, "Next Generation Library Automation." 9. Dave Mitchell, "Defining Platform-As-A-Service, or PaaS," Bungee Connect Developer Network, 2008, http://bungeeconnect.wordpress.com/2008/02/18/defining-platform-as-a-service-or-paas (accessed Jan. 28, 2012). 10. Ellen Bahr, "Dreaming of a Better ILS," Computers in Libraries 27, no. 9 (2007): 10–14. 11. Marshall Breeding, "Web Services and Service Oriented Architecture," Library Technology Reports 42, no. 3 (2006): 3–42. 12. Jessie L. Copeland et al., "Workflow Challenges: Does Technology Dictate Workflow?" Serials Librarian 56, no. 1–4 (2009): 266–70. 13. "SIRSI Introduces WorkFlows to Streamline Library Operations," Information Today 14, no. 7 (1997): 52. 14. Ex Libris, "Ex Libris Alma: The Next Generation Library Services Framework," 2011, www.exlibrisgroup.com/category/AlmaOverview (accessed Jan. 3, 2012). 15. ACRL Virtual Conference, "Princeton University Discusses Ex Libris Alma," 2011, www.learningtimes.net/acrl/2011/906 (accessed Jan. 3, 2012). 16. Kuali Rice website, http://www.kuali.org/rice (accessed Sept. 10, 2012). 1917 ---- METS as an Intermediary Schema for a Digital Library of Complex Scientific Multimedia Richard Gartner ABSTRACT The use of the Metadata Encoding and Transmission Standard (METS) schema as a mechanism for delivering a digital library of complex scientific multimedia is examined as an alternative to the Fedora Content Model (FCM). Using METS as an "intermediary" schema, where it functions as a template that is populated with content metadata on the fly using Extensible Stylesheet Language Transformations (XSLT), it is possible to replicate the flexibility of structure and granularity of FCM while avoiding its complexity and often substantial demands on developers. METS as an Intermediary Schema for a Digital Library of Complex Scientific Multimedia Of the many possible approaches to structuring complex data for delivery via the web, two divergent philosophies appear to predominate. One, exemplified by such standards as the Metadata Encoding and Transmission Standard (METS)1 or the Digital Item Declaration Language (DIDL),2 relies on the structured packaging of the multiple components of a complex object within "top-down" hierarchies. The second, of which the Fedora Content Model (FCM) is perhaps a prime example,3 takes the opposite approach of disaggregating structural units into atomistic objects, which can then be recombined according to the requirements of a given application.4 Neither is absolute in its approach—METS, for instance, allows cross-hierarchy linkages, and many FCM models are designed hierarchically—but the distinction is clear. Many advantages are validly claimed for the FCM approach to structuring digital data objects. 
Individual components, not constrained to hierarchies, may be readily reused in multiple representations with great flexibility.5 Complex interobject relationships may be encoded using semantic linkages,6 a potentially much richer approach to expressing these than the structural relationships of XML can allow. Multiple levels of granularity, from that of the collection as a whole down to its lowest-level components, can readily be modelled, allowing interobject relationships to be encoded as easily as intercomponent ones.7 Such models, particularly the RDF-based Fedora content model, are very powerful and flexible, but can often lead to complexity and consequently considerable demands on system development before they can be implemented. In addition, despite the theoretical interoperability offered by RDF, in practice the exchange and reuse of content models has proved somewhat limited because considerable work is usually required to re-create and validate a content model created elsewhere.8 This article examines whether it is possible to replicate the advantages of this approach to structuring data within the constraints of the more rigid METS standard. The data used for this analysis is a set of digital objects that result from biological nanoimaging experiments, the interrelationships of which present complex problems when they are delivered online. The Richard Gartner (richard.gartner@kcl.ac.uk) is a Lecturer in Library and Information Science, King’s College, London. METS AS AN INTERMEDIARY SCHEMA FOR A DIGITAL LIBRARY OF SCIENTIFIC MULTIMEDIA | GARTNER 25 method used is an unconventional use of a METS template as an intermediary schema;9 this allows something of the flexibility of the FCM approach while retaining the relative simplicity of the more structured METS model. A Nanoimaging Digital Library and its Metadata Requirements The collection analysed for this study derives from biological nanoimaging experiments undertaken at the Randall Division of Cell and Molecular Biophysics at King’s College London. Biological nanoimaging is a relatively new field of research that aims to unravel the biological processes at work when molecules interact in living cells; this is done by using optical techniques that can resolve images down to the molecular level. It has particular value in the study of how diseases progress and has great potential to help predict the effects of drugs on the physiology of human organs. As part of the Biophysical Repositories in the Lab (BRIL) project at King’s College London,10 a digital library is being produced to meet the needs of practitioners of live cell protein studies. Although the material being made available here is highly specialised, and the user base is restricted to a specialist cohort of biologists, the challenges of this library are similar to those of any collection of digital objects: in particular, the metadata strategy employed must be able to handle the delivery of complex, multifile objects as efficiently as, for example, a library of digitized books has to manage the multiplicity of image files that make up a single digital volume. The digital library itself is hosted on the widely used Fedora repository platform; as a result, it is employing FCM as the basis of its data modelling. The purpose of this analysis is to ascertain whether METS can also be used for the complex models required by this data and to compare its potential viability as an architecture for this type of application with FCM. 
A particular challenge of this collection is that the raw images from which it is constituted require combining and processing before they are delivered to the user. A further challenge is that the library encompasses images from a variety of experiments, all of which combine these files in different ways and employ different software for processing them. Some measure of the complexity of these requirements can be gathered from figure 1 below, which illustrates the processes involved in delivering the digital objects for two types of experiments. Figure 1. Architecture for Two Experiment Types The images created by two experiments, bleach and ACTIN_5, are shown here: it will be seen that the bleach experiment is divided into two subtypes (here called 2Grating and Apotone). Each type or subtype of experiment has its own requirements for combining the images it produces before they are displayed. For the subtype 2Grating, for instance, two images, each generated using a different camera grating, are processed in parallel (indicated by the brackets); these are then combined using the software package process-non-clem (shown by the hexagonal symbol) to produce a display image in TIFF format. The subtype Apotone requires three grating images and a further image with background information to be processed in parallel by the software process-apotone; in this case, the background image provides data to be subtracted from the combined three grating images to produce the final TIFF for display. ACTIN_5 experiments are entirely different: they produce still images that need to be processed sequentially (shown by the braces) to produce a video. Encoding the BRIL Architecture in METS This architecture, although complex, is readily handled within METS in a manner analogous to that of more conventional collections. As in any METS file, the structure of the experiments, including their subexperiments, is encoded using nested division (<div>) elements in the structural map (example 1a). [Nested <div> elements for each experiment type and subtype, each containing subsidiary <div>s with image information.] Example 1a. Sample Experiment-Level Structural Map Within these containing divisions, subsidiary <div> elements are used to map the combination of images necessary to deliver the content for each type. METS allows the specification of the parallel or sequential structuring of files using its <par> and <seq> elements respectively. The parallel processing of the Apotone subtype, for instance, could be encoded as shown in example 1b. 
Example 1b. Sample Parallel Structure for Raw Image Files to be Combined Using a Process Specified in Associated Metadata (Behavior Section) Each division of the structural map of this type may in its turn be attached to a specific software item in the METS behavior section to designate the application through which it should be processed: the tri-partite set of images in example 1b, for instance, would be linked to the process-apotone software using the code in example 1c. Example 1c. Sample METS Behavior Mechanism for a Specification of Image Processing This approach is straightforward, and METS is capable of encoding all of the requirements of this data model, although at the cost of large file sizes and a degree of inflexibility. This may be no problem when the principal rationale behind the creation of this metadata is preservation: linking all of the project metadata in a coherent, albeit monolithic, structure of this kind benefits especially its usage as an Open Archival Information System (OAIS) Archival Information Package (AIP), one of the key functions for which METS was designed. Problems are likely to arise, however, when this approach is scaled up in a delivery system to include the potentially millions of data objects that this project may produce. The large size of the METS files that this approach necessitates makes their on-the-fly processing for delivery much slower than a system that uses aggregations of the smaller files required by the FCM model and so processes only metadata at the granularity necessary for the delivery of each object. Such flexibility is much harder to achieve within METS, although mechanisms that currently exist for aggregating diverse objects within METS may seem to offer some degree of solution to this problem. Complex Relationships under METS Underlying the METS structural map is an assumed ontology of digital objects that encodes a long-established view of text as an ordered hierarchy of content objects;11 this model accounts for the map's use of hierarchical nesting and the ordinality of the object's components. The rigidity of this model is alleviated to some extent by the facility within METS to encode structural links that cut across these hierarchies. These links, which join nodes at any level of the structural map, are particularly useful for encoding hyperlinks within webpages,12 and so are often used for archiving websites. Various attempts have been made to extend the functionality of the structural map and structural links sections to allow more sophisticated aggregations and combinations of components beyond the boundaries of a single digital object, in a manner analogous to the flexible granularity of FCM. METS itself offers the possibility of aggregating other METS files through its <mptr> (METS pointer) element: this element, always a direct child of a <div>
element in the structural map, references a METS file that contains metadata on the digital content represented by this <div>. For example, two complex digital objects could be represented at a higher collection level, as shown in example 2. Example 2. Use of METS <mptr> Element This feature has found some use in such projects as the ECHO DEPository, which uses it to register digital objects at various stages of their ingest into, and dissemination from, a repository;13 it is also recommended by the PARADIGM project as a method for archiving born-digital content, such as emails.14 Nonetheless, its usage remains fairly limited; of all the METS Profiles registered on the central METS repository, for instance, ECHO DEP at the time of writing remains the only project on the Library of Congress's repository of METS Profiles to employ this feature.15 An important reason for its limited take-up is that its potential for more sophisticated uses than merely populating a division of the structural map is severely limited by its place in the METS schema. The <mptr> element can only be used as a direct child of its parent <div>: it cannot, for instance, be located in <par> or <seq> elements to indicate that the objects referenced in its subordinate METS files should be processed in parallel or in sequence (as is required by the different experiment types in figure 1), nor may the contents of these files be processed by the sophisticated partitioning features of the <area> element, which allows subsidiary parts of a <file> to be addressed directly. A more sophisticated approach to combining digital object components is to employ Open Archives Initiative Object Reuse and Exchange (OAI-ORE) aggregations,16 which express more complex relationships at greater levels of granularity than the <mptr> method allows. McDonough's examination of the possibility of aligning the two standards concludes that it is indeed possible, although at the cost of eliminating the METS behavior section and removing much of the flexibility of METS's structural links, both side effects of OAI-ORE's requirement that resource maps must form a connected RDF graph.17 In addition, converting between METS and OAI-ORE may not be lossless, depending on the design of the METS document.18 Neither approach therefore seems ideal for an application of this type, the former because of the limited ways in which the <mptr> element can be deployed outside the <div> element and its subsidiaries, the latter because of its removal of the functionality of the behavior section, which is essential for the delivery of material such as this. METS as an Intermediary Schema An alternative approach adopted here uses the technique of employing METS files as intermediary schemas to act as templates from which METS-encoded packages for delivery can be generated. Intermediary XML schemas are intermediary in the sense that they are designed not to act as final metadata containers for archiving or delivery, but as mediating encoding mechanisms from which data or metadata in these final forms can be generated by XSLT transformations: one example is CERIF4REF, a heavily constrained XML schema used to encode research management information from which metadata in the complex Common European Research Information Format (CERIF) data model can be generated.19 The CERIF4REF schema attempts to emulate the architectural processing features of SGML,20 which are absent from XML; these allowed simpler Document Type Definitions (DTDs) to be compiled for specific applications, which could then be mapped to established, more complex, SGML models. Instead of architectural processing, CERIF4REF uses XSLT to carry out this processing, so allowing the combination of a simpler scheme tailored to the requirements of an application to be combined with the benefits of a more interoperable but highly complex model that is difficult to implement in its standard form. Instead of using this technique for constraining encoding to a simpler model and generating more complex data structures from this, the intermediary schema technique may be used to define templates, similar to a content model, from which the final METS files to be delivered can be constructed. As is the case with CERIF4REF, XSLT is used for these transformations, and the XSLT files form an integral part of the application. In this way, a series of templates, beginning with highest-level abstractions, are used to generate their more concrete subsidiaries, until a final version used for dissemination is generated. The core of this application is a METS file, which acts as a template for the data delivery requirements for each type of experiment. Figure 2 demonstrates the components necessary for defining these for the 2Grating experiment subtype detailed previously in figure 1. 
Figure 2. Defining an Experiment Subtype in METS The data model for the delivery of these objects is defined in the <structMap> (b): as can be seen here, a series of nested <div> elements is used to define the relationship of experiment subtypes to types, and then to define, at the lowest level of this structure, the model for delivering the objects themselves. In this example, two files are to be processed in parallel; these are defined by <area> elements within the <par> (parallel) element. In a standard METS file, the FILEID attribute of <area> would reference a <file> element within the METS file section (a): in this case, however, they reference empty file group (<fileGrp>) elements, which are populated with <file> elements when this template undergoes its XSLT transformation. The final component of this template is the METS behavior section (c), in which the applications required to process the digital objects are defined. Two behavior sections are shown in this example: the first is used to invoke the XSLT transformation by which this METS template file is to be processed, the second to define the software necessary to co-process the two image files for delivery. Both indicate the divisions of the structural map whose components they process by their STRUCTID attributes: the first references the map as a whole because it applies recursively to the METS file itself, the second references the experiment for which it is needed. When delivering a digital object, it is then necessary to process this template METS file to generate the final version used to encode its metadata in full. The XSLT used to do this co-processes the template and a separate METS file defined for each object containing all of its relevant metadata: this latter file is used to populate the empty sections of the template, in particular the file section. Figure 3 provides an illustration of the XSLT fragment which carries out this function. Figure 3. The XSLT transformation file is invoked with the sample parameter, which contains the number of the sample to be rendered: this is used to generate the filename for the document function, which selects the relevant METS file containing metadata for the image itself. The <file> element within this file, which corresponds to the required image, is then integrated into the relevant <fileGrp> element in the template file, populating it with its subcomponents, including the <FLocat> element, which contains the location of the file itself. 
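As a rough illustration of the mechanism just described, applying a stylesheet to the template METS file with the sample number passed in as a parameter, the PHP sketch below uses the XSL extension. The file names, the parameter name, and the stylesheet itself are assumptions made for illustration; they are not the BRIL project's actual code.

<?php
// Sketch of generating a delivery METS file from the template METS file.
// mets-template.xml, populate-files.xsl, and the "sample" parameter name are
// hypothetical; the production system uses its own XSLT within Fedora.
$template = new DOMDocument();
$template->load('mets-template.xml');      // template with empty <fileGrp>s

$stylesheet = new DOMDocument();
$stylesheet->load('populate-files.xsl');   // uses document() to pull in the
                                           // per-sample METS metadata file
$processor = new XSLTProcessor();
$processor->importStylesheet($stylesheet);

// The sample number is passed in as a parameter, as described above; the
// stylesheet uses it to build the filename of the sample-level METS file.
$processor->setParameter('', 'sample', '00042');

// The result is the fully populated METS document ready for dissemination.
echo $processor->transformToXML($template);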
In the case of the ACTIN_5 experiment, which generates a video file from a sequence of still images, the processes involved are slightly more complicated. Because the number of still images to be processed will vary for each sample, it is not possible to specify the template for the delivery version of the sequence explicitly within a <seq> element as is done for the other experiments. Instead, it is necessary to define a further METS file (the "sequence file") in which the sequence for a given sample is defined. In this case, the architecture is shown in figure 4. Figure 4. Populating Sequentially Processed File Section with XSLT In this case the <fileGrp> element in the METS template file acts as a placeholder only and does not encode even the skeletal information for the parallel-processed TIFF files in figure 3. Similarly, the structural map <div> for this experiment indicates only that this section is a sequence but does not enumerate the files themselves even in template form. Both of these sections are populated when the file is processed by the XSLT transformation to import metadata from the METS "sequence file," in which the file inventory (a) and sequential structure (b) for a given sample are listed. The XSLT file populates the file section and structural map directly from this file, replacing the skeletal sections in the template with their counterparts from the sequence file. Through this relatively simple XSLT transformation, the final delivery version of the METS file is readily generated for either content model. This file can itself then be delivered on the fly (for instance, as a Fedora disseminator); this is done by using a further XSLT file to process the complex digital object components using the mechanism associated with each experiment in the METS behavior section. Given the relatively small size of all of the files involved, this processing can be done more quickly than would be possible using a fully aggregated METS approach. In the laboratory environment in particular, where the fast rendering and delivery of these images is needed so as not to impede workflows, this has major advantages. Although the project aimed to examine specifically the use of Fedora for the delivery of this complex material, and so employed FCM as the basis of its metadata strategy, the technique examined in this article proved itself a viable alternative that made far fewer demands on developer time. The small number of XSLT stylesheets required to render and deliver the METS files were written within a few hours: the development time to program the delivery of the RDF-based metadata that formed the FCM required several weeks. Processing XML using XSLT disseminators in Fedora is very fast, and so using this method instead of processing RDF introduces no discernible delays in object delivery. CONCLUSIONS This approach to delivering complex content appears to offer the benefits of the alternative approaches outlined above in a simpler manner than either currently allows. It offers much greater flexibility than the METS <mptr> element, which can only populate a complete structural map division. When compared to the FCM approach, this strategy, which relies solely on relatively simple XSLT transformations for processing the metadata, requires less developer time but offers a similar degree of flexibility of structure and granularity. It also avoids much of the rigidity of the OAI-ORE approach by not requiring the use of connected RDF graphs, and so frees up the behavior section to define the processing mechanisms needed to deliver these objects. Using the intermediary schema technique in this way does therefore offer a means of combining the advantages of employing well-defined interoperable metadata schemes and the practicalities of delivering digital content in an efficient manner, which makes limited demands on development. As such, it represents a viable alternative to the previous attempts to handle complex aggregations within METS discussed above. 
REFERENCES 1. Library of Congress, "Metadata Encoding and Transmission Standard (METS) Official Web Site," 2011, http://www.loc.gov/standards/mets (accessed August 1, 2011). 2. Organisation Internationale de Normalisation, "ISO/IEC JTC1/SC29/WG11: Coding of Moving Pictures and Audio," 2002, http://mpeg.chiariglione.org/standards/mpeg-21/mpeg-21.htm (accessed August 1, 2011). 3. Fedora Commons, "The Fedora Content Model Architecture (CMA)," 2007, http://fedora-commons.org/documentation/3.0b1/userdocs/digitalobjects/cmda.html (accessed December 9, 2011). 4. Carl Lagoze et al., "Fedora: An Architecture for Complex Objects and their Relationships," International Journal on Digital Libraries 6, no. 2 (2005): 130. 5. Ibid., 127. 6. Ibid., 135. 7. Ibid. 8. Rishi Sharma, Fedora Interoperability Review (London: Centre for e-Research, 2007), http://wwwcache1.kcl.ac.uk/content/1/c6/04/55/46/fedora-report-v1.pdf.3 (accessed August 1, 2011). 9. Richard Gartner, "Intermediary Schemas for Complex XML Publications: An Example from Research Information Management," Journal of Digital Information 12, no. 3 (2011), https://journals.tdl.org/jodi/article/view/2069 (accessed August 1, 2011). 10. Centre for e-Research, "BRIL," n.d., http://bril.cerch.kcl.ac.uk (accessed August 1, 2011). 11. S. J. DeRose et al., "What is Text, Really," Journal of Computing in Higher Education 1, no. 2 (1990): 6. 12. Digital Library Federation, "<METS>: Metadata Encoding and Transmission Standard: Primer and Reference Manual," Digital Library Federation, 2010, www.loc.gov/standards/mets/METSPrimerRevised.pdf, 77 (accessed August 1, 2011). 13. Bill Ingram, "ECHO Dep METS Profile for Master METS Documents," n.d., http://dli.grainger.uiuc.edu/echodep/METS/DRAFTS/MasterMETSProfile.xml (accessed August 1, 2011). 14. Susan Thomas, "Using METS for the Preservation and Dissemination of Digital Archives," n.d., www.paradigm.ac.uk/workbook/metadata/mets-altstruct.html (accessed August 1, 2011). 15. Library of Congress, "METS Profiles: Metadata Encoding and Transmission Standard (METS) Official Web Site," 2011, http://www.loc.gov/standards/mets/mets-profiles.html (accessed December 6, 2011). 
16. Open Archives Initiative, "Open Archives Initiative Protocol—Object Exchange and Reuse," n.d., www.openarchives.org/ore (accessed December 12, 2011). 17. Jerome McDonough, "Aligning METS with the OAI-ORE Data Model," JCDL '09: Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries (New York: Association for Computing Machinery, 2009): 328. 18. Ibid., 329. 19. Gartner, "Intermediary Schemas." 20. Gary Simons, "Using Architectural Processing to Derive Small, Problem-Specific XML Applications from Large, Widely-Used SGML Applications," Summer Institute of Linguistics Electronic Working Papers (Chicago: Summer Institute of Linguistics, 1998), www.silinternational.org/silewp/1998/006/SILEWP1998-006.html (accessed August 1, 2011). 1919 ---- Trends at a Glance: A Management Dashboard of Library Statistics Emily Morton-Owens and Karen L. Hanson ABSTRACT Systems librarians at an academic medical library created a management data dashboard. Charts were designed using best practices for data visualization and dashboard layout, and include metrics on gatecount, website visits, instant message reference chats, circulation, and interlibrary loan volume and turnaround time. Several charts draw on EZproxy log data that has been analyzed and linked to other databases to reveal use by different academic departments and user roles (such as faculty or student). Most charts are bar charts and include a linear regression trend line. The implementation uses Perl scripts to retrieve data from eight different sources and add it to a MySQL data warehouse, from which PHP/JavaScript webpages use Google Chart Tools to create the dashboard charts. INTRODUCTION New York University Health Sciences Libraries (NYUHSL) had adopted a number of systems that were either open-source, home-grown, or that offered APIs of one sort or another. Examples include Drupal, Google Analytics, and a home-grown interlibrary loan (ILL) system. Systems librarians decided to capitalize on the availability of this data by designing a system that would give library management a single, continuously self-updating point of access to monitor a variety of metrics. Previously this kind of information had been assembled annually for surveys like AAHSL and ARL.1 The layout and scope of the dashboard were influenced by Google Analytics and a beta dashboard project at Brown.2 The dashboard enables closer scrutiny of trends in library use, ideally resulting in a more agile response to problems and opportunities. It allows decisions and trade-offs to be based on concrete data rather than impressions, and it documents the library's service to its user community, which is important in a challenging budget climate. Although the end product builds on a long list of technologies—especially Perl, MySQL, PHP, JavaScript, and Google Chart Tools—the design of the project is lightweight and simple, and the number of lines of code required to power it is remarkably small. Further, the design is modular. This means that NYUHSL could offer customized versions for staff in different roles, restricting the display to show only data that is relevant to the individual's work. 
Because most libraries have a unique combination of technologies in place to handle functions like circulation, reference questions, and so forth, a one-size-fits-all software package that could be used by any library may not be feasible. Instead, this lightweight and modular approach could be re-created relatively easily to fit local circumstances and needs. Emily Morton-Owens (emily.morton.owens@gmail.com) was Web Services Librarian and Karen Hanson (Karen.Hanson@med.nyu.edu) is Knowledge Systems Librarian, New York University Health Sciences Libraries, New York. Visual Design Principles In designing the dashboard, we tried to use some best practices for data visualization and assembling charts into a dashboard. The best-known authority on data visualization, Edward Tufte, states "above all else, show the data."3 In part, this means minimizing distractions, such as unnecessary gridlines and playful graphics. Ideally, every dot of ink on the page would represent data. He also emphasizes truthful proportions, meaning the chart should be proportional to the actual measurements.4 A chart should display data from zero to the highest quantity, not arbitrarily starting the measurements at a higher number, because that distorts the proportions between the part and the whole. A chart also should not use graphics that differ in width as well as length, because that causes the area of the graphic to increase incorrectly, as opposed to simply the length increasing. Pie charts, despite their popularity, have serious problems in this respect; they require users to judge the relative area of the slices, which is difficult to do accurately.5 Generally, it is better to use a bar chart with different length bars whose proportions users can judge better. Color should also be used judiciously. Some designers use too many colors for artistic effect, which creates a "visual puzzle"6 as the user wonders whether the colors carry meaning. Some colors stand out more than others and should be used with caution. For example, red is often associated with something urgent or negative, so it should only be used in appropriate contexts. Duller, less saturated colors are more appropriate for many data visualizations. A contrasting style is exemplified by Nigel Holmes, who designs charts and infographics with playful visual elements. A recent study compared the participants' reactions to Holmes' work with plain charts of the same data.7 There was no significant difference in comprehension or short-term memorability; however, the researchers found that the embellished charts were more memorable over the long term, as well as more enjoyable to look at. That said, Holmes' style is most appropriate for charts that are trying to drive home a certain interpretation. In the case of the dashboard, we did not want to make any specific point, nor did we have any way of knowing in advance what the data would reveal, so we used Tufte's principles in our design. A comparable authority on dashboard design is Stephen Few. A dashboard combines multiple data displays in a single point of access. As in the most familiar example, a car dashboard, it usually has to do with controlling or monitoring something without taking your focus from the main task.8 A dashboard should be simple and visual, not requiring the user to tune out extraneous information or interpret novel chart concepts. 
The goal is not to offer a lookup table of precise values. The user should be able to get the idea without reading too much text or having to think too hard about what the graph represents. Thinking again of a car, its speedometer does not offer a historical analysis of speed variation because this is too much data to process while the car is moving. Similarly, the dashboard should ideally fit on one screen so that it can be taken in at a glance. If this is not possible, at least all of the individual charts should be presented intact, without scrolling or being cramped in ways that distort the data. A dashboard should present data dimensions that are dynamic. The user will refer to the dashboard frequently, so presenting data that does not change over time only takes up space. Better yet, the data should be presented alongside a benchmark or goal. A benchmark may be a historical value for the same metric or perhaps a competitor's value. A goal is an intended future value that may or may not ever have been reached. Either way, including this alternate value gives context for whether the current performance is desirable. This is essential for making the dashboard into a decision-making tool. Nils Rasmussen et al. discuss three levels of dashboards: strategic, tactical (related to progress on a specific project), and operational (related to everyday, department-level processes).9 So far, NYUHSL's dashboard is primarily operational, monitoring whether ordinary work is proceeding as planned. Later in this paper we will discuss ways to make the dashboard better suited to supporting strategic initiatives. System Architecture The dashboard architecture consists of three main parts: importer scripts that get data from diverse sources, a data warehouse, and PHP/JavaScript scripts that display the data. The data warehouse is a simple MySQL database; the term "warehouse" refers to the fact that it contains a stripped-down, simplified version of the data that is appropriate for analysis rather than operations. Our approach to handling the data is an ETL (extract, transform, load) routine. Data are extracted from different sources, transformed in various ways, and loaded into the data warehouse. Our data transformations include reducing granularity and enriching the data using details drawn from other datasets, such as the institutional list of IP ranges and their corresponding departments. Data rarely change once in the warehouse because they represent historical measurements, not open transactions.10 There is an importer script customized for each data source. The data sources differ in format and location. For example, Google Analytics is a remote data source with a unique Data Export API, the ILL data are in a local MySQL database, and LibraryH3lp has remote CSV log files. The scripts run automatically via a cron job at 2 a.m. and retrieve data for the previous day. That time was chosen to ensure all other nightly cron jobs that affect the databases are complete before the dashboard imports start. Each uses custom code for its data source and creates a series of MySQL INSERT queries to put the needed data fields in the MySQL data warehouse. For example, a script might pull the dates when an ILL request was placed and filled, but not the title of the requested item. 
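As a rough sketch of the pattern just described, the following PHP fragment extracts one day of ILL data and loads a reduced version into the warehouse. The article's importers are written in Perl; this PHP version, along with its table and column names, is purely illustrative.

<?php
// Sketch of a nightly importer: extract yesterday's ILL requests from the
// operational database, keep only the fields needed for analysis, and load
// them into the warehouse. Table and column names are hypothetical.
$source    = new PDO('mysql:host=localhost;dbname=ill', 'dashboard', 'secret');
$warehouse = new PDO('mysql:host=localhost;dbname=warehouse', 'dashboard', 'secret');

$yesterday = date('Y-m-d', strtotime('-1 day'));

// Extract: only the dates and status, never the request titles.
$rows = $source->prepare(
    'SELECT date_placed, date_filled FROM requests WHERE DATE(date_placed) = ?'
);
$rows->execute([$yesterday]);

// Transform and load: reduce granularity to what the charts need.
$insert = $warehouse->prepare(
    'INSERT INTO ill_requests (date_placed, date_filled, turnaround_days)
     VALUES (?, ?, DATEDIFF(?, ?))'
);
foreach ($rows->fetchAll(PDO::FETCH_ASSOC) as $r) {
    $insert->execute([$r['date_placed'], $r['date_filled'],
                      $r['date_filled'], $r['date_placed']]);
}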
A carefully thought-out data model simplifies the creation of reports. The data structure should aim to support future expansion. In the data warehouse, information that was previously formatted and stored in very inconsistent ways is brought together uniformly. There is one table for each kind of data, with consistent field names for dates, services, and so forth, and others that combine related data in useful ways.

The dashboard display consists of a number of widgets, one for each chart. Each chart is created with a mixture of PHP and JavaScript. Google Chart Tools interprets lines of JavaScript to draw an attractive, proportional chart. We do not want to hardcode the values in this JavaScript, of course, because the charts should be dynamic. Therefore we use PHP to query the data warehouse and write a line of JavaScript data for each row of results.

Figure 1. PHP is used to read from the database and generate rows of data as server-side JavaScript.

Each PHP/JavaScript file created through this process is embedded in a master PHP page. This master page controls the order and layout of the individual widgets, using the PHP include feature to add each chart file to the page plus a CSS stylesheet to determine the spacing of the charts. Finally, because all the queries take a relatively long time to run, the page is cached and refreshes itself the first time it is opened each day. The dashboard can be refreshed manually if the database or code is modified and someone wants to see the results immediately.

Many of the dashboard’s charts include a linear regression trend line. This feature is not provided by Google Charts and must be inserted into the widget’s code manually. The formula can be found online.11 The sums and sums of squares are accumulated as the code loops through each line of data, and these totals are used to calculate the slope and intercept. In our twenty-six-week displays, we never want to include the twenty-sixth week of data because that is the present (partial) week. The linear regression line takes the form y = mx + b. We can use that formula, along with the slope and intercept values, to calculate y-values for week zero and the next-to-last week (week twenty-five). Those two points are plotted and the trend line is drawn between them. The color of the line depends on its slope (greater or less than zero). Depending on whether we want that chart’s metric to go up or down, the line is green for the desirable direction and red for the undesirable direction.

Details on Individual Systems

Gatecount

Most of NYUHSL’s five locations have electronic gates to track the number of patrons who visit. Formerly these statistics were kept in a Microsoft Excel spreadsheet, but now there is a simple web form into which staff can enter the gate reading twice daily. The data go directly into the data warehouse, and the a.m. and p.m. counts are automatically summed. There is some error-checking to prevent incorrect numbers from being entered, which varies depending on whether that location’s gate provides a continuously increasing count or is reset each day. The data are presented in a stacked bar chart, summed for the week. The user can hover over the stacked bars to see numbers for each location, but the height of the stacked bar and the trend line represent the total visits for all locations together.
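The trend line on this and the other charts uses the least-squares arithmetic described above. The hypothetical PHP fragment below shows that calculation end to end: it accumulates the sums while looping over weekly totals (the values shown are invented examples), derives the slope and intercept, evaluates y = mx + b at the two endpoints, and emits one JavaScript data row per week for Google Chart Tools.

```php
<?php
// Weekly totals would come from a warehouse query; these are example values.
$weeklyTotals = [1520, 1610, 1495, 1700, 1655, 1720]; // weeks 0..5

$n = count($weeklyTotals);
$sumX = $sumY = $sumXY = $sumX2 = 0;
foreach ($weeklyTotals as $x => $y) {
    $sumX  += $x;
    $sumY  += $y;
    $sumXY += $x * $y;
    $sumX2 += $x * $x;
}

// Least-squares slope (m) and intercept (b) for y = mx + b.
$slope     = ($n * $sumXY - $sumX * $sumY) / ($n * $sumX2 - $sumX * $sumX);
$intercept = ($sumY - $slope * $sumX) / $n;

// Evaluate the line at week zero and at the next-to-last week to get the
// two points between which the trend line is drawn.
$trendStart = $intercept;
$trendEnd   = $slope * ($n - 2) + $intercept;

// Green for the desirable direction, red otherwise (flipped on charts
// where a downward trend is the good outcome).
$trendColor = ($slope >= 0) ? 'green' : 'red';

// Emit the chart data as server-side JavaScript for Google Chart Tools.
foreach ($weeklyTotals as $x => $y) {
    echo "data.addRow(['Week $x', $y]);\n";
}
```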
Figure 2. Stacked Bar Chart with Trendline Showing Visits per Week to Physical Library Branches over a Twenty-Six-Week Period

Ticketing

NYUHSL manages online user requests with a simple ticketing system that integrates with Drupal. There are four types of tickets, two of which involve helping users and two of which involve reporting problems. The “helpful” services are general reference questions and literature search requests. The “trouble” services are computer problems and e-resource problems. These two pairs each have their own stacked bar chart because, ideally, the number of “helpful” tickets would go up while the number of “trouble” tickets would go down. Each chart has a trend line, color-coded for the direction that is desirable in that case.

Figure 3. Stacked Bar Chart with Trendline Showing Trouble Tickets by Type

The script that imports this information into the data warehouse simply copies it from another local MySQL database. It only fetches the date and the type of request, not the actual question or response. It also inserts a record into the user transactions table, which will be discussed in the section on user data.

Drupal

NYUHSL’s Drupal site allows librarians to contribute content like subject guides and blog posts directly.12 The dashboard tracks the number of edits contributed by users (excluding the web services librarian and the web manager, who would otherwise swamp the results). This is done with a simple COUNT query on the node_revisions table in the Drupal database. Because no other processing is needed and caching ensures the query will run at most once per day, this is the only widget that pulls data directly from the original database at the time the chart is drawn.

Koha

Koha is an open-source OPAC system. At NYUHSL, Koha’s database is in MySQL. Each night the importer script copies “issues” data from Koha’s statistics table. This supports the creation of a stacked bar chart showing the number of item checkouts each week, with each bar divided according to the type of item borrowed (e.g., book or laptop). As with other charts, a color-coded trend line was added to show the change in the number of item checkouts.

Google Analytics

The dashboard relies on the Google Analytics PHP Interface (GAPI) to retrieve data using the Google Analytics Data Export API.13 Nothing is stored in the data warehouse and there is no importer script. The first widget gets and displays weekly total visits for all NYUHSL websites, the main NYUHSL website, and visits from mobile devices. A trend line is calculated from the “all sites” count. The second widget retrieves a list of the top “outbound click” events for the past thirty days and returns them as URLs. A regular expression is used to remove any EZproxy prefix, and the remaining URL is matched against our electronic resources database to get the title. Thus, for example, the widget displays “Web of Knowledge” instead of “http://ezproxy.med.nyu.edu/login?url=http://apps.isiknowledge.com/.” A future improvement to this display would require a new table in the data warehouse and an importer script to store historic outbound click results. This data would support comparison of the current list with past results to identify click destinations that are trending up or down.
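The outbound-click cleanup lends itself to a short example. The sketch below is illustrative only: the regular expression assumes a proxy prefix of the exact form shown above, and the `eresources` lookup table (with `domain` and `title` columns) is a stand-in for the library's actual electronic resources database.

```php
<?php
// Strip an EZproxy prefix from an outbound-click URL and look up a
// human-readable title, falling back to the bare URL if nothing matches.
function resolveOutboundClick(PDO $db, $eventUrl) {
    // "http://ezproxy.example.edu/login?url=http://apps.isiknowledge.com/..."
    // becomes "http://apps.isiknowledge.com/..."
    $url  = preg_replace('#^https?://ezproxy\.[^/]+/login\?url=#i', '', $eventUrl);
    $host = parse_url($url, PHP_URL_HOST);

    $stmt = $db->prepare('SELECT title FROM eresources WHERE domain = ? LIMIT 1');
    $stmt->execute([$host]);
    $title = $stmt->fetchColumn();

    return ($title !== false) ? $title : $url;
}
```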
Figure 4. Most Popular Links Clicked On to Leave the Library’s Website in a Thirty-Day Period

LibraryH3lp

LibraryH3lp is a Jabber-based IM product that allows librarians to jointly manage a queue of reference queries. It offers CSV-formatted log files that a Perl script can access using curl, a command-line tool that mimics a web browser’s login, cookies, and file requests. The CSV log is downloaded via curl, processed with Perl’s Text::CSV module, and the data are then inserted into the warehouse. The first LibraryH3lp widget counts the number of chats handled by each librarian over the past ninety days. The second widget tracks the number of chats for the past twenty-six weeks and includes a trend line.

Figure 5. Bar Chart Showing Number of IM Chats per Week over a Twenty-Six-Week Period

Document Delivery Services

The Document Delivery Services (DDS) department fulfills ILL requests. The web application that manages these requests is homegrown, with a database in MySQL. Each night, a script copies the latest requests to the data warehouse. The dashboard uses this data to display a chart of how many requests are made each week and which publications are requested from other libraries most frequently. This data could be used to determine whether there are resources that should be considered for purchase.

The DDS data was also used to demonstrate how data might be used to track service performance. One chart shows the average time it takes to fulfill a document request. Further evaluation is required to determine the usefulness of such a chart for motivating improvement of the service, or whether this is perceived as a negative use of the data. Some libraries may find this kind of information useful for streamlining services.

Figure 6. This stacked bar chart shows the number of document delivery requests handled per week. The chart separates patron requests from requests made by other libraries.

EZproxy Data

EZproxy is an OCLC tool for authenticating users who attempt to access the library’s electronic resources. It does not log e-resource use where the user is automatically authenticated using the institutional IP range, but the data are still valuable because it logs a significant amount of use that can support in-depth analysis. Because of the gaps in the data, much of the analysis looks at patterns and relationships in the data rather than absolute values. Karen Coombs’ article discussing the analysis of EZproxy logs to understand e-resource use at the department level provided the initial motivation to switch on the EZproxy log.14 When logging is enabled, a standard web log file is produced. Here is a sample line from the log:

123.45.6.7 amYu0GH5brmUska hansok01 [09/Sep/2011:18:25:23 -0500] POST http://ovidsp.tx.ovid.com:80/sp-3.3.1a/ovidweb.cgi HTTP/1.1 200 20472 http://ovidsp.tx.ovid.com.ezproxy.med.nyu.edu/sp-3.3.1a/ovidweb.cgi

Each line in the log contains a user IP address, a unique session ID, the user ID, the date and time of access, the URL requested by the user, the HTTP status code, the number of bytes in the requested file, and the referrer (the page the user clicked on to get to the site).
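To make that field list concrete, here is a hypothetical PHP fragment that splits a log line in this format into its components. The regular expression assumes exactly the layout of the sample line above and would need adjusting for a different log format.

```php
<?php
// Split one EZproxy log line into ip, session ID, user ID, timestamp,
// method, URL, protocol, status, bytes, and referrer.
$line = '123.45.6.7 amYu0GH5brmUska hansok01 [09/Sep/2011:18:25:23 -0500] '
      . 'POST http://ovidsp.tx.ovid.com:80/sp-3.3.1a/ovidweb.cgi HTTP/1.1 200 20472 '
      . 'http://ovidsp.tx.ovid.com.ezproxy.med.nyu.edu/sp-3.3.1a/ovidweb.cgi';

$pattern = '/^(\S+) (\S+) (\S+) \[([^\]]+)\] (\S+) (\S+) (\S+) (\d{3}) (\S+) (\S+)$/';

if (preg_match($pattern, $line, $m)) {
    list(, $ip, $session, $user, $when, $method, $url, $protocol, $status, $bytes, $referrer) = $m;
    // e.g. $user = 'hansok01', $status = '200', $referrer ends in 'ovidweb.cgi'
}
```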
The EZproxy log data undergoes significant processing before being inserted into the EZproxy report tables. The main goal of this processing is to enrich the data with relevant supplemental information while eliminating redundancy. To facilitate this, the importer script first dumps the entire log into a table and then performs multiple updates on the dataset.

During the first step of processing, the IP addresses are compared to a list of departmental IP ranges maintained by Medical Center IT. If a match is found, the “location accessed” is stored against the log line. Next, the user ID is compared with the institutional people database, retrieving a user type (faculty, staff, or student) and a department, if available (e.g., radiology). One item of significant interest to senior management is the level of use within hospitals. As a medical library, we are interested in the library’s value to patient care. If there is significant use in the hospitals, this could furnish positive evidence about the library’s role in the clinical setting.

Next, the resource URL and the referring address are truncated down to domain names. The links in the log are very specific, showing detailed user activity. Because the library is operating in a medical environment, privacy is a concern, so specific addresses are truncated to a top-level domain (e.g., ovid.com) to suppress any tie to a specific article, e-book, or other specific resource.

Finally, a query is run against the remaining raw data to condense the log down to unique session ID/resource combinations, and this block of data is inserted into a new table. Each user visit to a unique resource in a single session is recorded; for example, if a user visits LexisNexis, Ovid MEDLINE, Scopus, and LexisNexis again in a single session, three lines will be recorded in the user activity table. A single line in the final EZproxy activity table contains a unique combination of location accessed (e.g., Tisch Hospital), user department (e.g., radiology), user type (e.g., staff), earliest access date/time for that resource (e.g., 9/9/2011 18:25), resource name (e.g., scopus.com), session ID, and referring domain (e.g., hsl.med.nyu.edu).

There is significant repetition in the log. Depending on what filters are set up, every image within a webpage could be a line in the log. The method of condensing the data described previously results in a much smaller and more manageable dataset. For example, on a single day 115,070 rows were collected in the EZproxy log, but only 2,198 were inserted into the final warehouse table after truncating the URLs and removing redundancy.

In a separate query on the raw data table, a distinct list containing user ID, date, and the word “e-resources” is built and stored in a “user transactions” table. These very basic data are stored so that simple user analysis can be performed (see “User Data” below).

Figure 7. Line Chart Showing Total Number of EZproxy Sessions Captured per Week over a Twenty-Six-Week Period

Once the EZproxy data are transferred to the appropriate tables, the raw data (and thus the most concerning data from a privacy standpoint) are purged from the database.
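A condensed, hypothetical sketch of these processing steps is shown below. The table names (`ezproxy_raw`, `ip_ranges`, `ezproxy_activity`) and their columns are stand-ins for the real schema, and the URL truncation is simplified to the host name rather than the top-level domain used in production.

```php
<?php
// Post-process the raw EZproxy log after it has been bulk-loaded.
$db = new PDO('mysql:host=localhost;dbname=warehouse', 'dash', 'secret');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// 1. Enrich each row with a department based on the client IP range.
$db->exec(
    "UPDATE ezproxy_raw r
     JOIN ip_ranges d ON INET_ATON(r.ip) BETWEEN d.range_start AND d.range_end
     SET r.location_accessed = d.department"
);

// 2. Reduce full URLs to bare host names so no article-level detail is kept.
$db->exec(
    "UPDATE ezproxy_raw
     SET resource = SUBSTRING_INDEX(SUBSTRING_INDEX(url, '/', 3), '//', -1)"
);

// 3. Keep one row per session/resource combination (grouping on the
//    descriptive columns as well to satisfy strict SQL modes), then purge.
$db->exec(
    "INSERT INTO ezproxy_activity
        (location_accessed, user_dept, user_type, first_access, resource, session_id, referrer_domain)
     SELECT location_accessed, user_dept, user_type, MIN(access_time),
            resource, session_id, referrer_domain
     FROM ezproxy_raw
     GROUP BY session_id, resource, location_accessed, user_dept, user_type, referrer_domain"
);
$db->exec("TRUNCATE TABLE ezproxy_raw");
```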
Several dashboard charts were created using the streamlined EZproxy data: a simple count of weekly e-resource users and a table showing resources whose use changed most significantly since the previous month. It was challenging to calculate the significance of the variations in use, since resources that went from one session in a month to two sessions showed the same proportional change as those that increased from one thousand to two thousand sessions. A basic calculation was created to highlight the more significant changes in use:

d = (p − q)
if d < 0, then significance = d − (8 × 10 × d) / (q + 1)
if d > 0, then significance = d + (8 × 10 × d) / (q + 1)

where d is the difference between last month and this month, p is the number of visits last month (8 to 1 days ago), and q is the number of visits the previous month (15 to 9 days ago).

This equation serves the basic purpose of identifying unusual changes in e-resource use. For example, one e-resource was shown trending up in use after a librarian taught a course in it.

Figure 8. Table of E-Resources Showing the Most Significant Change in Use over the Last Month Compared to the Previous Month

The EZproxy log has already proven to be a rich source of data. The work so far has only scratched the surface of what the data could show. Only two charts are currently displayed on the dashboard, but the value of this data is more likely to come from one-off customized reports based on specific queries, like tracking use of individual resources over time or looking at variations of use within specific buildings, departments, or user types. There is also a lot that could be done with the referrer addresses. For example, the library has been submitting tips to the newsletter that is delivered by email. The referrer log allows the number of clicks from this source to be measured so that librarians can monitor the success of this marketing technique.

User Data

Each library system includes some user information. Where user information is available in a system, a separate table is populated in the warehouse. As mentioned briefly above, a user ID, a date, and the type of service used (e-resources, DDS, literature search, etc.) are stored. Details of the transaction are not kept here. The user ID can be used to look up basic information about the user, such as role (faculty, staff, student) and department. We should emphasize for clarity that the detailed information about the activity is completely separated from any information about the user, so that the data cannot be joined back together. The most sensitive data, such as the raw EZproxy log data, are purged after the import script has copied the truncated and de-identified data. Even though the data stored are very basic, information at the granularity of individual users is never displayed on the dashboard.

The user information is aggregated by user type for further analysis and display. The institutional people database can be used to determine how many people are in each department. A table added to the dashboard shows the number of resource uses and the percentage of each department that used library resources in a six-month period. Some potential uses of this data include identifying possible training needs and measuring the success of library outreach to specific departments. For example, if one department uses the resources very little, this may indicate a training or marketing deficit.
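The department table described above reduces to a single aggregation. The query below is a hypothetical sketch of it: it assumes a `user_transactions` table in the warehouse (with the department already resolved at import time) and a `department_headcounts` table derived from the institutional people database; both table names, and their columns, are illustrative.

```php
<?php
// Share of each department that used at least one library service
// in the past six months, alongside the raw number of users.
$db = new PDO('mysql:host=localhost;dbname=warehouse', 'dash', 'secret');

$sql = "SELECT d.department,
               d.headcount,
               COUNT(DISTINCT t.user_id) AS users,
               ROUND(100 * COUNT(DISTINCT t.user_id) / d.headcount, 1) AS pct_active
        FROM department_headcounts d
        LEFT JOIN user_transactions t
               ON t.department = d.department
              AND t.transaction_date >= DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
        GROUP BY d.department, d.headcount
        ORDER BY pct_active DESC";

foreach ($db->query($sql) as $row) {
    printf("%s: %s%% (%d of %d people)\n",
        $row['department'], $row['pct_active'], $row['users'], $row['headcount']);
}
```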
It may also be interesting to analyze how the academic success of a department aligns with library resource use. Do the highest-intensity users of library resources have greater professional output or higher prestige as a research department, for example? It is unsurprising to find that medical students and librarians are most likely to use library resources. The graduate medical education group is third and includes medical residents (newly qualified doctors on a learning curve). As with the EZproxy data, there are numerous insights to be gained from this data that will help the library make strategic decisions about future services.

Figure 9. Table Showing the Proportion of Each User Group that has Used at Least One Library Service in a Six-Month Period

RESULTS

The dashboard has been available for almost a year. It requires a password and is only available to NYUHSL’s senior management team and librarians who have asked for access. Feedback on the dashboard has been positive, and librarians have begun to make suggestions to improve its usefulness. One librarian uses the data warehouse for his own reports and will soon provide his queries so that they can be added to the dashboard. The dashboard has facilitated discoveries about the nature of our users and has identified potential training needs and areas of weakness in outreach. A static dashboard snapshot was recently created for presentation to the dean of the medical school to illustrate the extent and breadth of library use.

The initial dashboard aimed to demonstrate the kinds of library statistics that it is possible to extract and display, but there is much to be done to improve its operational usefulness. A dashboard working group has been established to build on the original proof-of-concept by improving the data model and adding relevant charts. Some charts will be incorporated into the public website as a snapshot of library activity. The dashboard was structured to be adaptable and expandable. The next iteration will support customization of the display for each user. New charts will be added as requested, and charts that are perceived to be less insightful will be removed. For example, one chart shows the number of reference chat requests answered by each librarian in addition to the number of chats handled per week. The usefulness of this chart was questioned when it was observed that the results were merely a reflection of which librarians had the most time at their own desks, allowing them to answer chats. This is an example of how it can be difficult to separate context from numbers. In this instance the individual statistics were only included because the data was available, not because of any particular request from management, so these charts may be removed from the dashboard.

NYUHSL is also investigating the Ex Libris tool UStat, which supports analysis of COUNTER (Counting Online Usage of NeTworked Electronic Resources) reports from e-resources vendors. UStat covers some of the larger gaps in the EZproxy log, including journal-level rather than vendor-level analysis and, most importantly, the use statistics for non-EZproxied addresses. A future project will be to see whether there is an automated way to extract use metrics, either from UStat or directly from the vendors, to be incorporated into the data warehouse. Preliminary discussions are being held with IT administrators about the possibility of EZproxying library resource URLs as they pass through the firewall so that the EZproxy log becomes a more complete reflection of use.
An example of a strategic decision based on dashboard data involves NYUHSL’s mobile website. Librarians had been considering whether to invest substantial effort in identifying and presenting free apps and mobile websites to complement the library’s small collection of licensed mobile content. Surprisingly, the chart of website visits on the dashboard shows that the number of visits coming from mobile devices is consistently fewer than 3 percent, probably because of the relatively modest selection of mobile-optimized website resources. Rather than invest significant effort in cataloging additional, potentially lackluster free resources that would not be seen by a large number of users, the team decided to wait for more headlining subscription-based resources to become available and increase traffic to the mobile site.

It would be worthwhile to add charts to the dashboard that track metrics related to new strategic initiatives, which would require librarians to translate strategic ideas into measurable quantities. For example, if the library aspired to make sure users received responses more quickly, charts tracking the response time for various services could be added and grouped together to track progress on this goal. As data continue to accumulate, it will be possible to extend the timeframe of the charts, for example, making weekly charts into monthly ones. Over time, the data may become more static, requiring more complicated calculations to reveal interesting trends.

CONCLUSIONS

The medical center has a strong ethic of metric-driven decisions, and the dashboard brings the library in line with this initiative. The dashboard allows librarians and management to monitor key library operations from a single, convenient page, with an emphasis on long-term trends rather than day-to-day fluctuations in use. It was put together using freely available tools that should be within the reach of people with moderate programming experience. Assembling the dashboard required background knowledge of the systems in question, was made possible by NYUHSL’s use of open-source and homegrown software, and increased the designers’ understanding of the data and tools in question.

REFERENCES

1 Association of Academic Health Sciences Libraries, “Annual Statistics,” http://www.aahsl.org/mc/page.do?sitePageId=84868 (accessed November 7, 2011); Association of Research Libraries, “ARL Statistics,” http://www.arl.org/stats/annualsurveys/arlstats (accessed November 7, 2011).
2 Brown University Library, “dashboard_beta :: dashboard information,” http://library.brown.edu/dashboard/info (accessed January 5, 2012).
3 Edward R. Tufte, The Visual Display of Quantitative Information (Cheshire, CT: Graphics Press, 2001), 92.
4 Ibid., 56.
5 Ibid., 178.
6 Ibid., 153.
7 Scott Bateman et al., “Useful Junk? The Effects of Visual Embellishment on Comprehension and Memorability of Charts,” CHI ’10: Proceedings of the 28th International Conference on Human Factors in Computing Systems (New York: ACM, 2010), doi:10.1145/1753326.1753716.
8 Stephen Few, Information Dashboard Design: The Effective Visual Communication of Data (Beijing: O’Reilly, 2006), 98.
9 Nils Rasmussen, Claire Y. Chen, and Manish Bansal, Business Dashboards: A Visual Catalog for Design and Deployment (Hoboken, NJ: Wiley, 2009), ch. 4.
10 Richard J. Roiger and Michael W. Geatz, Data Mining: A Tutorial-Based Primer (Boston: Addison Wesley, 2003), 186.
11 One example: Stefan Waner and Steven R. Costenoble, “Fitting Functions to Data: Linear and Exponential Regression,” February 2008, http://people.hofstra.edu/Stefan_Waner/RealWorld/calctopic1/regression.html (accessed January 5, 2012).
12 Emily G. Morton-Owens, “Editorial and Technological Workflow Tools to Promote Website Quality,” Information Technology & Libraries 30, no. 3 (September 2011): 92–98.
13 Google, “GAPI—Google Analytics API PHP Interface,” http://code.google.com/p/gapi-google-analytics-php-interface (accessed January 5, 2012).
14 Karen A. Coombs, “Lessons Learned from Analyzing Library Database Usage Data,” Library Hi Tech 23, no. 4 (2005): 598–609, doi:10.1108/07378830510636373.

1926 ---- Learning to Share: Measuring Use of a Digitized Collection on Flickr and in the IR
Melanie Schlosser and Brian Stamper

ABSTRACT

There is very little public data on usage of digitized library collections. New methods for promoting and sharing digitized collections are created all the time, but very little investigation has been done on the effect of those efforts on usage of the collections on library websites. This study attempts to measure the effects of reposting a collection on Flickr on use of the collection in a library-run institutional repository (IR). The results are inconclusive, but the paper provides background on the topic and guidance for future efforts.

INTRODUCTION

Inspired by the need to provide relevant resources and make wise use of limited budgets, many libraries measure the use of their collections. From circulation counts and in-library use studies of print materials to increasingly sophisticated analyses of usage of licensed digital resources, the techniques have changed even as the need for the data has grown. New technologies have simultaneously presented challenges to measuring use and allowed those measurements to become more accurate and more relevant. In spite of the relative newness of the digital era, “librarians already know considerably more about digital library use than they did about traditional library use in the print environment.”1 ARL’s LibQUAL+,2 one of the most widely adopted tools for measuring users’ perceptions of service quality, has recently been joined by DigiQUAL and MINES for Libraries. These new StatsQUAL tools3 extend the familiar LibQUAL focus on users into the digital environment. There are tools and studies for seemingly every type of licensed digital content, all with an eye toward better understanding their users and making better-informed collection management decisions.

Those same tools and studies for measuring use of library-created digital collections are conspicuous in their absence. Almost two decades into library collection digitization programs, there is not a significant body of literature on measuring use of digitized collections. A number of articles have been written about measuring usage of library websites in general; Arendt and Wagner4 is a recent example.
In one of the few studies to specifically measure use of a digitized collection, Herold5 uses Google Analytics to uncover the geographical location of users of a digitized archival image collection. Otherwise, a literature search on usage studies uncovers very little. Less formal communication channels are similarly quiet, and public usage data on digitized collections on library sites is virtually nonexistent. Commercial sites for disseminating and sharing digital media frequently display simple use metrics (image views, for example, or file downloads) alongside content; such features do not appear on digitized collections on library sites.

Melanie Schlosser (Schlosser.40@osu.edu) is Digital Publishing Librarian and Brian Stamper (Stamper.10@osu.edu) is Administrative Associate, The Ohio State University Libraries, Columbus, Ohio.

Usage and digitization projects

Digitized library collections are created with an eye toward use from their early planning stages. An influential early CLIR publication on selecting collections for digitization, written by a Harvard task force,6 included current and potential use of the analog and digitized collection as a criterion for selection. The factors to be considered include the quantitative (“How much is the collection used?”) and the qualitative (“What is the nature of the use?”). More than ten years later, Ooghe and Moreels7 find that use is still a criterion for selection of collections to digitize, tied closely to the value of the collection.

Facilitating discovery and use of the digitized collection is a major consideration during project development. Payette and Rieger8 is an early example of a study of the needs of users in digital library design. Usability testing of the interface is frequently a component of site design; see Jeng9 for a good overview of usability testing in the digital library environment. Increasing usage of the digitized collection is also a major theme in metadata research and development. Standards such as the Open Archives Initiative’s Protocol for Metadata Harvesting10 and Object Reuse and Exchange11 are meant to allow discovery and reuse of objects in a variety of environments, and the linked data movement promises to make library data even more relevant and reusable in the World Wide Web environment.12

Digital collection managers have also found more radical methods of increasing usage of their collections. Inserting references into relevant Wikipedia articles has become a popular way to drive more users to the library’s site.13 Some librarians have taken the idea a step further and have begun reposting their digital content on third-party sites. The Smithsonian pioneered one reposting strategy in 2008 when it partnered with Flickr, the popular photo-sharing site, to launch Flickr Commons.14 The Commons is a walled garden within Flickr that contains copyright-free images held by cultural heritage institutions such as libraries, archives, and museums. Each partner institution has its own branded space - “photostream” in Flickr parlance - organized into collections and sets. This model aggregates content from different organizations and locates it where users already are, but it still maintains the traditional institution/collection structure. Flickr Commons has been, by all measures, a very successful experiment in sharing collections with users.
The Smithsonian,15 the Library of Congress,16 the Alcuin Society,17 and the London School of Economics18 have all written about their experiences with the Commons. Stephens19 and Michel and Tzoc20 give advice on how libraries can work with Flickr, and Garvin21 and Vaughan22 take a broad view of the project and the partners. Another sharing strategy is beginning to emerge, where digital collection curators contribute individual or small groups of images to thematic websites. A recent example is Pets in Collections,23 a whimsical Tumblr photo blog created by the Digital Collections Librarian at Bryn Mawr College. LEARNING TO SHARE: MEASURING USE OF A DIGITIZED COLLECTION ON FLICKR AND IN THE IR| SCHLOSSER AND STAMPER 87 The site’s description states, “Come on - if you work in a library, archive, or museum, you know you’ve seen at least one of these - a seemingly random image of that important person and his dog or a man and a monkey wearing overalls … so now you finally have a place to share them with the world!” The site requires submissions to include only the image and a link back to the institution or repository that houses it, although submitters may include more information if they choose. Although more lighthearted than most traditional library image collections, it still performs the desired function of introducing users to digital collections they may never have encountered otherwise. Clearly, these creative and thoughtful strategies are not dreamed up by digital librarians unconcerned with end use of their collections, so why do stewards of digitized collections so rarely collect, or at least publicly discuss, statistics on the use of their content? The one notable exception to this may shed some light on the matter. Institutional repositories (IRs) have been the one area of non-licensed digital library content where usage statistics are frequently collected and publicized. DSpace,24 the widely-adopted IR platform developed by MIT and Hewlett-Packard, has increasingly sophisticated tools for tracking and sharing use of the content it hosts. Digital Commons,25 the hosted IR solution created by Bepress, provides automated monthly download reports for scholars who use it to archive their content. The development of these features has been driven by the need to communicate value to faculty and administrators. Encouraging participation by faculty has been a major focus of IR managers since the initial ‘build it and they will come’ optimism faded and the challenge of adding another task to already busy faculty schedules became clear.26 Having a clear need (outreach) and a defined audience (participating scholars) has led to a thriving program of usage tracking in the IR community. The lack of an obvious constituency and the absence of pointed questions about use in the digitized collections world have, one suspects, led to the current dearth of measurement tools and initiatives. Still, questions about use do arise, particularly when libraries undertake labor- intensive usability studies or venture into the somewhat controversial landscape of sharing library-created digital objects on third party sites.27 Anecdotally, the thought of sharing library content elsewhere on the web also raises concerns about loss of context and control, as well as a fear of ‘dilution’ of the library’s web presence. 
“If patrons can use the library’s collections on other sites,” a fellow librarian once exclaimed, “they won’t come to the library’s website anymore!” Without usage data, we cannot adequately answer questions about the value of our projects or the way they impact other library services. Justification for study and research questions There were three major motivations for this project. First, inspired by the success of the Flickr Commons project, we wanted to explore a method for sharing our collections more widely. An image collection and a third-party image-sharing platform were an obvious choice, since image display is not a strength of our DSpace-based repository. Flickr is currently a major presence in INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2012 88 the image sharing landscape, and the existence of the Commons was an added incentive for choosing Flickr as our platform. Second, the collection we selected for the project (described more fully below) is not fully described, and we wanted to take advantage of Flickr’s annotation tools to allow user-generated metadata. Since further description of the images would have required an unusual depth of expertise, we were not optimistic that we would receive much useful data, and in fact we did not. Still, we lost nothing by asking, and gained familiarity with Flickr’s capabilities for metadata capture. The final motivation for the project, and the focus of the study, was the desire to investigate the effect of third-party platform sharing of a local collection on usage of that collection on library sites. The data gathered were meant partly to inform our local practice, but also to address a concern that may hold librarians back from exploring such means of increasing collection usage - the fear that doing so will divert traffic from library sites. We suspected that sharing collections more widely would actually increase usage of the items on library-owned sites, and the study was developed to explore the issue in a rigorous way. The research question for this study was: Does reposting digitized images from a library site to a third-party image sharing site have an effect on usage of the images on the library site? About the study Platforms For the study, the images were submitted to two different platforms - the Knowledge Bank (KB),28 a library-managed repository, and Flickr, a commercial image sharing site. The KB is an institutional repository built on DSpace software with a Manakin (XML-based) user interface. Established in 2005, it holds more than 45,000 items, including faculty and student research, gray literature, institutional records, and digitized library collections. Image collections like the one used in this study make up a small percentage of the items in the repository. In the KB’s organizational structure, the images in the study were submitted as a collection in the library’s community, under a sub-community for the special collection that contributed them. Each image was submitted as an item consisting of one image file and Dublin Core metadata.29 The project originally called for submitting the images to Flickr Commons, but the Commons was not accepting new partners during the study period. Instead, we created a standard Flickr Pro account for the Libraries, while following the Commons guidelines in image rights and settings. In contrast to DSpace’s community/sub-community/collection structure, Flickr images are organized in sets, sets belong to collections, and all images make up the account owner’s photostream. 
A set was created for the images, with accompanying text giving background information and inviting users to contribute to the description of the images.30 The images were accompanied by the same metadata as the items in the KB, but the files themselves were higher resolution, to take advantage of Flickr’s ability to display a range of sizes for each image. All items in the set were publicly LEARNING TO SHARE: MEASURING USE OF A DIGITIZED COLLECTION ON FLICKR AND IN THE IR| SCHLOSSER AND STAMPER 89 available for viewing, commenting, and tagging, and each image was accompanied by links back to the KB at the item, collection, and repository level. The collection The choice of a collection for the study was limited by a number of factors. First, and most obviously, it needed to be an image collection. Second, it needed to be in the public domain, both to allow our digitization and distribution of the images, and also to satisfy Flickr Commons’ “no known copyright restrictions” requirement.31 This could be accomplished either by choosing a collection whose copyright protections had expired, or by removing restrictions from a collection to which the Libraries owned the rights. Third, the curator of the collection needed to be willing and able to post the images on a commercial site. This required not only an open-minded curator, but also a collection without a restrictive donor agreement or items containing sensitive or private information. Finally, we wanted the collection to be of broad public interest. The collection chosen for the study was a set of 163 photographs from OSU’s Charles H. McCaghy Collection of Exotic Dance from Burlesque to Clubs, held by the Jerome Lawrence and Robert E. Lee Theatre Research Institute.32 The photographs, mainly images of burlesque dancers, were published on cabinet and tobacco cards in the 1890s, putting them solidly in the public domain. Figure 1. "The Devil's Auction," J. Gurney & son (studio). http://hdl.handle.net/1811/47633 (KB), http://www.flickr.com/photos/60966199@N08/5588351865/ (flickr) http://hdl.handle.net/1811/47633 LEARNING TO SHARE: MEASURING USE OF A DIGITIZED COLLECTION ON FLICKR AND IN THE IR| SCHLOSSER AND STAMPER 87 METHODOLOGY Phases The study took place in 2011 and was organized in three ten-week phases. For the first phase (January 31 through April 11), the images were submitted to the KB. The purpose of this phase was to provide a baseline level of usage for the images in the repository. In phase two (April 12 through June 20), half of the images were randomly selected and submitted to Flickr (Group A). The purpose of this phase was to determine what effect reposting would have on usage of items in the repository - both on those images that were reposted, and on other images in the same collection that had not been reposted. In phase three (June 21 through August 29), the rest of the images (Group B) were submitted to Flickr. In this phase, we began publicizing the collection. Publicity consisted of sharing links to the collection on social media and sending emails to scholars in relevant fields via email lists. These efforts led to further downstream publicity on popular and scholarly blogs.33 Data collection The unit of measurement for the study was views of individual images. To understand the notion of a “view,” we must contrast two different ways that an image may be viewed in the Knowledge Bank. Each image in the collection has an individual web page (the item page) where it is presented along with metadata describing it. 
In addition, from that page a visitor may download and save the image file itself (in this collection, a JPEG). In the former case, the image is an element in a web page, while in the latter it is an image file independent of its web context. Search engines and other sources commonly link directly to such files, so it is not unusual for a visitor to download a file without ever having seen it in context. In light of this, we produced two data sets, one for visits to item pages and another for file downloads. Depending on one’s interpretation, either could be construed as a “view.” Ultimately there was little distinction in usage patterns between the two types of measure.

The data were generated by making use of DSpace’s Apache SOLR-based statistics system, which provides a queryable database of usage events. For each item in the study, we made two queries: one for per-day counts of item page views, and another for per-day counts of image file downloads (called “bitstream” downloads in DSpace parlance). In both cases, views that came from automated sources such as search engine indexing agents were excluded from our counts.

Views of the images in Flickr were noted and used as a benchmark, but were not the focus of the study. Unlike cumulative views, which are tabulated and saved indefinitely, Flickr saves daily view numbers for only thirty days. As a result, daily view numbers for most of the study period were not available for analysis, and the discussion of the trends in the Flickr data is necessarily anecdotal.

RESULTS

At the end of the study period, the data showed very little usage of the collection in the repository. This lack of usage was relatively consistent through the three phases of the study, and in rough terms translates to less than one view of each item per day. Of the two ways of measuring an image “view” - either by counting views of the web page where the item can be found or by counting how many times the image file was downloaded - there was little distinction. Knowledge Bank item pages received between 5 and 38 views per item, while files were downloaded between 5 and 34 times. Further, there were no significant differences in number of views received between the first group released to Flickr and the second.

                                                   KB item page views       Image file downloads
                                                   min   median   max       min   median   max
Group A (images released to Flickr in Phase II)      5     10      35         5      9      25
Group B (images released to Flickr in Phase III)     6     10      38         4      9      34

Table 1. The items in the study are divided into Group A and Group B, depending on when the images were placed on Flickr. This table shows that both groups received similar traffic over the course of the study, with items having between 5 and 38 views in both groups, with a median of 10 for both, and between 4 and 34 downloads, with a median of 9 for both groups.

The items attracted more visitors on Flickr, with the images receiving between 100 and 600 views each. With a few exceptions, the items that appeared towards the beginning of the set (as viewed by a user who starts from the set home page) received more views than items towards its end. This suggests a particular usage pattern - start at the beginning, browse through a certain number of images, and navigate away. A more significant trend in the Flickr data is that most views of the images came after publicity for the collection began (approximately midway through the third phase of the study).
Again, the lack of daily usage numbers on Flickr makes it impossible to demonstrate the publicity ‘bump,’ but it was dramatic. We witnessed a similar, if smaller, ‘bump’ in usage of the items in the KB after publicity started. We were also able to identify 65 unique visitors to the KB who came to the site via a link on Flickr, out of 449 unique visitors overall. Of those who came to the KB from Flickr, 31 continued on to other parts of the KB, and the rest left after viewing a single item or image. LEARNING TO SHARE: MEASURING USE OF A DIGITIZED COLLECTION ON FLICKR AND IN THE IR| SCHLOSSER AND STAMPER 89 DISCUSSION With so little data, we cannot reliably answer the primary research question. Reposting certainly does not seem to have lowered usage of the items in the KB, but the numbers of views in all phases were so small as to preclude drawing meaningful conclusions. A larger issue is the fact that much of the usage came immediately following our promotional efforts. This development complicated the research in a number of ways. First, because the promotional emails and social media messages specifically pointed users to the collection in Flickr, it is impossible to know how the use may have differed if the primary link in the promotion had been to the Knowledge Bank. Would the higher use seen on Flickr simply have transferred to the KB? Would the unfamiliarity and non-image-centric interface of the Knowledge Bank have thwarted casual users in their attempt to browse the collection? The centrality of the promotion efforts also suggests that one of the underlying assumptions of the study may have been wrong. This research project was premised on the idea that an openly available collection on a library website will attract a certain number of visitors (number dependent on the popularity and topicality of the subject of the collection) who find the content spontaneously via searching and browsing. Placing that same content on a third-party site could theoretically divert a percentage of those users, who would then never visit the library’s site. The percentage of users diverted would likely depend on how many more users browse the third party site than the library site, as well as the relative position of the two in search rankings. The McCaghy collection should have been a good candidate for this type of use pattern. Flickr is certainly heavily used and browsed, and burlesque, while not currently making headlines, is a subject with fairly broad popular appeal. The fact that users did not spontaneously discover the collection on either platform in significant numbers suggests that this may not be how discovery of library digitized collections works. It is not surprising that email lists and social media should drive larger numbers of users to a collection than happenstance - the power of link curation by trusted friends via informal communication channels is well known. What is surprising is that it was the only significant use pattern in evidence. The primary takeaway is that promotion is key. If we do not promote our collections to the people who are likely to be interested in them, barring a stroke of luck, it is unlikely that they will be found. Anecdotally, promotional efforts are often an afterthought in digital collections work - a pleasant but unnecessary ‘extra.’ In our environment, the repository staff often feel that promotion is the work of the collection owner, who may not think of promoting the collection in the digital environment, nor know how to do so. 
As a result, users who would benefit from the collections simply do not know they exist. These results also suggest that librarians worried about the consequences of sharing their collections on third party sites may be worrying about the wrong thing. The sheer volume of information on any given topic makes it unlikely that any but the most dedicated researcher will INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2012 90 explore all available sources. Most other users are likely to rely on trusted information sources (traditional media, blogs, social networking sites) to steer them towards the items that are most likely to interest them. Instead of wondering if users will still come to the library’s site if the content is available elsewhere, perhaps we should be asking of our digital collections, “Is anyone using them on any site?” And if the answer is no, the owners and caretakers of those collections should explore ways to bring them to the attention of relevant audiences. CONCLUSION As a usage study of a collection hosted on a library site and a commercial site, this project was not a success. Flawed assumptions and a lack of usable data resulted in an inability to address the primary research question in a meaningful way. However, it does shed light on the questions that motivated it. Are our digitized collections being used? What effect do current methods of sharing and promotion have on that use? Librarians working with digitized collections have fallen behind our colleagues in the print and institutional repository arenas in measuring use of collections, but we have the same needs for usage data. In the current climate of heightened accountability in higher education and publicly funded institutions, we need to demonstrate the value of what we do. We need to know when our efforts to promote our collections are working, and determine which projects have been most successful and merit continued development. And as always, we need to share our results, both formally and informally, with our colleagues. Measuring use of digital resources is challenging, and obtaining accurate usage statistics requires not only familiarity with the tools involved, but also some understanding of the ways in which the numbers can be unrepresentative of actual use. The organizations that do collect usage statistics on their digitized collections should share their methods and their results with others to help foster an environment where such data are collected and used. Next steps in this area could take the shape of further research projects, or simply more visible work collecting usage statistics on digital collections. Of greatest utility to the field would be data demonstrating the relative effectiveness of different methods of increasing use. Do labor-intensive usability studies deliver returns in the form of increased use of the finished site? Which forms of reposting generate the most views? What types of publicity are most effective in bringing users to collections? How does use of a collection change over time? There are also more policy-driven questions to be answered. For example, should further investment in a collection or site be tied to increasing use of low-traffic collections, or capitalizing on success? Differences in topic, format, and audience make it difficult to generalize in this area, but we can begin building a body of knowledge that helps us learn from each other’s successes and failures. 
LEARNING TO SHARE: MEASURING USE OF A DIGITIZED COLLECTION ON FLICKR AND IN THE IR| SCHLOSSER AND STAMPER 91 REFERENCES 1 Brinley Franklin, Martha Kyrillidou, and Terry Plum. "From Usage to User: Library Metrics and Expectations for the Evaluation of Digital Libraries." In Evaluation of Digital Libraries: An Insight into Useful Applications and Methods, ed. Giannis Tsakonas and Christos Papatheodorou, 17-39. (Oxford: Chandos Publishing, 2009). http://www.libqual.org/publications (accessed February 29, 2012) 2 “LIBQUAL+,” accessed February 29, 2012. http://www.libqual.org/home 3 “StatsQUAL,” accessed February 29, 2012. http://www.digiqual.org/ 4 Julie Arendt and Cassie Wagner. "Beyond Description: Converting Web Site Usage Statistics into Concrete Site Improvement Ideas." Journal of Web Librarianship 4, no. 1 (2010): 37-54. 5 Irene M. H. Herold. "Digital Archival Image Collections: Who are the Users?" Behavioral & Social Sciences Librarian 29, no. 4 (2010): 267-282. 6 Dan Hazen, Jeffrey Horrell, and Jan Merrill-Oldham. Selecting Research Collections for Digitization. (Council on Library and Information Resources, 1998). http://www.clir.org/pubs/reports/hazen/pub74.html (accessed February 29, 2012) 7 Bart Ooghe and Dries Moreels. "Analysing Selection for Digitisation: Current Practices and Common Incentives." D-Lib Magazine 15, no. 9 (2009): 28. http://www.dlib.org/dlib/september09/ooghe/09ooghe.html. 8 Sandra D. Payette and Oya Y. Rieger. "Supporting Scholarly Inquiry: Incorporating Users in the Design of the Digital Library." The Journal of Academic Librarianship 24, no. 2 (1998): 121-129. 9 Judy Jeng. "What is Usability in the Context of the Digital Library and How Can It Be Measured?" Information Technology & Libraries 24, no. 2 (2005): 47-56. 10 “Open Archives Initiative Protocol for Metadata Harvesting,” accessed February 29, 2012. http://www.openarchives.org/pmh/ 11 “Open Archives Initiative Object Reuse and Exchange,” accessed February 29, 2012. http://www.openarchives.org/ore/ 12 Eric Miller and Micheline Westfall. "Linked Data and Libraries." Serials Librarian 60, no. 1&4 (2011): 17-22. 13 Ann M. Lally and Carolyn E. Dunford. “Using Wikipedia to Extend Digital Collections,” D-Lib Magazine 13, no. 5&6 (2007). Accessed February 29, 2012. doi:10.1045/may2007-lally 14 “Flickr: The Commons,” accessed February 29, 2012. http://www.flickr.com/commons/ 15 Martin Kalfatovic, Effie Kapsalis, Katherine Spiess, Anne Camp, and Michael Edson. "Smithsonian Team Flickr: A Library, Archives, and Museums Collaboration in Web 2.0 Space." Archival Science 8, no. 4 (2008): 267-277. http://www.libqual.org/publications http://www.libqual.org/home http://www.digiqual.org/ http://www.clir.org/pubs/reports/hazen/pub74.html http://www.dlib.org/dlib/september09/ooghe/09ooghe.html http://www.openarchives.org/pmh/ http://www.openarchives.org/ore/ http://www.flickr.com/commons/ INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2012 92 16 Josh Hadro. "LC Report Positive on Flickr Pilot." Library Journal 134, no. 1 (2009): 23. 17 Jeremiah Saunders. “Flickr as a Digital Image Collection Host: A Case Study of the Alcuin Society,” Collection Management 33, no. 4 (2008): 302-309. doi: 10.1080/01462670802360387 18 Victoria Carolan and Anna Towlson. "A History in Pictures: LSE Archives on Flickr." ALISS Quarterly 6 (2011): 16-18. 19 Michael Stephens. "Flickr." Library Technology Reports 42, 4 (2006): 58-62. 20 Jason Paul Michel and Elias Tzoc. "Automated Bulk Uploading of Images and Metadata to Flickr." 
Journal of Web Librarianship 4, no. 4 (10, 2010): 435-448. 21 Peggy Garvin. "Photostreams to the People." Searcher 17, no. 8 (2009): 45-49. 22 Jason Vaughan. "Insights into the Commons on Flickr." Portal: Libraries & the Academy 10, no. 2 (2010): 185-214. 23 “Pets-in-Collections,” accessed February 29, 2012. http://petsincollections.tumblr.com/ 24 “DSpace,” accessed February 29, 2012. http://www.dspace.org/ 25 “Digital Commons,” accessed February 29, 2012. http://digitalcommons.bepress.com/ 26 Dorothea Salo. "Innkeeper at the Roach Motel." Library Trends 57, no. 2 (2008): 98-123. 27 For an example of the type of debate that tends to surround projects like Flickr commons, see http://www.foundhistory.org/2008/12/22/tragedy-at-the-commons/. (accessed February 29, 2012) 28 “The Knowledge Bank,” accessed February 29, 2012. http://kb.osu.edu 29 “Charles H. McCaghy Collection of Exotic Dance from Burlesque to Clubs,” accessed February 29, 2012. http://hdl.handle.net/1811/47556 30 “Charles H. McCaghy Collection of Exotic Dance from Burlesque to Clubs,” accessed February 29, 2012. http://flic.kr/s/aHsjua3BGi 31 “Flickr: The Commons (usage),” accessed February 29, 2012. http://www.flickr.com/commons/usage/ 32 “The Jerome Lawrence and Robert E. Lee Theatre Research Institute,” http://library.osu.edu/find/collections/theatre-research-institute/; “Charles H. McCaghy Collection of Exotic Dance from Burlesque to Clubs,” http://library.osu.edu/find/collections/theatre-research-institute/personal-papers-and- special-collections/charles-h-mccaghy-collection-of-exotic-dance-from-burlesque-to-clubs/; “Loose Women in Tights Digital Exhibit,” http://library.osu.edu/find/collections/theatre- research-institute/digital-exhibits-projects/loose-women-in-tights-digital-exhibit/. Accessed February 29, 2012. 
1930 ----
Autocomplete as a Research Tool: A Study on Providing Search Suggestions
David Ward, Jim Hahn, and Kirsten Feist

ABSTRACT
As the library website and its online searching tools become the primary "branch" many users visit for their research, methods for providing automated, context-sensitive research assistance need to be developed to guide unmediated searching toward the most relevant results. This study examines one such method, the use of autocompletion in search interfaces, by conducting usability tests on its use in typical academic research scenarios. The study reports notable findings on user preference for autocomplete features and suggests best practices for their implementation.

INTRODUCTION
Autocompletion, a searching feature that offers suggestions for search terms as a user types text in a search box (see figure 1), has become ubiquitous on both larger search engines as well as smaller, individual sites. Debuting as the "Google Suggest" feature in 2004,1 autocomplete has made inroads into the library realm through inclusion in vendor search interfaces, including the most recent ProQuest interface and in EBSCO products. As this feature expands its presence in the library realm, it is important to understand how patrons include it in their workflow and the implications for library site design as well as for reference, instruction, and other library services.
An analysis of search logs from our library federated searching tool reveals both common errors in how search queries are entered, as well as patterns in the use of library search tools. For example, spelling suggestions are offered for more than 29 percent of all searches, and more than half (51 percent) of all searches appear to be for known items.2 Additionally, punctuation such as commas and a variety of correct and incorrect uses of Boolean operators are prevalent.
These patterns suggest that providing some form of guidance in keyword selection at the point of search-term entry could improve the accuracy of composing searches and subsequently the relevance of search results. This study investigates student use of an autocompletion implementation on the initial search entry box for a library's primary federated searching feature. Through usability studies, the authors analyzed how and when students use autocompletion as part of typical library research, asked the students to assess the value and role of autocompletion in the research process, and noted any drawbacks of implementing the feature. Additionally, the study sought to analyze how implementing autocompletion on the front end of a search affected providing search suggestions on the back end (search result pages).
David Ward (dh-ward@illinois.edu) is Reference Services Librarian, Jim Hahn (jimhahn@illinois.edu) is Orientation Services and Environments Librarian, Undergraduate Library, University of Illinois at Urbana-Champaign. Kirsten Feist (kmfeist@uh.edu) is Library Instruction Fellow, M.D. Anderson Library, University of Houston.
Figure 1. Autocomplete Implementation

LITERATURE REVIEW
Autocomplete as a plug-in has become ubiquitous on site searches large and small. Research on autocomplete includes a variety of technical terms that refer to systems using this architecture. Examples include Real Time Query Expansion (RTQE), interactive query expansion, Search-as-you-Type (SayT), query completion, type-ahead search, auto-suggest, and suggestive searching/search suggestions. The principal research concerns for autocomplete include issues related to both back-end architecture and assessments of user satisfaction and systems for specific implementations.
Nandi and Jagadish present a detailed system architecture model for their implementation of autocomplete, which highlights many of the concerns and desirable features of constructing an index that the autocomplete will query against.3 They note in particular that the quality of suggestions presented to the user must be high to compensate for the user interface distraction of having suggestions appear as a user types. This concern is echoed by Jung et al. in their analysis of how the results offered by their autocomplete implementation met user expectations.4 Their findings emphasize configuring systems to display only keywords that bring about successful searches, noting "precision [of suggested terms] is closely related with satisfaction." An additional analysis of their implementation also noted that suggesting search facets (or "entity types") is a way to enhance autocomplete implementations and aid users in selecting suitable keywords for their search.5 Wu also suggests using facets to help group suggestions by type, which improves comprehension of a list of possible keyword combinations.6 In defining important design characteristics for autocomplete implementations, Wu advocates building in a tolerance for misplaced keywords as a critical component. Chaudhuri and Kaushik examine possible algorithms to use in building this type of tolerance into search systems.
Misplaced keywords include typing terms in the wrong field (e.g., an author name in a title field), as well as spelling and word order errors.7 Systems that are tolerant in this manner “should enumerate all the possible interpretations and then sort them according to their possibilities,” a specification Wu refers to as “interpret-as-you-type.”8 Additionally, both Wu and Nandi and Jagadish specify fast response time (or synchronization speed) as a key usability feature in autocomplete interfaces, with Nandi and Jagadish indicating 100ms as a maximum.9,10 Speed also is a concern in mobile applications, which is part of the reason Paek et al. recommend autocomplete as part of mobile search interfaces, in which reducing keystrokes is a key usability feature.11 On the usability end, White and Marchionini12 assess best practices for implementation of search- term-suggestion systems and users’ perceptions of the quality of suggestions and search results retrieved. They find that offering keyword suggestions before the first set of results has been displayed generated more use of the suggestions than displaying them as part of a results page, even though the same terms were displayed in both cases. Providing suggestions at this initial stage also led to better-quality initial queries, particularly in cases where users may have little knowledge of the topic for which they are searching. The researchers also warn that, while presenting “query expansion terms before searchers have seen any search results has the potential to speed up their searching . . . it can also lead them down incorrect search paths.”13 METHOD Usability Study We conducted two rounds of usability testing on a version of University of Illinois at Urbana- Champaign’s Undergraduate Library website that contained a search box for the library’s federated/broadcast search tool with autocomplete built in. The testing followed Nielsen’s guidelines, using a minimum of five students for each round, with iterative changes to the interface made between rounds based on feedback from the first group.14 We conducted the initial round in summer 2011 with five library undergraduate student workers. The second round was conducted in September 2011 and included eight current undergraduate students with no affiliation to the library. By design, this method does not allow us to state definitive trends for all autocomplete implementations. It is not a statistically significant method by quantitative standards—rather, it gives us a rich set of qualitative data about the particular implementation (Easy Search) and specific interface (Undergrad Library homepage) being studied. The study’s questions were approved by the campus institutional review board (IRB), and each participant signed an IRB waiver before participating. Students for the September round were recruited via advertisements on the website and flyers in the library. Gift certificates to a local coffee shop provided the incentive for the study. INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2012 9 The procedure for each interview focused on two steps (see appendix). First, each participant was asked to use the search tool to perform a series of common research tasks, including three queries for known item searches (locating a specific book, journal, and movie), and two searches that asked the student to recall and describe a current or previous semester’s subject-based search, then use the search interface to find materials on that topic. 
Participants were asked to follow a speak-aloud protocol, dictating the decision-making process they went through as they conducted their search, including noting why they made each choice that they made along the way. Researchers observed and took notes, including transcribing user comments and noting mouse movements, clicks, and other choices made during the searches. Because part of the hypothesis of the study was that the autocomplete feature would be used as an aid for spelling search queries correctly, titles with possibly challenging spelling were chosen for the known item searches. Participants were not told about or instructed in the use of autocomplete; rather, it was left to each of them to discover it and individually decide whether to use it during each of the searches they conducted as a part of the study.
In the second part of the interview, researchers asked students questions about their use (or lack thereof) of the autocomplete feature during the initial set of task-based questions. This set of questions focused on identifying when students felt the autocomplete feature was helpful as part of the search process, why they used it when they did, and why they did not use it in other cases. Students also were asked more general questions about ways to improve the implementation of the feature. In the second round of testing (with students from the general campus populace), an additional set of questions was asked to gather student demographic information and to have the participants assess the quality of the choices the autocomplete feature presented them with. These questions were based in part on the work of White and Marchionini, who had study participants conduct a similar quality analysis.15

Autocomplete Implementation
The autocomplete feature was written in JavaScript and based on the jQuery autocomplete plugin (http://code.google.com/p/jquery-autocomplete/). Autocomplete plugins generally pull results either from a set of previous searches on a site or from a set of known products and pages within a site. For the study, the initial dataset used was a list of thousands of previous searches using the library's Easy Search federated search tool. However, this data proved to be extremely messy and slow to search. In particular, a high number of problematic searches were in the data, including entire citations pasted in, misspelled words, and long natural-language strings. Constructing an algorithm to clean up and make sense of these difficult queries would have required too much time and overhead, so we investigated other sources.
Researchers looked at autocomplete APIs for both Bing (http://api.bing.com/osjson.aspx?query=test) and Google (the Suggest toolbar API: http://google.com/complete/search?output=toolbar&q=test). Both worked well and produced similar relevant results for the test searches. Significantly, the search algorithms behind each of these APIs were able to process the search query into far more meaningful and relevant results than what was achieved through the test implementation using local data. These algorithms also included correcting misspelled words entered by users by presenting correctly spelled results from the dropdown list. We ultimately chose the Google API on the basis of its XML output.

FINDINGS
The study's findings were consistent across both rounds of usability testing.
Notable themes include using autocomplete to correct spelling on known-item searches (specific titles, authors, etc.), to build student confidence with an unfamiliar topic, to speed up the search process, to focus broad searches, and to augment search-term vocabulary. The study also details important student perceptions about autocomplete that can guide the implementation process in both library systems and instructional scenarios. These student perceptions include themes of autocomplete's popularity, desire for local resource suggestions, various cosmetic page changes, and user perception of the value of autocomplete to their peers.

Spelling
"It definitely helps with spelling," said one student, responding to a prompt of how they would explain the autocomplete feature to friends. Correcting search-term spelling is a key way in which students chose to make use of the autocomplete feature. For known-item searches, all eight students in the second round of testing selected suggestions from autocomplete at least two times out of the three searches conducted. Of those eight students, four (50 percent) used autocomplete every time (three out of three opportunities), and four (50 percent) used it 67 percent of the time (two out of three opportunities). We found that of this latter group who only selected autocomplete suggestions two out of the three opportunities presented, three of them did in fact refer to the dropdown selections when typing their inquiries, but did not actively select these suggestions from the dropdown all three times. In choosing to use autocomplete for spelling correction, one student noted that autocomplete was helpful "if you have an idea of a word but not how it's spelled."
It is interesting to note, with regard to clicking on the correct spellings, that students do not always realize they are choosing a different spelling than what they had started typing. An example is the search for Journal of Chromatography, which some students started spelling as "Journal of Chormo," then picked the correct spelling (starting "Chroma") from the list, without apparently realizing it was different. This is an important theme: if a student does not have an accurate spelling from which to begin, the search might fail, or the student will assume the library does not have any information on the chosen topic. This is particularly true in many current library catalog interfaces, which do not provide spelling suggestions on their search result pages.

Locating Known Items
Another significant use of the autocomplete feature was in cases where students were looking for a specific item but had only a partial citation. In one case, a student used autocomplete to find a specific course text by typing in the general topic (e.g., "Africa") and then an author's name that the course instructor had recommended. The Google implementation did an excellent job of combining these pieces of information into a list of actual book titles from which to choose. This finding also echoes those of White and Marchionini, who note that autocomplete "improved the quality of initial queries for both known item and exploratory tasks."16 The study also found this to be important because, overall, students are looking for valid starting points in their research (see "Confidence" below), and autocomplete was found to be one way to support finding instructor-approved items in the library.
This echoes findings from Project Information Literacy, which shows students typically turn to instructor-sanctioned materials first when beginning research.17 This use case typically arises when an instructor suggests an author or seminal text on a research topic to a student, often with an incomplete or inaccurate title. One participant also mentioned that they wanted the autocomplete feature to suggest primary or respected authors based on the topic they entered. Confidence “[Autocomplete is] an assurance that it [the research topic] is out there . . . you’re not the first person to look for it.”—student participant There were multiple themes related to the concept of user confidence discovered in the study. First, some participants noted that when they see the suggestions provided by autocomplete it verifies that what they are searching is “real”—validating their research idea and giving them the sense that others have been successful previously searching for their topic. When students were asked the source of the autocomplete suggestions, most thought that results were generated based on previous user searches. Their response to this particular question highlighted the notion of “popularity ranking,” in that many were confident that the suggestions presented were a result of popular local queries. In addition, one participant thought that results generated were based on synonyms of the word they typed, while another believed that the results generated were included only if the text typed matched descriptions of materials or topics currently present in the library’s databases. Some students did indicate the similarity of search results to Google suggestions, but they did not make an exact connection between the two. This assumption that the terms are vetted seems to lend authority to the suggestions themselves and parallels the research of Jung et al., who investigated satisfaction based on the connection between user expectations on selecting an autocomplete keyword and results.18 The benefit of autocomplete-provided suggestions in this context was noted even in cases when participants did not explicitly select items from the autocomplete list. Students’ confidence in their own knowledge of a topic also factored into when they used autocomplete. Participants reported that if they knew a topic well (particularly if the topic chosen was one that they had previously completed a paper on), it was faster to just type it in without AUTOCOMPLETE AS A RESEARCH TOOL | WARD, HAHN, AND FEIST 12 choosing a suggestion from the autocomplete suggestion list. One participant also noted that common topics (e.g., “someone’s name and biography”) would also be cases in which they would not use the suggestions. After the first round of usability testing, a question was added to the post–test assessment asking students to rate their confidence as a researcher on a five-point scale. All participants in the second round rated themselves as a four or five out of five. While this confirms findings on student confidence from studies like Project Information Literacy, this assessment question ultimately had no correlation to actual use of autocomplete suggestions during the subject-based research phase of the study. Rather, confidence in the topic itself seemed to be the defining factor in use. Speed The study also showed that speed is a factor in deciding when to use autocomplete functionality. 
Specifically, autocomplete should be implemented in a way in which it is not perceived as slowing down the search process. This includes having results displayed in a way that is easily ignored if students want to type in an entire search phrase themselves, and having the presentation and selection of search suggestions done in a way that is easy to read and quick to be selected. Autocomplete is perceived as a time-saver when clicking on an item will shorten the amount of typing students need to do. However, some students will ignore autocomplete altogether; they do this when they know what they want, and they feel that speed is compromised if they need to stop and look at the suggestions when they already know what they want to search. In the study, different participants would often cite speed as a reason for both selecting and not selecting an item for the same question, particularly with the known-item searches. This finding indicates that a successful implementation should include both a speedy response (as noted above in Nandi and Jagadish's research on delivering suggestions within 100ms, Paek et al.'s research on reducing keystrokes, and White and Marchionini's finding that providing suggested words was "a real time-saver"),19 as well as an interface which does not force users to select an item to proceed, or obscure the typing of a search query.

Focusing Topics
"It helps to complete a thought." "[Autocomplete is] extra brainstorming, but from the computer."—participant responses
The above quotes indicate the use of autocomplete as a tool for query formulation and search-term identification, a function closely related to the Association of College and Research Libraries (ACRL) Information Literacy Standard Two, which includes competencies for selecting appropriate search keywords and controlled vocabulary related to a topic.20 These quotes also parallel a similar finding from White and Marchionini,21 who had a user comment that autocomplete "offered words (paths) to go down that I might not have thought of on my own." The use of autocomplete for scoping and refining a topic also parallels elements of the reference interview, specifically the open and closed questions typically asked to help a student define what aspects of a topic they are interested in researching. This finding has many exciting implications for how elements and best practices from both classroom instruction and reference methodologies can be injected directly into search interfaces, to aid students who may not consult with a librarian directly during the course of their research.
Autocomplete was used at a lower rate, and in different ways, for subject searching compared to known-item searching. Three out of eight participants (38 percent) from the second round of testing did not use autocomplete at all for subject-based searching (zero of two opportunities). Five out of eight participants (62 percent) used autocomplete on one of two search opportunities (50 percent). No participants used autocomplete on both of the search opportunities.
The stage of research a student was in helped to indicate where and how autocomplete could be useful in topic formulation and search-term selection for subject searches. Participants indicated that they would use autocomplete for narrowing ideas if they were at a later stage in a paper, when they knew more about what they wanted or needed specifics on their topic.
However, early in a paper, some participants indicated they just wanted broad information and did not want to narrow possible results too early. This finding also supports previous research from Project Information Literacy, which describes student desire to learn the “big-picture context” as a key function in the early part of the research process.22 At this topic-focusing stage, some participants told us that the search suggestions reminded them of topics that were discussed in class. Further, the study showed that autocomplete suggests aspects of topics to student that they had not previously considered, and one participant indicated that she might change her topic if she saw something interesting from the list of suggestions, particularly something she had not thought of yet. Interface Implementation Though students who opted to utilize the autocomplete feature were generally satisfied with the results generated, some students recommended increasing the number of autocomplete suggestions in the dropdown menu to increase the probability of finding their desired topic or known item or to potentially lead to other related topics to narrow their search. In addition, students recommended increasing the width of the autocomplete text box, as its present proportions are insufficient for displaying longer suggestions without text wrapping. Some students also noted that increasing the height of the dropdown menu containing the autocomplete suggestions might help reduce the necessity to scroll through the results and may help to draw user attention to all results for those who elect not to use the scroll bar. Beyond the suggested improvements for the functionality of the autocomplete feature, students also noted a few cosmetic changes they would like to see implemented. In particular, students would prefer to see larger text and a better use of fonts and font colors when using autocomplete. One student noted that if different fonts and colors were used in this feature, the results generated might stand out more and better attract users, or better draw users’ attention to the recommended search terms. AUTOCOMPLETE AS A RESEARCH TOOL | WARD, HAHN, AND FEIST 14 Perceived Value to Peers Most students who participated in the study stated that they would recommend that their fellow classmates utilize the autocomplete feature for two primary purposes: known-item searches and locating alternative options for research topics. One student noted that she would recommend using this feature to search keywords “easily and efficiently,” while another student indicated that the feature helps to link to other related keywords. This finding also revealed that users were not intimidated by the feature and did not see it as a distraction from the search process, an initial researcher concern. CONCLUSION AND FUTURE DIRECTIONS Implementation Implications Implementing autocomplete functionality that accounts for the observed research tendencies and preferences of users makes for a compelling search experience. Participant selection of autocomplete suggestions varied between the types of searches studied. Spelling correction was the one universally acknowledged use. For subject-based searching, confidence in the topic searched and the stage of research emerged as indicators of the likelihood of autocomplete suggestions being taken. The use and effectiveness of providing subject suggestions requires further study, however. 
Students expect suggestions to produce usable results within a library’s collections, so the source of the suggestions should incorporate known, viable subject taxonomies to maximize benefits and not lead students down false search paths. There is an ongoing need to investigate possible search-term dictionaries outside of Google, such as lists of library holdings, journal titles, article titles, and controlled vocabulary from key library databases. The “brainstorming” aspect of autocomplete for subject searching is an intriguing benefit that should be more fully explored and supported. In combination with these findings, participant’s positive responses to some of the assessment questions (including first impressions of autocomplete and willingness to recommend it to friends) indicate that autocomplete is a viable tool to incorporate site-wide into library search interfaces. Instruction Implications Traditional academic library instruction tends to focus on thinking of all possible search terms, synonyms, and alternative phrasing before the onset of actual searching and engagement with research interfaces. This process is later refined in the classroom by examining controlled vocabulary within a set of search results. However, observations from this study (as well as researcher experience with users at the reference desk) indicate that students in real-world situations often skip this step and rely on a more trial-and-error method for choosing search terms, beginning with one concept or phrasing rather than creating a list of options that they try sequentially. The implication for classroom practice is that instruction on search-term formulation should include a review of autocomplete suggestions as well as practical methods for integrating these suggestions into the research process. This is particularly important as vendor databases INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2012 15 move toward making autocomplete a default feature. Proper instruction in its use can help advance ACRL Information Literacy goals and provide a practical, context-sensitive way to explain how a varied vocabulary is important for achieving relevant results in a research setting.23 Reference Implications As with classroom instruction, traditional reference practice emphasizes a prescriptive path for research that involves analyzing which aspects of a topic or alternate vocabulary will be most relevant to a search before search-term entry. Open and closed questioning techniques encourage users to think about different facets of their topic, such as time period, location, and type of information (e.g., statistics) that might be relevant. An enhanced implementation of autocomplete can incorporate these best practices from the reference interview into the list of suggestions to aid unmediated searching. One way this might be incorporated is through presenting faceted results that change on the basis of user selection of the type and nature of information they are looking for, such as a time period, format, or subject. For broadcast and federated searching interfaces, this could extend into the results users are then presented with, specifically attempting to use items or databases on the basis of suggestions made during the search entry phase, rather than presenting users with a multitude of options for users to make sense of, some of which may be irrelevant to the actual information need. Finally, the findings on use of autocomplete also have implications for search-results pages. 
Many of the common uses (e.g., spelling suggestions and additional search-term suggestion) also should be standard on results pages. This, too, is a common feature of commercial interfaces. Bing, for example, includes a Related Searches feature (on the left of a standard results page) that suggests context-specific search terms based on the query. This feature is also part of their API (http://www.bing.com/developers/s/APIBasics.html). Providing these reference-without-a-librarian features is essential both in establishing user confidence in library research tools and in developing research skills and an understanding of the information literacy concepts necessary to becoming better researchers.
Our autocomplete use findings draw attention to user needs and library support across search processes; specifically, autocomplete functionality offers support while forming search queries and can improve the results of user searching. For this reason, we recommend that autocomplete functionality be investigated for implementation across all library interfaces and websites to provide unified support for user searches. The benefits that can be realized from autocomplete can be maximized by consulting with reference and instruction personnel on the benefits noted above and collaboratively devising best practices for integrating autocomplete results into search-strategy formulation and classroom-teaching workflows.

REFERENCES
1. "Autocomplete—Web Search Help," Google, support.google.com/websearch/bin/answer.py?hl=en&answer=106230 (accessed February 7, 2012).
2. William Mischo, internal use study, unpublished, 2011.
3. Arnab Nandi and H. V. Jagadish, "Assisted Querying Using Instant-Response Interfaces," in Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data (New York: ACM, 2007), 1156–58, doi: 10.1145/1247480.1247640.
4. Hanmin Jung et al., "Comparative Evaluation of Reliabilities on Semantic Search Functions: Auto-complete and Entity-centric Unified Search," in Proceedings of the 5th International Conference on Active Media Technology (Berlin, Heidelberg: Springer-Verlag, 2009), 104–13, doi: 10.1007/978-3-642-04875-3_15.
5. Hanmin Jung et al., "Auto-complete for Improving Reliability on Semantic Web Service Framework," in Proceedings of the Symposium on Human Interface 2009 on Human Interface and the Management of Information. Information and Interaction. Part II: Held as part of HCI International 2009 (Berlin, Heidelberg: Springer-Verlag, 2009), 36–44, doi: 10.1007/978-3-642-02559-4_5.
6. Hao Wu, "Search-As-You-Type in Forms: Leveraging the Usability and the Functionality of Search Paradigm in Relational Databases," VLDB 2010, 36th International Conference on Very Large Data Bases, September 13–17, 2010, Singapore, p. 36–41, www.vldb2010.org/proceedings/files/vldb_2010_workshop/PhD_Workshop_2010/PhD%20Workshop/Content/p7.pdf (accessed February 7, 2012).
7. Surajit Chaudhuri and Raghav Kaushik, "Extending Autocompletion to Tolerate Errors," in Proceedings of the 35th SIGMOD International Conference on Management of Data (New York: ACM, 2009), 707–18, doi: 10.1145/1559845.1559919.
8. Wu, "Search-As-You-Type in Forms," 38.
9. Wu, "Search-As-You-Type in Forms."
10. Ibid.
Tim Paek, Bongshin Lee, and Bo Thiesson, “Designing Phrase Builder: A Mobile Real-Time Query Expansion Interface,” in Proceedings of the 11th International Conference on Human- Computer Interaction with Mobile Devices and Services (New York: ACM, 2009), 7:1–7:10, doi: 10.1145/1613858.1613868. http://support.google.com/websearch/bin/answer.py?hl=en&answer=106230 http://www.vldb2010.org/proceedings/files/vldb_2010_workshop/PhD_Workshop_2010/PhD%20Workshop/Content/p7.pdf http://www.vldb2010.org/proceedings/files/vldb_2010_workshop/PhD_Workshop_2010/PhD%20Workshop/Content/p7.pdf INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2012 17 12. Ryen W. White and Gary Marchionini, “Examining the Effectiveness of Real-Time Query Expansion,” Information Processing and Management 43, no. 3 (2007): 685–704, doi: 10.1016/j.ipm.2006.06.005. 13. White and Marchionini, “Examining the Effectiveness of Real-Time Query Expansion,” 701. 14. Jakob Nielsen, “Why You Only Need to Test with 5 Users,” Jakob Nielsen’s Alertbox (blog), March 19, 2000, www.useit.com/alertbox/20000319.html (accessed February 7, 2012). See also Walter Apai, “Interview with Web Usability Guru, Jakob Nielsen,” Webdesigner Depot (blog), September 28, 2009, www.webdesignerdepot.com/2009/09/interview-with-web-usability-guru- jakob-nielsen/ (accessed February 7, 2012). 15. White and Marchionini, “Examining the Effectiveness of Real-Time Query Expansion.” 16. Ibid. 17. Alison J. Head and Michael B. Eisenberg, “Lessons Learned: How College Students Seek Information in the Digital Age,” Project Information Literacy Progress Report, December 1, 2009, projectinfolit.org/pdfs/PIL_Fall2009_finalv_YR1_12_2009v2.pdf (accessed February 7, 2012). 18. Jung et al., “Comparative Evaluation of Reliabilities on Semantic Search Functions.” 19. Jung et al., “Comparative Evaluation of Reliabilities on Semantic Search Functions”; Paek, Lee, and Thiesson, “Designing Phrase Builder”; White and Marchionini, “Examining the Effectiveness of Real-Time Query Expansion.” 20. Association of College and Research Libraries (ACRL), “Information Literacy Competency Standards for Higher Education,” http://www.ala.org/acrl/standards/informationliteracycompetency (accessed February 7, 2012). 21. White and Marchionini, “Examining the Effectiveness of Real-Time Query Expansion.” 22. Head and Eisenberg, “Lessons Learned.” 23. Association of College and Research Libraries (ACRL), “Information Literacy Competency Standards for Higher Education.” http://www.useit.com/alertbox/20000319.html http://www.webdesignerdepot.com/2009/09/interview-with-web-usability-guru-jakob-nielsen/ http://www.webdesignerdepot.com/2009/09/interview-with-web-usability-guru-jakob-nielsen/ http://projectinfolit.org/pdfs/PIL_Fall2009_finalv_YR1_12_2009v2.pdf http://www.ala.org/acrl/standards/informationliteracycompetency AUTOCOMPLETE AS A RESEARCH TOOL | WARD, HAHN, AND FEIST 18 APPENDIX. Questions Task-Based Questions 1. Does the library have a copy of “The Epic of Gilgamesh?” 2. Does the library own the movie “Battleship Potempkin?” 3. Does the library own the journal/article “Journal of Chromatography?” 4. For this part, we would like you to imagine you are doing research for a recent paper, either one you have already completed or one you are currently working on. a. What is this paper about? (What is your research question?) b. What class is it for? c. Search for an article on YYY 5. Same as 4, but different class/topic, and search for a book on YYY Autocomplete-Specific Questions 1. 
1. What is your first impression of the autocomplete feature?
2. Have you seen this feature before?
a. If so where have you used it?
3. Why did you/did you not use the suggested words? (words in the dropdown)
4. Where do you think the suggestions are coming from? Or, How are they being chosen?
5. When would you use this?
6. When would you not use it?
7. How can it be improved?
8. Overall, what do you like/not like about this option?
9. Would you suggest this feature to a friend?
10. If you were to explain this feature to a friend how might you explain it to them?

Assessment and Demographic Questions

Autocomplete Feature
1. [KNOWN ITEM] Rate the quality/appropriateness of each of the first five autocomplete dropdown suggestions for your search (5 point scale): 1—Poor Quality/Not Appropriate, 2—Low Quality, 3—Acceptable, 4—Good Quality, 5—High Quality/Very Appropriate
2. [SUBJECT/TOPIC SEARCH] Rate the quality/appropriateness of each of the first five autocomplete dropdown suggestions for your search (5 point scale): 1—Poor Quality/Not Appropriate, 2—Low Quality, 3—Acceptable, 4—Good Quality, 5—High Quality/Very Appropriate
3. Please indicate how strongly you agree or disagree with the following statement: "The autocomplete feature is useful for narrowing down a research topic." (5 point scale): 1—Strongly Disagree, 2—Disagree, 3—Undecided, 4—Agree, 5—Strongly Agree

Demographics
1. Please indicate your current class status: a. Freshman, b. Sophomore, c. Junior, d. Senior
2. What is your declared or anticipated major?
3. Have you had a librarian come talk to one of your classes or give an instruction session in one of your classes? If yes, which class(es)?
4. Please rate your overall confidence level when beginning research for classes that require library resources for a paper or assignment (5 point scale): 1—No Confidence, 2—Low Confidence, 3—Reasonable Confidence, 4—High Confidence, 5—Very High Confidence
5. What factors influence your confidence level when beginning research for classes that require library resources for a paper or assignment?

1941 ----
Information Retrieval Using a Middleware Approach
Danijela Boberić Krstićev

ABSTRACT
This paper explores the use of a mediator/wrapper approach to enable the search of an existing library management system using different information retrieval protocols. It proposes an architecture for a software component that will act as an intermediary between the library system and search services. It provides an overview of different approaches to add Z39.50 and Search/Retrieval via URL (SRU) functionality using a middleware approach that is implemented on the BISIS library management system. That wrapper performs transformation of Contextual Query Language (CQL) into Lucene query language. The primary aim of this software component is to enable search and retrieval of bibliographic records using the SRU and Z39.50 protocols, but the proposed architecture of the software components is also suitable for inclusion of the existing library management system into a library portal. The software component provides a single interface to server-side protocols for search and retrieval of records. Additional protocols could be used. This paper provides a practical demonstration of interest to developers of library management systems and those who are trying to use open-source solutions to make their local catalog accessible to other systems.
INTRODUCTION
Information technologies are changing and developing very quickly, forcing continual adjustment of business processes to leverage the new trends. These changes affect all spheres of society, including libraries. There is a need to add new functionality to existing systems in ways that are cost effective and do not require major redevelopment of systems that have achieved a reasonable level of maturity and robustness. This paper describes how to extend an existing library management system with new functionality supporting easy sharing of bibliographic information with other library management systems.
One of the core services of library management systems is support for shared cataloging. This service consists of the following activities: when processing a new bibliographic unit, a librarian first checks whether the unit has already been recorded in another library in the world. If it is found, the librarian stores that electronic record in his or her local database of bibliographic records. To enable those activities, a standard way of communication between different library management systems must exist. Currently, the well-known standards in this area are Z39.501 and SRU.2
Danijela Boberić Krstićev (dboberic@uns.ac.rs) is a member of the Department of Mathematics and Informatics, Faculty of Sciences, University of Novi Sad, Serbia.
In this paper, a software component that integrates services for retrieval of bibliographic records using the Z39.50 and SRU standards is described. The main purpose of that component is to encapsulate the server sides of the appropriate protocols and to provide a single interface for communication with the existing library management system. The same interface may be used regardless of which protocols are used for communication with the library management system. In addition, the software component acts as an intermediary between two different library management systems. The main advantage of the component is that it is independent of the library management system with which it communicates. Also, the component could be extended with new search and retrieval protocols. By using the component, the functionality of existing library management systems would be improved and redevelopment of the existing system would not be necessary. It means that the existing library management system would just need to provide an interface for communication with that component. That interface can even be implemented as an XML web service.

Standards Used for Search and Retrieval
The Z39.50 standard was one of the first standards that defined a set of services to search for and retrieve data. The standard is an abstract model that defines communication between the client and server and does not go into details of implementation of the client or server. The model defines abstract prefixes used for search that do not depend on the implementation of the underlying system. It also defines the format in which data can be exchanged. The Z39.50 standard defines the type-1 query language, which is required when implementing this standard. The Z39.50 standard has certain drawbacks that a new generation of standards, like SRU, is trying to overcome. SRU tries to keep the functionality defined by the Z39.50 standard but to allow its implementation using current technologies.
One of the main advantages of the SRU protocol, as opposed to Z39.50, is that it allows messages to be exchanged in the form of XML documents, which was not the case with the Z39.50 protocol. The query language used in SRU is called Contextual Query Language (CQL).3 The SRU standard has two implementations, one in which search and retrieval is done by sending messages via the HyperText Transfer Protocol (HTTP) GET and POST methods (SRU version) and the other for sending messages using the Simple Object Access Protocol (SOAP) (SRW version). The main difference between SRU and SRW is in the way of sending messages.4 The SRW version of the protocol packs messages in the SOAP Envelope element, while the SRU version of the protocol sends messages based on parameter/value pairs that are included in the URL. Another difference between the two versions is that the SRU protocol uses only HTTP for message transfer, while SRW can use Secure Shell (SSH) and Simple Mail Transfer Protocol (SMTP) in addition to HTTP.

RELATED WORK
A common approach for adding SRU support to library systems, most of which already support the Z39.50 search protocol,5 has been to use existing software architecture that supports the Z39.50 protocol. Simultaneously supporting both protocols is very important because individual libraries will not decide to move to the new protocol until it is widely adopted within the library community.
One approach in the implementation of a system for retrieval of data using both protocols is to create two independent server-side components for Z39.50 and SRU, where both software components access a single database. This approach involves creating a server implementation from scratch without the utilization of existing architectures, which could be considered a disadvantage.
Figure 1. Software Architecture of a System with Separate Implementations of Server-Side Protocols
This approach is good if there is an existing Z39.50 or SRU server-side implementation, or if there is a library management system, for example, that supports just the Z39.50 protocol but has open source code and allows changes that would enable the development of an SRU service. The system architecture that is based on this approach is shown in Figure 1 as a Unified Modeling Language (UML) component diagram. In this figure, the software components that constitute the implementation of the client and the server side for each individual protocol are clearly separated, while the database is shared. The main disadvantage of this approach is that adding support for new search and retrieval protocols requires the transformation of the query language supported by that new protocol into the query language of the target system. For example, if the existing library management system uses a relational database to store bibliographic records, for every new protocol added, its query language must be transformed into the Structured Query Language (SQL) supported by the database. However, in most commercial library management systems that support server-side Z39.50, local development and maintenance of additional services may not be possible due to the closed nature of the systems.
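To make the SRU request style concrete before turning to gateway solutions for such closed systems, the following is a minimal, illustrative Java sketch of a searchRetrieve request sent as parameter/value pairs over HTTP GET, as described above. The base URL is hypothetical, and the parameter values shown (SRU version 1.1, the marcxml record schema) are conventional examples rather than details taken from any system discussed in this paper.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.net.URLEncoder;

    public class SruSearchExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical SRU endpoint; a real installation publishes its own base URL.
            String baseUrl = "http://example.org/sru";
            String cqlQuery = "dc.title = \"information retrieval\"";

            // An SRU 1.1 searchRetrieve request expressed as parameter/value pairs in the URL.
            String requestUrl = baseUrl
                    + "?operation=searchRetrieve"
                    + "&version=1.1"
                    + "&query=" + URLEncoder.encode(cqlQuery, "UTF-8")
                    + "&maximumRecords=10"
                    + "&recordSchema=marcxml";

            HttpURLConnection conn = (HttpURLConnection) new URL(requestUrl).openConnection();
            conn.setRequestMethod("GET");

            // The response is an XML searchRetrieveResponse document that a client would parse.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }
    }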
One solution in the case of such closed systems would be to create a so-called "gateway" software component that implements both an SRU server and a Z39.50 client, used to access the existing Z39.50 server. That is, if an SRU client application sends a search request, the gateway will accept that request, transform it into a Z39.50 request, and forward it to the Z39.50 server. Similarly, when the gateway receives a response from the Z39.50 server, the gateway will transform this response into an SRU response and forward it to the client. In this way, the client will have the impression that it communicates directly with an SRU server, while the existing Z39.50 server will behave as if it sends its response directly to a Z39.50 client. Figure 2 presents a component diagram that represents the architecture of the system that is based on this approach.
Figure 2. Software Architecture of a System with a Gateway
The software architecture shown in Figure 2 is one of the most common approaches and is used by the Library of Congress (LC),6 which uses the commercial Voyager7 library information system, which allows searching via the Z39.50 protocol. In order to support search of the LC database using SRU, IndexData8 developed the YazProxy software component,9 which is an SRU-Z39.50 gateway. The same idea10 was used in the implementation of the "The European Library"11 portal, which aims to provide integrated access to the major collections of all the European national libraries.
Another interesting approach in designing software architecture for systems dealing with retrieval of information can be observed in the systems involved in searching heterogeneous information sources. The architecture of these systems is shown in Figure 3. The basic idea in most of these systems is to provide the user with a single interface to search different systems. This means that there is a separate component that will accept a user query and transform it into a query that is supported by the specific system component that offers search and data retrieval. This component is also known as a mediator. A separate wrapper component must be created for each system to be searched, to convert the user's query to a query that is understood by the particular target system.12
Figure 3. Architecture with the Mediator/Wrapper Approach
Figure 3 shows a system architecture that enables communication with three different systems (system1, system2, and systemN), each of which may use a different query language and therefore need different wrapper components (wrapper1, wrapper2, and wrapperN). In this architecture, each system can itself be a new mediator component that will interact with other systems. That is, the wrapper component can communicate with the system or with another mediator. The role of the mediator is to accept the request defined by the user and send it to all wrapper components. The wrapper components know how to transform the query that is sent by a mediator into a query that is supported by the target system with which the wrapper communicates. In addition, the wrapper has to transform data received from the target system into a format prescribed by the mediator.
Communication between client applications and the mediator may be through one of the protocols for search and retrieval of information, for example through the SRU or Z39.50 protocols, or it may be a standard HTTP protocol.
Systems in which the architecture is based on the mediator/wrapper approach are described in several papers. Coiera et al. (2005)13 describe the architecture of a system that deals with the federated search of journals in the field of medicine, using the internal query language Unified Query Language (UQL). For each information source with which the system communicates, a wrapper was developed to translate queries from UQL into the native query language of the source. The wrapper also has the task of returning search results to the mediator. Those results are returned as an XML document, with a defined internal format called a Unified Response Language (UReL). As an alternative to using particular defined languages (UQL and UReL), a CQL query language and the SRU protocol could be used. Another example of the use of mediators is described by Cousins and Sanders (2006),14 who address the interoperability issues in cross-database access and suggest how to incorporate a virtual union catalogue into the wider information environment through the application of middleware, using the Z39.50 protocol to communicate with underlying sources.

Software Component for Services Integration
This paper describes a software component that would enable the integration of services for search and retrieval of bibliographic records into an existing library system. The main idea is that the component should be modular and flexible in order to allow the addition of new protocols for search and easy integration into the existing system. Based on the papers analyzed in the previous section, it was concluded that a mediator/wrapper approach would work best. The architecture of a system that would include the component and allow search and retrieval of bibliographic records from other library systems is shown in Figure 4.
Figure 4. Architecture of System for Retrieval of Bibliographic Records
In Figure 4, the central place is occupied by the intermediary component, which consists of a mediator component and a wrapper component. This component is an intermediary between the search service and an existing library system. The library system provides an interface (RecordManager) which is responsible for returning records that match the received query. Figure 4 also shows the components that are client applications that use specific protocols for communication (SRU and Z39.50), as well as the components that represent the server-side implementation of appropriate protocols. This paper will not describe the architecture of components that implement the server side of the Z39.50 and SRU protocols, primarily because there are already many open-source solutions15 that implement those components and can easily be connected with this intermediary component.
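As a rough illustration of how the pieces in Figure 4 could meet in code, the following Java sketch declares the three interfaces involved: the service the protocol servers call on the mediator, the wrapper to which the mediator delegates, and the RecordManager interface expected from the library management system. The MediatorService and Wrapper signatures follow the class diagram discussed later in the paper (Figure 5); the RecordManager methods are assumptions based on the two operations described later, not a definitive API.

    import org.z3950.zing.cql.CQLNode;  // CQL object model from the CQL-Java project

    // Interface offered by the intermediary to the Z39.50 and SRU server components.
    interface MediatorService {
        String[] getRecords(Object query, String format) throws Exception;
    }

    // Interface the mediator uses to hand a parsed CQL query to the wrapper.
    interface Wrapper {
        String[] executeQuery(CQLNode cqlQuery) throws Exception;
    }

    // Interface assumed to be provided by the existing library management system:
    // one operation runs the query and reports the hit count, the other returns records as XML.
    interface RecordManager {
        int executeQuery(String nativeQuery) throws Exception;
        String[] getRecords(int start, int count) throws Exception;
    }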
In order to test the intermediary component, we used the server side of the Z39.50 protocol developed through the JAFER project;16 for the SRU server side, we developed a special web service in the Java programming language. In further discussion, it is assumed that the intermediary component receives queries from server-side Z39.50 and SRU services, and that this component does not contain any implementation of these protocols.
The mediator component, which is part of the intermediary component, must accept queries sent by the server-side search and retrieval services. The mediator component uses its own internal representation of queries, so it is therefore necessary to transform received queries into the appropriate internal representation. After that, the mediator will establish communication with the wrapper component, which is in charge of executing queries in the existing library system. The basic role of the wrapper component is to transform queries received from the mediator into queries supported by the library system. After executing the query, the wrapper sends search results as an XML document to the mediator. Before sending those results to the server side of the protocol, the mediator must transform those results into the format that was defined by the client.

Mediator software component
The mediator is a software component that provides a single interface for different client applications. In this study, as shown in Figure 4, a slightly different solution was selected. Instead of the mediator communicating directly with the client application, which in the case of protocols for data exchange is the client side of that protocol, it actually communicates with the server components that implement the appropriate protocols, and the client application exchanges messages with the corresponding server-side protocol. The Z39.50 client exchanges messages with the appropriate Z39.50 server, and it communicates with the mediator component. A similar process occurs when communication is done using the SRU protocol. What is important to emphasize is that the Z39.50 and SRU servers communicate with the mediator through a unified interface, represented in Figure 5 by the class MediatorService. In this way the same method is used to submit the query and receive results, regardless of which protocol is used. That means that our system becomes more scalable and that it is possible to add new search and retrieval protocols without refactoring the mediator component.
Figure 5 shows the UML class diagram that describes the software mediator component. The MediatorService class is responsible for communication with the server-side Z39.50 and SRU protocols. This class accepts queries from the server side of protocols and returns bibliographic records in the format defined by the server. The mediator can accept queries defined by different query languages. Its task is to transform these queries to an internal query language, which will be forwarded to the wrapper component. In this implementation, accepted queries are transformed into an object representation of CQL, as defined by the SRU standard. One of the reasons for choosing CQL is that concepts defined in the Z39.50 standard query language can be easily mapped to the corresponding concepts defined by CQL. CQL is semantically rich, so it can be used to create various types of queries.
Also, because it is based on the concept of context sets, CQL is extensible and allows the use of various context sets for different purposes. CQL is therefore not limited to searching bibliographic material; it could, for example, be used for searching geographical data. Accordingly, it was assumed that CQL is a general query language and that probably any query language could be transformed into it. In this implementation, the object model of the CQL query defined in the CQL-Java project17 was used. If a new query language appeared, it would be necessary to map the new query language into CQL or to extend the object model of CQL with new concepts.

This implementation of the mediator component can transform two different types of queries into the CQL object model: type-1 queries (used by Z39.50) and CQL queries. To add a new query language, it would only be necessary to add a new class implementing the interface QueryConverter shown in Figure 5; the architecture of the mediator component remains the same.

One task of the mediator component is to return records in the format that was defined by the client that sent the request. As the mediator communicates with the Z39.50 and SRU server side, the task of the Z39.50 and SRU server side is to check whether the format that the client requires is supported by the underlying system. If it is not supported, the request is not sent to the mediator. Otherwise, the mediator ensures the transformation of the retrieved records into the chosen format. The mediator obtains bibliographic records from the wrapper in the form of an XML document that is valid according to the appropriate XML schema.18 The XML schema allows the creation of an XML document describing bibliographic records according to the UNIMARC19 or MARC2120 format. The current implementation of the mediator component supports the transformation of bibliographic records into an XML document that can be an instance of the UNIMARCslim XML schema,21 the MARC21slim XML schema,22 or the Dublin Core XML schema.23 Adding support for a new format would require creating a new class that extends the class RecordSerializer (Figure 5). Because the mediator component works with XML, the transformation of bibliographic records into a new format could also be done using Extensible Stylesheet Language Transformations (XSLT).

Figure 5. UML Class Diagram of Mediator Component (MediatorService + getRecords(Object query, String format): String[]; QueryConverter + parseQuery(Object query): CQLNode, implemented by RPNConverter and CQLStringConverter; RecordSerializer + serialize(String r): String, extended by UnimarcSerializer, Marc21Serializer, and DublinCoreSerializer; Wrapper + executeQuery(CQLNode cqlQuery): String[])
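A minimal Java sketch of how the classes in Figure 5 fit together is given below. The interface and method names follow the class diagram; the constructor, the lookup of a serializer by format name, and the use of a single injected converter are assumptions made only for illustration.

import org.z3950.zing.cql.CQLNode;

// Interfaces taken from the class diagram in Figure 5; bodies are illustrative.
interface QueryConverter {
    CQLNode parseQuery(Object query);          // e.g., RPNConverter, CQLStringConverter
}

interface Wrapper {
    String[] executeQuery(CQLNode cqlQuery);   // returns UNIMARC/MARC 21 XML records
}

abstract class RecordSerializer {
    abstract String serialize(String record);  // e.g., UnimarcSerializer, Marc21Serializer, DublinCoreSerializer
}

/** Entry point used by the Z39.50 and SRU server-side components. */
class MediatorService {
    private final QueryConverter converter;
    private final Wrapper wrapper;
    private final java.util.Map<String, RecordSerializer> serializers; // keyed by format name

    MediatorService(QueryConverter converter, Wrapper wrapper,
                    java.util.Map<String, RecordSerializer> serializers) {
        this.converter = converter;
        this.wrapper = wrapper;
        this.serializers = serializers;
    }

    /** Accepts a protocol query and returns records in the requested format. */
    public String[] getRecords(Object query, String format) {
        CQLNode cql = converter.parseQuery(query);              // protocol query -> CQL object model
        String[] records = wrapper.executeQuery(cql);           // execute in the underlying library system
        RecordSerializer serializer = serializers.get(format);  // e.g., "unimarc", "marc21", "dc"
        String[] result = new String[records.length];
        for (int i = 0; i < records.length; i++) {
            result[i] = serializer.serialize(records[i]);       // transform into the requested format
        }
        return result;
    }
}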
Wrapper software component

The wrapper software component is responsible for ensuring communication between the mediator and the existing library system. That is, the wrapper component is responsible for transforming the CQL object representation into a concrete query that is supported by the existing library system and for obtaining the results that match the query.

The implementation of the wrapper component directly depends on the architecture of the existing library system. Figure 7 proposes a possible architecture of the wrapper component. This proposed architecture assumes that the existing library system provides some kind of service that the wrapper component can use to send the query and obtain results. The RecordManager interface in Figure 7 is an example of such a service. RecordManager has two operations: one executes the query and returns the hits, and the other returns the bibliographic records. This proposed solution is useful for libraries that use a library management system that can be extended. It may not be appropriate for libraries using an "off-the-shelf" library management system that cannot be extended.

The proposed architecture of the wrapper component is based on the strategy design pattern,24 primarily because of the need to transform the CQL query into a query that is supported by the library system. According to the CQL concept of context sets, all prefixes that can be searched are grouped into context sets, and these sets are registered with the Library of Congress. The concept of context sets enables specific communities and users to define their own prefixes, relations, and modifiers without fear that a name will be identical to the name of a prefix defined in another set. That is, it is possible to define two prefixes with the same name that belong to different sets and therefore have different semantics. CQL offers the possibility of combining, in a single query, elements that are defined in different context sets. When parsing a query, it is necessary to check which context set a particular item belongs to and then to apply the appropriate mapping of the element from the context set to the corresponding element defined by the query language used in the library system.

The strategy design pattern belongs to the patterns that describe the behavior of objects (behavioral patterns), which determine the responsibility of each object and the way in which objects communicate with each other. The main task of the strategy pattern is to enable easy adjustment, at runtime, of the algorithm that is applied by an object. The strategy pattern defines a family of algorithms, each of which is encapsulated in a single object. Figure 6 shows a class diagram from the book Design Patterns: Elements of Reusable Object-Oriented Software,25 which describes the basic elements of the strategy pattern.

Figure 6. Strategy Design Pattern

The basic elements of this pattern are the classes Context, Strategy, ConcreteStrategyA, and ConcreteStrategyB. The class Context is in charge of choosing and changing algorithms by creating an instance of the appropriate class that implements the interface Strategy. The interface Strategy declares the method AlgorithmInterface(), which all classes that implement the interface must provide. The class ConcreteStrategyA implements one concrete algorithm. This design pattern is used when transforming CQL queries primarily because CQL queries can consist of elements that belong to different context sets, whose elements are interpreted differently. The classes Context, Strategy, CQLStrategy, and DcStrategy, shown in Figure 7 and sketched in Java below, are the elements of the strategy pattern responsible for mapping the concepts defined by CQL.
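The sketch below uses the class and operation names from Figure 7; the method bodies, the hard-coded example mappings, and the way Context chooses a strategy by context-set name are assumptions for illustration only (in the actual component the mapping rules are read from XML documents, as described later for the BISIS integration).

import org.z3950.zing.cql.CQLTermNode;

// Strategy interface: one implementation per supported context set (Figure 7).
interface Strategy {
    String mapIndexToUnderlayingPrefix(String index);
    Object parseOperand(String underlayingPrefix, CQLTermNode node);
}

// Maps indexes of the Dublin Core context set (e.g., dc.title) to fields of the underlying system.
class DcStrategy implements Strategy {
    public String mapIndexToUnderlayingPrefix(String index) {
        // Hard-coded here only for illustration; the real rules come from an XML document (Listing 1).
        if (index.endsWith("title")) return "TI";
        if (index.endsWith("creator")) return "AU";
        if (index.endsWith("subject")) return "SB";
        return index;
    }
    public Object parseOperand(String underlayingPrefix, CQLTermNode node) {
        return underlayingPrefix + ":" + node.getTerm();    // e.g., a Lucene-style term
    }
}

// Maps indexes of the CQL context set (cql.*) in an analogous way.
class CQLStrategy implements Strategy {
    public String mapIndexToUnderlayingPrefix(String index) {
        return index;                                       // illustrative pass-through
    }
    public Object parseOperand(String underlayingPrefix, CQLTermNode node) {
        return underlayingPrefix + ":" + node.getTerm();
    }
}

// Context: selects and switches the strategy at runtime, depending on the context set in use.
class Context {
    private Strategy strategy;

    public void setStrategy(String contextSet) {
        strategy = "dc".equals(contextSet) ? new DcStrategy() : new CQLStrategy();
    }
    public String mapIndexToUnderlayingPrefix(String index) {
        return strategy.mapIndexToUnderlayingPrefix(index);
    }
    public Object parseOperand(String index, CQLTermNode node) {
        return strategy.parseOperand(strategy.mapIndexToUnderlayingPrefix(index), node);
    }
}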
The class Context is responsible for selecting the appropriate strategy for parsing, depending on which context set the element to be transformed belongs to. The classes CQLStrategy and DcStrategy are responsible for mapping the elements belonging to the CQL and Dublin Core context sets, respectively, into the appropriate elements of the particular query language used by the library system. The use of the strategy pattern makes it possible to change, at runtime, the algorithm that will parse the query, depending on which context set is used. The described implementation of the wrapper component enables the parsing of queries that contain only elements belonging to the CQL and/or Dublin Core context sets. In order to provide support for a new context set, a new implementation of the interface Strategy (Figure 7) would be required, including an algorithm to parse the elements defined by the new set.

Figure 7. UML Class Diagram of Wrapper Component (Context + setStrategy(String strategy): void, mapIndexToUnderlayingPrefix(String index): String, parseOperand(String index, CQLTermNode node): Object; Strategy + mapIndexToUnderlayingPrefix(String index): String, parseOperand(String underlayingPref, CQLTermNode node): Object, implemented by CQLStrategy and DcStrategy; RecordManager + select(Object query): int[], getRecords(int hits[]): String[]; Wrapper + executeQuery(CQLNode cqlQuery): String[], makeQuery(CQLNode cql, Object underlayingQuery): Object)

Integration of Intermediary Software Components into the BISIS Library System

The BISIS library system was developed at the Faculty of Science and the Faculty of Technical Sciences in Novi Sad, Serbia, and has had several versions since its introduction in 1993. The fourth and current version of the system is based on XML technologies. Among the core functional units of BISIS26 are:

• circulation of library material
• cataloging of bibliographic records
• indexing and retrieval of bibliographic records
• downloading bibliographic records through the Z39.50 protocol
• creation of a card catalog
• creation of statistical reports

An intermediary software component has been integrated into the BISIS system. The intermediary component was written in the Java programming language and implemented as a web application. Communication between the server applications that support the Z39.50 and SRU protocols and the intermediary component is done using the software package Hessian.27 Hessian offers a simple implementation of two protocols for communicating with web services, a binary protocol and a corresponding XML protocol, both of which rely on HTTP. The Hessian package makes it easy to create a Java servlet on the server side and a proxy object on the client side that is used to communicate with the servlet. In this case, the proxy object is deployed on the server side of the protocol and the intermediary component contains the servlet.
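A minimal sketch of the client side of this arrangement is shown below, assuming Hessian's standard HessianProxyFactory API; the servlet URL and the MediatorService remote interface (declared here with the getRecords operation from Figure 5) are hypothetical details added for illustration. On the intermediary side, the corresponding service would typically be exposed through a Hessian servlet.

import com.caucho.hessian.client.HessianProxyFactory;

public class MediatorClientExample {

    // Remote interface with the operation from Figure 5, declared here for illustration.
    public interface MediatorService {
        String[] getRecords(Object query, String format);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical URL of the servlet exposed by the intermediary web application.
        String url = "http://localhost:8080/intermediary/mediator";

        // Hessian generates a client-side proxy object for the remote service interface.
        HessianProxyFactory factory = new HessianProxyFactory();
        MediatorService mediator = (MediatorService) factory.create(MediatorService.class, url);

        // The Z39.50/SRU server side can now call the mediator as if it were a local object.
        String[] records = mediator.getRecords("dc.title = \"library\"", "marc21");
        System.out.println(records.length + " records received");
    }
}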
Communication between the intermediary and BISIS is also realized using the Hessian software package, which makes it possible to create a distributed system, because the existing library system, the intermediary component, and the server applications that implement the protocols can be located on physically separate computers.

The BISIS library system uses the Lucene software package for indexing and searching. Lucene defines its own query language,29 so the wrapper component integrated into BISIS has to transform the CQL query object model into the object representation of the query defined by Lucene. The wrapper therefore first needs to determine which context set an index belongs to and then apply the appropriate strategy for mapping the index. The rules for mapping indexes to Lucene fields are read from a corresponding XML document that is defined for every context set. Listing 1 below provides an example of an XML document that contains some rules for mapping indexes of the Dublin Core context set to Lucene index fields. The XML element index represents the name of the index that is going to be mapped, while the XML element mappingElement contains the name of the Lucene field. For example, the title index defined in the DublinCore context set, which denotes a search by title of the publication, is mapped to the field TI, which is used by the search engine of the BISIS system.

   <index>title</index>
   <mappingElement>TI</mappingElement>
   <index>creator</index>
   <mappingElement>AU</mappingElement>
   <index>subject</index>
   <mappingElement>SB</mappingElement>

Listing 1. XML Document with Rules for Mapping the DublinCore Context Set

After an index is mapped to the corresponding Lucene field, a similar procedure is repeated for a relation that may belong to some other context set or may have modifiers that belong to some other context set. It is therefore necessary to change the current mapping strategy to a new one. By doing this, all elements of the CQL query are converted into a Lucene query, and the new query can be sent to BISIS to be executed.

Approximately 40 libraries in Serbia currently use the BISIS system, which includes a Z39.50 client, allowing these libraries to search the collections of other libraries that support communication through the Z39.50 protocol. By integrating the intermediary component into the BISIS system, non-BISIS libraries may now search the collections of libraries that use BISIS. As a first step, the intermediary component was integrated in a few libraries, without any major problems. The component is most useful to the city libraries that use BISIS, because they have many branches, which can now search and retrieve bibliographic records from their central libraries. The component could potentially be used by other library management systems, assuming the presence of an appropriate wrapper component to transform CQL into the target query language.

CONCLUSION

This paper describes an independent, modular software component that enables the integration of a service for the search and retrieval of bibliographic records into an existing library system. The software component provides a single interface to server-side protocols for searching and retrieving records, and it could be extended to support additional server-side protocols. The paper describes the communication of this component with Z39.50 and SRU servers. The software component was developed for integration with the BISIS library system, but it is an independent component that could be integrated into any other library system.
The proposed architecture of the software component is also suitable for the inclusion of existing library systems in a single portal. The architecture of the portal would involve one mediator component whose task would be to communicate with the wrapper components of the individual library systems. Each library system would implement its own search and storage functionality and could function independently of the portal. The basic advantage of this architecture is that it is possible to include new library systems that provide search services; it is only necessary to add a new wrapper that performs the appropriate transformation of the query obtained from the mediator component into a query that the library system can process. The task of the mediator is to send queries to the wrappers, while each wrapper establishes communication with a specific library system. After obtaining the results from the underlying library systems, the mediator should be able to combine them, remove duplicates, and sort the results. In this way, the end user would have the impression of searching a single database.

REFERENCES

1. "Information Retrieval (Z39.50): Application Service Definition and Protocol Specification," http://www.loc.gov/z3950/agency/Z39-50-2003.pdf (accessed February 22, 2013).

2. "Search/Retrieval via URL," http://www.loc.gov/standards/sru/.

3. "Contextual Query Language – CQL," http://www.loc.gov/standards/sru/specs/cql.html.

4. Eric Lease Morgan, "An Introduction to the Search/Retrieve URL Service (SRU)," Ariadne 40 (2004), http://www.ariadne.ac.uk/issue40/morgan.

5. Larry E. Dixson, "YAZ Proxy Installation to Enhance Z39.50 Server Performance," Library Hi Tech 27, no. 2 (2009): 277-285, http://dx.doi.org/10.1108/07378830910968227; Mike Taylor and Adam Dickmeiss, "Delivering MARC/XML Records from the Library of Congress Catalogue Using the Open Protocols SRW/U and Z39.50" (paper presented at World Library and Information Congress: 71st IFLA General Conference and Council, Oslo, 2005).

6. Mike Taylor and Adam Dickmeiss, "Delivering MARC/XML Records from the Library of Congress Catalogue Using the Open Protocols SRW/U and Z39.50" (paper presented at World Library and Information Congress: 71st IFLA General Conference and Council, Oslo, 2005).

7. "Voyager Integrated Library System," http://www.exlibrisgroup.com/category/Voyager.

8. "IndexData," http://www.indexdata.com/.

9. "YazProxy," http://www.indexdata.com/yazproxy.

10. Theo van Veen and Bill Oldroyd, "Search and Retrieval in The European Library," D-Lib Magazine 10, no. 2 (2004), http://www.dlib.org/dlib/february04/vanveen/02vanveen.html.

11. "The European Library," http://www.theeuropeanlibrary.org/tel4/.

12. Gio Wiederhold, "Mediators in the Architecture of Future Information Systems," Computer 25, no. 3 (1992): 38-49, http://dx.doi.org/10.1109/2/121508.

13. Enrico Coiera, Martin Walther, Ken Nguyen, and Nigel H. Lovell, "Architecture for Knowledge-Based and Federated Search of Online Clinical Evidence," Journal of Medical Internet Research 7, no. 5 (2005), http://www.jmir.org/2005/5/e52/.

14. Shirley Cousins and Ashley Sanders, "Incorporating a Virtual Union Catalogue into the Wider Information Environment through the Application of Middleware: Interoperability Issues in Cross-database Access," Journal of Documentation 62, no. 1 (2006): 120-144, http://dx.doi.org/10.1108/00220410610642084.
15. "SRU Software and Tools," http://www.loc.gov/standards/sru/resources/tools.html; "Z39.50 Registry of Implementators," http://www.loc.gov/z3950/agency/register/entries.html.

16. "JAFER ToolKit Project," http://www.jafer.org.

17. "CQL-Java: A Free CQL Compiler for Java," http://zing.z3950.org/cql/java/.

18. Bojana Dimić, Branko Milosavljević, and Dušan Surla, "XML Schema for UNIMARC and MARC 21 Formats," The Electronic Library 28, no. 2 (2010): 245-262, http://dx.doi.org/10.1108/02640471011033611.

19. "UNIMARC Formats and Related Documentation," http://www.ifla.org/en/publications/unimarc-formats-and-related-documentation.

20. "MARC 21 Format for Bibliographic Data," http://www.loc.gov/marc/bibliographic/.

21. "UNIMARCSlim XML Schema," http://www.bncf.firenze.sbn.it/progetti/unimarc/slim/documentation/unimarcslim.xsd.

22. "Marc21Slim XML Schema," http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd.

23. "DublinCore XML Schema," http://www.loc.gov/standards/sru/resources/dc-schema.xsd.

24. Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software (Indianapolis: Addison-Wesley, 1994), 315-323.

25. Ibid.

26. Danijela Boberić and Branko Milosavljević, "Generating Library Material Reports in Software System BISIS" (proceedings of the 4th International Conference on Engineering Technologies - ICET, Novi Sad, 2009); Danijela Boberić and Dušan Surla, "XML Editor for Search and Retrieval of Bibliographic Records in the Z39.50 Standard," The Electronic Library 27, no. 3 (2009): 474-495, http://dx.doi.org/10.1108/02640470910966916 (accessed February 22, 2013); Bojana Dimić and Dušan Surla, "XML Editor for UNIMARC and MARC21 Cataloguing," The Electronic Library 27, no. 3 (2009): 509-528, http://dx.doi.org/10.1108/02640470910966934 (accessed February 22, 2013); Jelena Rađenović, Branko Milosavljević, and Dušan Surla, "Modelling and Implementation of Catalogue Cards Using FreeMarker," Program: Electronic Library and Information Systems 43, no. 1 (2009): 63-76, http://dx.doi.org/10.1108/00330330934110 (accessed February 22, 2013); Danijela Tešendić, Branko Milosavljević, and Dušan Surla, "A Library Circulation System for City and Special Libraries," The Electronic Library 27, no. 1 (2009): 162-186, http://dx.doi.org/10.1108/02640470910934669.

27. "Hessian," http://hessian.caucho.com/doc/hessian-overview.xtp.

28. Branko Milosavljević, Danijela Boberić, and Dušan Surla, "Retrieval of Bibliographic Records Using Apache Lucene," The Electronic Library 28, no. 4 (2010): 525-539, http://dx.doi.org/10.1108/02640471011065355.

ACKNOWLEDGEMENT

This work is partially supported by the Ministry of Education and Science of the Republic of Serbia through project no. 174023, "Intelligent techniques and their integration into wide-spectrum decision support."
1946 ----

Examining Attributes of Open Standard File Formats for Long-term Preservation and Open Access

Eun G. Park and Sam Oh

Eun G. Park (eun.park@mcgill.ca) is Associate Professor, School of Information Studies, McGill University, Montreal, Canada. Sam Oh (samoh@skku.edu) is corresponding author and Professor, Department of Library and Information Science, Sungkyunkwan University, Seoul, Korea.

ABSTRACT

This study examines the attributes that have been used to assess file formats in the literature and compiles the most frequently used attributes of file formats to establish open-standard file-format-selection criteria. A comprehensive review was undertaken to identify the current knowledge regarding file-format-selection criteria. The findings indicate that the most common criteria can be categorized into five major groups: functionality, metadata, openness, interoperability, and independence. These attributes appear to be closely related. Additional attributes include presentation, authenticity, adoption, protection, preservation, reference, and others.

INTRODUCTION

File format is one of the core issues in the fields of digital content management and digital preservation. As many different types of file formats are available for texts, images, graphs, audio recordings, videos, databases, and web applications, the selection of appropriate file formats poses an ongoing challenge to libraries, archives, and other cultural heritage institutions. Some file formats appear to be more widely accepted: Tagged Image File Format (TIFF), Portable Document Format (PDF), PDF/A, Office Open XML (OOXML), and Open Document Format (ODF), to name a few. Many institutions, including the Library of Congress (LC), possess guidelines on file format applications for long-term preservation strategies that specify requisite characteristics of acceptable file formats (e.g., they are independent of specific operating systems, are independent of hardware and software functions, conform to international standards, etc.).1 The Format Descriptions database of the Global Digital Format Registry is an effort to maintain a detailed representation of information and sustainability factors for as many file formats as possible (the PRONOM technical registry is another such database).2 Despite these developments, file format selection remains a complex task and prompts many questions, ranging from the general ("Which selection criteria are appropriate?") to the more specific ("Are these international standard file formats sufficient for us to ensure long-term preservation and access?" or "How should we define and implement standard file formats in harmony with our local context?").

In this study, we investigate the definitions and features of standard file formats and examine the major attributes used to assess file formats.
We discuss relevant issues from the viewpoint of open-standard file formats for long-term preservation and open access.

BACKGROUND ON STANDARD FILE FORMATS

The term file format is generally defined as what "specifies the organization of information at some level of abstraction, contained in one or more byte streams that can be exchanged between systems."3 According to InterPARES 2, file format is "the organization of data within files, usually designed to facilitate the storage, retrieval, processing, presentation, and/or transmission of the data by software."4 The PREMIS Data Dictionary for Preservation Metadata observes that, technically, file format is "a specific, pre-established structure for the organization of a digital file or bitstream."5

In general, file format can be divided into two types: an access format and a preservation format. An access format is "suitable for viewing a document or doing something with it so that users access the on-the-fly converted access formats."6 In comparison, a preservation format is "suitable for storing a document in an electronic archive for a long period"7; it provides "the ability to capture the material into the archive and render and disseminate the information now and in the future."8 While the ability to ensure long-term preservation focuses on the sustainability of preservation formats, the document in its access format tends to emphasize that it should be accessible and available to users, presumably all of the time.

Many researchers have discussed file formats and long-term preservation in relation to various types of resources. For example, Folk and Barkstrom describe and adopt several attributes of file formats that may affect the long-term preservation of scientific and engineering data (e.g., the ease of archival storage, ease of archival access, usability, data scholarship enablement, support for data integrity, and maintainability and durability of file formats).9 Barnes suggests converting word-processing documents in digital repositories, which are unsuitable for long-term storage, into a preservation format.10 The evaluation by Rauch, Krottmaier, and Tochtermann illustrates the practical use of file formats for 3D objects in terms of long-term reliability.11

Others have developed and/or applied numerous criteria in different settings. For instance, Sullivan uses a list of desirable properties of a long-term preservation format to explain the purpose of PDF/A from an archival and records management perspective.12 Sullivan cites device independence, self-containment, self-describing, transparency, accessibility, disclosure, and adoption as such properties. Rauch, Krottmaier, and Tochtermann's study applies criteria that consist of technical characteristics (e.g., open specification, compatibility, and standardization) and market characteristics (e.g., guarantee duration, support duration, market penetration, and the number of independent producers). Rog and van Wijk propose a quantifiable assessment method to calculate composite scores of file formats.13 They identify seven main categories of criteria: openness, adoption, complexity, technical protection mechanism, self-documentation, robustness, and dependencies.
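A composite score of this kind is essentially a weighted sum over criterion categories. The following minimal Java sketch shows only the general idea; the weights, the 0-2 scoring scale, and the example values are invented for illustration and are not taken from Rog and van Wijk's method.

import java.util.LinkedHashMap;
import java.util.Map;

public class FormatScoreExample {
    public static void main(String[] args) {
        // Criterion categories named by Rog and van Wijk; weights are hypothetical.
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("openness", 0.25);
        weights.put("adoption", 0.20);
        weights.put("complexity", 0.10);
        weights.put("technical protection mechanism", 0.10);
        weights.put("self-documentation", 0.10);
        weights.put("robustness", 0.15);
        weights.put("dependencies", 0.10);

        // Hypothetical scores for one candidate format, on an invented 0-2 scale.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("openness", 2.0);
        scores.put("adoption", 1.5);
        scores.put("complexity", 1.0);
        scores.put("technical protection mechanism", 2.0);
        scores.put("self-documentation", 1.0);
        scores.put("robustness", 1.5);
        scores.put("dependencies", 1.0);

        // Composite score = sum of weight * score over all categories.
        double composite = 0.0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            composite += e.getValue() * scores.get(e.getKey());
        }
        System.out.printf("Composite score: %.2f%n", composite);
    }
}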
Sahu focuses on the criteria developed by the UK's National Archives, which include open standards, ubiquity, stability, metadata support, feature set, interoperability, and viability.14 A more comprehensive evaluation by the LC reveals three components (technical factors, quality, and functionality) while placing a particular emphasis on the balance between the first two.15 Hodge and Anderson use seven criteria for sustainability, which are similar to the technical factors of the LC study: disclosure, adoption, transparency, self-documentation, external dependencies, impact of patents, and technical protection mechanisms.16

Some institutions adopt another term, standard file formats, to differentiate accepted and recommended file formats from others. According to the DAVID project, "standard file formats owe their status to (official) initiatives for standardizing or to their widespread use."17 Standard may be too general a term to specify the elements of file formats. However, there is a recognition that only those file formats accepted and recommended by national or international standards organizations (such as the International Standardization Organization [ISO], International Industry Imaging Association [I3A], WWW Consortium, etc.) are genuine standard file formats. For example, ISO has announced several standard file formats for images: TIFF/IT (ISO 12639:2004), PNG (ISO/IEC 15948:2004), and JPEG 2000 (ISO/IEC 15444:2003, 2004, 2005, 2007, 2008). For document file formats, PDF/A-1 (ISO Standard 19005-1, Document File Format for Long-Term Preservation) is one example. This format is proprietary; it is meant to maintain archival and records-management requirements and to preserve the visual appearance and migration needs of electronic documents. The Office Open XML file format (ISO/IEC 29500-1:2008, Information Technology—Document Description and Processing Languages) is another open standard that can be implemented from Microsoft Office applications on multiple platforms. ODF (ISO/IEC 26300:2006, Information Technology—Open Document Format for Office Applications [OpenDocument] v1.0) is an XML-based open file format. Their ISO endorsement notwithstanding, some errors in these file formats have been reported. For example, although PDF/A-1 is intended for long-term preservation of and access to documents, studies reveal that the feature-rich nature of PDF can create difficulties in preserving PDF information over time.18 To overcome the barriers of PDF and PDF/A-1, XML technology seems prevalent for digital resources in archiving systems and digital preservation.19 The digital repository community is treating XML technology as a panacea and converting most of its digital resources to XML.

The Netherlands Institute for Scientific Information Service (NISIS) adopts another noteworthy definition of standard file formats. It observes that standard image file formats "are widely accepted, have freely available specifications, are highly interoperable, incorporate no data compression and are capable of supporting preservation metadata."20 This definition implies specific and advanced ramifications for cost-free interoperability and metadata, which closely relates to open access.

Open standard is another relevant term to consider in file formats.
Although perspectives vary greatly between researchers, open standards can be acquired and used without any barrier or cost.21 In other words, open standard products are free from restrictions, such as patents, and are independent of proprietary hardware or software. Since the 1990s, open standard has been broadly adopted in many fields and is now an almost compulsory feature in information services. To follow the National Archives' definition, open standard formats are "formats for which the technical specifications have been made available in the public domain."22 In comparison, Folk and Barkstrom approach open standards from an institutional-support perspective, relying on user communities for standards that are widely available and used.23 On a more specific level, Stanescu emphasizes independence as the basic selection criterion for file formats.24 Others, such as Todd, propose determining whether a standard should be considered more open than others by applying criteria: adoption, platform independence, disclosure, transparency, and metadata support.25 Other factors considered by Todd include reusability and interoperability; robustness, complexity, and viability; stability; and intellectual property (IP) and rights management.26 Echoing the LC, Hodge and Anderson also suggest a list of selection criteria that have been grouped under the banner of "technical factors": disclosure, adoption, transparency, self-documentation, external dependencies, impact of patents, and technical protection mechanisms.27 Researchers agree that open standard file formats are less prone to obsolescence and more reliable than proprietary formats.28

Close examination of the NISIS definition mentioned above reveals that standard file formats are in reality not free, nor do they allow unrestricted access to resources. The three file formats that ISO has announced (PDF/A, OOXML, and ODF) are proprietary and sometimes costly. They also require the purchase of access to a proprietary standard, although there is an assumption that a standard should be free from legal and financial restrictions. The ISO-announced file formats, in short, are only standard file formats, not open standard file formats. For cultural heritage institutions, questions regarding appropriate selection criteria and the sufficiency of existing international standard file formats for long-term preservation and access remain unanswered. There exists neither a uniform method to compare the specifications of different file formats nor an objective approach to assess format specifications that would ensure long-term preservation and persistent access.

OBJECTIVES OF THE STUDY

In this study, we attempt to better define and establish open-standard file-format-selection criteria. To that end, we assess and compile the most frequently used attributes of file formats.

METHOD

We performed a comprehensive review of published articles, institutional reports, and other literature to identify the current knowledge regarding file-format-selection criteria. We included literature that deals with the three standard file formats (PDF, PDF/A, and XML) but excluded the recently announced ODF format due to the scarcity of literature on ODF. Of the more than thirty articles initially reviewed, only the twenty-five that use their own clear attributes were included in this study.
All of the attributes employed are listed by frequency and grouped according to similarities in meaning (see appendix). The original definitions or descriptions used are listed in the second column, and the file formats assessed by those attributes are listed in the third column. Where an attribute is given without a specific definition or description, "no definite term" is inserted.

FINDINGS

As illustrated in the appendix, the criteria identified by the studies vary. Although the requirements and context of the studies may differ, the most common criteria can be divided into five categories: functionality, metadata, openness, interoperability, and independence.

First, functionality refers to the ability of a format to do exactly what it is supposed to do.29 It is important to distinguish between two broad uses: preservation of document structure and formatting, and preservation of usable content. To preserve document formatting, a "published view" of a given piece of content is critical for distribution. Other content, such as database information or device-specific documents, needs to be preserved as well. Functionality criteria include various attributes related to format and structure or to the physical and technical specifications of files (e.g., robustness, feature set, viability, color maintenance, clarity, compactness, modularity, compression algorithms, etc.).

Second, metadata indicates that a format allows rich descriptive and technical metadata to be embedded in files. Metadata can be expressed as metadata support, self-documentation (self-documenting), documentation, content-level (as opposed to presentation-level) description, self-describing, self-describing files, formal description of format, etc.

Third, openness refers to specifications of a file format that are publicly available and accessible and to formats that are not proprietary. Whether seen as a single definition or as a set of criteria, the characteristic that appears to be at the core of the open standard movement is its independence from outside proprietary or commercial control. Openness may also refer to the autonomy of a file format, which relies on several factors. First, the document should be self-contained in terms of the content information (e.g., the text), the structural information (i.e., for those documents that are structured), the formatting information (e.g., fonts, colours, styles, etc.), and the metadata information. Self-containment does not necessarily mean that an archivist will only have one document to deal with. It does mean, however, that they will have documents that provide them with all the information needed to access and process the content, structure, formatting, and metadata. Openness is expressed as open availability by some researchers.30 Other researchers adopt the term disclosure to express that the specification is publicly available.31

Fourth is the independence of a document from proprietary or commercial hardware and software configurations, especially to prevent any issues resulting from different versions of software, hardware, and operating systems. This aspect is expressed in the appendix as open standards, open source software or equivalent, standard/proprietary, etc.
This also closely relates to independence, one of the five categories in the appendix, expressed as device independencies, independent implementations, no external dependency, no external dependencies, portability, and monitoring obsolescence. Having documents in a proprietary format controlled by a third party implies that, at one time or another, this format may no longer be supported, or that a change in the user agreement may lead to restricted access, access to outdated material, or patent and copyright issues. This means that the document must be freely accessible, without password restrictions or protection, and without any digital rights management scheme. Blocking access to a document with a password can lead to serious problems if the password is lost. In addition, the size and compactness of the document will influence the selection of a file format.

Fifth, interoperability primarily refers to the ability of a file format to be compatible with other formats and to exchange documents without loss of information.32 Specifically, it refers to the ability of a given piece of software to open a document without requiring any special application, plug-in, codec, or proprietary add-on. Adherence to open source standards is usually a good indication of the interoperability of a format. In general, an open standard is released after years of bargaining and agreements between major players. Supervision by an international standards body (such as ISO or the W3C) commonly helps propagate the format.

In addition to the five categories mentioned above, other attributes are often used. Presentation, authenticity, adoption, protection, preservation, and reference are such examples. Among these attributes, authenticity, although only seventh in the appendix, is one of the most important in archives and records management. It refers to the ability to guarantee that a file is what it originally was, without any corruption or alteration.33 Specific to authenticity is data integrity, which assesses the integrity of the file through an internal mechanism (e.g., PNG files include byte sequences to validate against errors). Another method of validating the authenticity of a document is to look at its traceability,34 that is, the traces left by the original author and those who modified or opened a file. One example is the difference between the creation date, modification date, and access date of any file on a personal computer. These three dates correspond to a moment when someone (often a different person each time) opened the file. Other mechanisms may require log information, which is external to the file. Another good indication of authenticity is the stability of a format.35 A format that is widely used is more likely to be stable. A stable format is also more likely to cause less data loss and corruption; hence it is a better indicator of authenticity.

Presentation includes attributes related to presenting and rendering data, expressed as distributing a page image, normal rendering, self-containment, self-contained, and beyond normal rendering. Adoption indicates how widely a file format is adopted by user communities, also represented as popularity, widely used formats, ubiquity, or continuity. Protection includes technical protection mechanisms or source verification used to secure files. Preservation means long-term preservation, institutional support, or ease of transformation and preservation.
Reference indicates citability or referential extensibility. Among other attributes, transparency is interesting to note because it indicates the degree to which files are open to direct analysis with basic tools, including human readability. Another important aspect across these criteria is that the terminologies used in the studies may be quite different yet describe the same or similar concepts from different angles. For instance, Rog and van Wijk use openness for standardization and specification without restrictions,36 while several other researchers use open availability to convey the same thing.37 Still others adopt the term disclosure to express that the specification is publicly available.38

DISCUSSION AND CONCLUSION

Functionality, metadata, openness, interoperability, and independence appear to be the most important factors when selecting file formats. When file formats for long-term preservation and open access are under discussion, cultural heritage institutions need to consider many issues. Despite several efforts, it is still tricky for them to identify the most appropriate file format or even to discern acceptable formats from unacceptable ones. Because it is difficult to prevent the creation of new file formats, format selection is not an easy task, in either theory or practice. It is critical, however, to base the decision on a clear understanding of the purpose for which the document is preserved: access preservation or repurposing preservation. Cultural heritage institutions and digital repository communities need to guarantee long-term preservation of digital resources in selected file formats. Additionally, users find it necessary to have access to digital information in these file formats. A further consideration involves the level of access users may enjoy (e.g., long-term access, permanent access, open access, persistent access, etc.). When determining international standard file formats, the aspect of open access should be included because it is a widely shared goal. It is necessary to develop a scale or measurement to assess open-standard format specifications to ensure long-term preservation and open access. Identifying which attributes are required for an open-standard file format and which digital format is most apt for the use and sustainability of long-term preservation is a meaningful task.

The outcome of our study provides a framework for appropriate strategies when selecting file formats for long-term preservation of and access to digital content. We hope that the criteria described in this study will benefit librarians, preservers, record creators, record managers, archivists, and users. We are reminded of Todd's remark that "the most important action is to align the recognition and weighting of criteria with a clear preservation strategy and keep them under review using risk management techniques."39 The question of how to adopt and implement these attributes can only be answered in the local context and decisions of each cultural heritage institution.40 Each institution should consider implementing a file format throughout the entire life cycle of digital resources, with a holistic approach to managerial, technical, procedural, archival, and financial issues for the purpose of long-term preservation and persistent access. The criteria may change over time, as is necessary for any format to adequately serve its purpose.
Maintaining its quality may be an ongoing task that cultural heritage institutions should take into account at all times. Even more importantly, cultural heritage institutions need to establish and implement a set of standard guidelines, specific to each context, for the selection of open-standard file formats.

Note: This research was supported by the Sungkyunkwan University Research Fund (2010-2011).

REFERENCES AND NOTES

1. Library of Congress, "Sustainability of Digital Formats: Planning for Library of Congress Collections," www.digitalpreservation.gov/formats/intro/intro.shtml (accessed November 21, 2011).

2. Global Digital Format Registry, www.gdfr.info (accessed November 17, 2011); The Technical Registry PRONOM, www.nationalarchives.gov.uk/aboutapps/pronom (accessed November 21, 2011).

3. Mike Folk and Bruce R. Barkstrom, "Attributes of File Formats for Long-Term Preservation of Scientific and Engineering Data in Digital Libraries" (paper presented at the Joint Conference on Digital Libraries (JCDL), Houston, TX, May 27–31, 2003), 1, www.larryblakeley.com/Articles/storage_archives_preservation/mike_folk_bruce_barkstrom200305.pdf (accessed November 21, 2011).

4. InterPARES 2 Project Glossary, p. 24, www.interpares.org/ip2/ip2_term_pdf.cfm?pdf=glossary (accessed November 21, 2011).

5. PREMIS Editorial Committee, PREMIS Data Dictionary for Preservation Metadata, ver. 2.0, March 2008, p. 195, www.loc.gov/standards/premis/v2/premis-2-0.pdf (accessed November 21, 2011).

6. Ian Barnes, "Preservation of Word Processing Documents," July 14, 2006, p. 4, http://apsr.anu.edu.au/publications/word_processing_preservation.pdf (accessed November 21, 2011).

7. Ibid.

8. Gail Hodge and Nikkia Anderson, "Formats for Digital Preservation: A Review of Alternatives and Issues," Information Services & Use 27 (2007): 46.

9. Folk and Barkstrom, "Attributes of File Formats."

10. Barnes, "Preservation of Word Processing Documents."

11. Carl Rauch, Harald Krottmaier, and Klaus Tochtermann, "File-Formats for Preservation: Evaluating the Long-Term Stability of File-Formats," in Proceedings of the 11th International Conference on Electronic Publishing 2007 (Vienna, Austria, June 13–15, 2007): 101–6.

12. Susan J. Sullivan, "An Archival/Records Management Perspective on PDF/A," Records Management Journal 16, no. 1 (2006): 51–56.

13. Judith Rog and Caroline van Wijk, "Evaluating File Formats for Long-Term Preservation," 2008, www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/KB_file_format_evaluation_method_27022008.pdf (accessed November 21, 2011).
14. D. K. Sahu, "Long Term Preservation: Which File Format to Use" (paper presented in Workshops on Open Access & Institutional Repository, Chennai, India, May 2–8, 2004), http://openmed.nic.in/1363/01/Long_term_preservation.pdf (accessed November 21, 2011).

15. CENDI Digital Preservation Task Group, "Formats for Digital Preservation: A Review of Alternatives and Issues," www.cendi.gov/publications/CENDI_PresFormats_WhitePaper_03092007.pdf (accessed November 21, 2011).

16. Hodge and Anderson, "Formats for Digital Preservation."

17. DAVID 4 Project (Digital ArchiVing, guIdeline and aDvice 4), "Standards for Fileformats," 1, www.expertisecentrumdavid.be/davidproject/teksten/guideline4.pdf (accessed November 21, 2011).

18. Sullivan, "An Archival/Records Management Perspective on PDF/A"; John Michael Potter, "Formats Conversion Technologies Set to Benefit Institutional Repositories," http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.7881&rep=rep1&type=pdf (accessed November 21, 2011).

19. Eva Müller et al., "Using XML for Long-Term Preservation: Experiences from the DiVA Project," in Proceedings of the 6th International Symposium on Electronic Theses and Dissertations (May 20–24, 2003): 109–16, https://edoc.hu-berlin.de/conferences/etd2003/hansson-peter/HTML/index.html (accessed November 21, 2011).

20. Rene van Horik, "Image Formats: Practical Experiences" (paper presented in Erpanet Training, Vienna, Austria, May 10–11, 2004), 22, www.erpanet.org/events/2004/vienna/presentations/erpaTrainingVienna_Horik.pdf (accessed November 21, 2011).

21. Open standard is related to open access, which comes from the Open Access movement that allows resources to be freely available to the public and permits any user to use those resources (e.g., mainly electronic journals, repositories, databases, software applications, etc.) without financial, legal, or technical barriers. See Amy E. C. Koehler, "Some Thoughts on the Meaning of Open Access for University Library Technical Services," Serials Review 32, no. 1 (March 2006): 17–21; Budapest Open Access Initiative, "Read the Budapest Open Access Initiative," www.soros.org/openaccess/read.shtml (accessed November 21, 2011).

22. National Archives, "Selecting File Formats for Long-term Preservation," 6, www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/KB_file_format_evaluation_method_27022008.pdf (accessed November 21, 2011).

23. Folk and Barkstrom, "Attributes of File Formats."

24. Andreas Stanescu, "Assessing the Durability of Formats in a Digital Preservation Environment: The INFORM Methodology," D-Lib Magazine 10, no. 11 (November 2004), www.dlib.org/dlib/november04/stanescu/11stanescu.html (accessed November 21, 2011).
25. Malcolm Todd, "Technology Watch Report: File Formats for Preservation," www.dpconline.org/advice/technology-watch-reports (accessed November 21, 2011).

26. Ibid.

27. Hodge and Anderson, "Formats for Digital Preservation."

28. Edward M. Corrado, "The Importance of Open Access, Open Source, and Open Standards for Libraries," Issues in Science & Technology Librarianship (Spring 2005), www.library.ucsb.edu/istl/05-spring/article2.html (accessed November 21, 2011); Carl Vilbrandt et al., "Cultural Heritage Preservation Using Constructive Shape Modeling," Computer Graphics Forum 23, no. 1 (2004): 25–41; Marshall Breeding, "Preserving Digital Information," Information Today 19, no. 5 (2002): 48–49.

29. Eun G. Park, "XML: Examining the Criteria to be Open Standard File Format" (paper presented at the InterPARES 3 International Symposium, Oslo, Norway, September 17, 2010), www.interpares.org/display_file.cfm?doc=IP3_isym04_presentation_3–3_korea.pdf (accessed November 21, 2011).

30. Adrian Brown, "Digital Preservation Guidance Note: Selecting File Formats for Long-Term Preservation," www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf (accessed November 21, 2011); Barnes, "Preservation of Word Processing Documents"; Sahu, "Long Term Preservation"; Potter, "Formats Conversion Technologies."

31. Stephen Abrams et al., "PDF-A: The Development of a Digital Preservation Standard" (paper presented at the 69th Annual Meeting of the Society of American Archivists, New Orleans, Louisiana, August 14–21, 2005), www.aiim.org/documents/standards/PDF-A.ppt (accessed November 21, 2011); Sullivan, "An Archival/Records Management Perspective on PDF/A"; CENDI, "Formats for Digital Preservation"; Hodge and Anderson, "Formats for Digital Preservation."

32. The National Archives, http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/KB_file_format_evaluation_method_27022008.pdf (accessed November 21, 2011); ECMA International, "Office Open XML File Formats—ECMA-376," www.ecma-international.org/publications/standards/Ecma-376.htm (accessed November 21, 2011).

33. Christoph Becker et al., "Systematic Characterisation of Objects in Digital Preservation: The Extensible Characterisation Languages," www.jucs.org/jucs_14_18/systematic_characterisation_of_objects/jucs_14_18_2936_2952_becker.pdf (accessed November 21, 2011); National Archives,
Rog and van Wijk, “Evaluating File Formats for Long-term Preservation.” 37. See Brown, “Digital Preservation Guidance Note: Selecting File Formats for Long-Term Preservation,” www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf (accessed November 21, 2011); Barnes, “Preservation of Word Processing Documents”; Sahu, “Long Term Preservation”; Potter, “Formats Conversion Technologies.” 38. Stephen Abrams et al., “PDF-A: The Development of a Digital Preservation Standard” (paper presented at the 69th Annual Meeting for the Society of American Archivists, New Orleans, Louisiana, August 14–21, 2005), www.aiim.org/documents/standards/PDF-A.ppt (accessed November 21, 2011).; Sullivan, “An Archival/Records Management Perspective on PDF/A”; CENDI, “Formats for Digital Preservation”; and Hodge & Anderson, “Formats for Digital Preservation.” 39. Todd, “Technology Watch Report,” 33. 40. Evelyn Peters McLellan, “Selecting Digital File Formats for Long-Term Preservation: InterPARES 2 Project General Study 11 Final Report,” www.interpares.org/display_file.cfm?doc=ip2_file_formats(complete).pdf (accessed November 21, 2011). http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/KB_file_format_evaluation_method_27022008.pdf http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/KB_file_format_evaluation_method_27022008.pdf http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/KB_file_format_evaluation_method_27022008.pdf http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/KB_file_format_evaluation_method_27022008.pdf http://www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf http://www.aiim.org/documents/standards/PDF-A.ppt http://www.interpares.org/display_file.cfm?doc=ip2_file_formats(complete).pdf INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2012 55 APPENDIX: File Format Attributes No. Attribute Definition/Description Assessed File Format 1. F U N C T I O N A L I T Y Robustness Robust against single point of failure, support for file corruption detection, file format stability, backward compatibility and forward compatibility (Rog & van Wijk, 2008; Wijk & Rog, 2007) PDF/A-1 (Limited) Microsoft Word (Limited) A robust format contains several layers of defense against corruption (Frey, 2000). N/A Feature Set Formats supporting the full range of features and functionality (Brown, 2003) N/A Not defined (Sahu, 2006) N/A Viability Error-detection facilities to allow detection of file corruption (Brown, 2003). PNG format (Yes) Not defined (Sahu, 2006) N/A Support for Graphic Effects and Typography Not defined (CENDI, 2007; Hodge & Anderson, 2007) TIFF_G4 (No) Color Maintenance Not defined (CENDI, 2007; Hodge & Anderson, 2007) TIFF_G4 (Limited) Clarity Support for high image resolution (CENDI, 2007; Hodge & Anderson, 2007) TIFF_G4 (Yes) Quality This pertains to how well the format fulfills its task today: (1) Low space costs, (2) highly encompassing, (3) robust, (4) simplicity, (5) highly tested, (6) loss-free, (7) supports metadata (Clausen, 2004). 
• Compactness: To minimize storage and I/O costs (Folk & Barkstrom, 2003). Assessed: N/A.
• Simplicity: Ease of implementing readers (Folk & Barkstrom, 2003). Assessed: N/A.
• File Corruption Detection: To be able to detect that a file has been corrupted; to provide error-correction (Folk & Barkstrom, 2003). Assessed: N/A.
• Raw I/O Efficiency: Formats that are organized for fast sequential access (Folk & Barkstrom, 2003). Assessed: N/A.
• Availability of Readers: To maintain ease of data access for readers (Folk & Barkstrom, 2003). Assessed: N/A.
• Ease of Subsetting: To process only part of data files (Folk & Barkstrom, 2003). Assessed: N/A.
• Size: To transfer data in large blocks (Folk & Barkstrom, 2003). Assessed: N/A.
• Ability to Aggregate Many Objects in a Single File: To maintain as small an archive "name space" as possible (Folk & Barkstrom, 2003). Assessed: N/A.
• Ability to Embed Data Extraction Software in the Files: The files come with read software embedded (Folk & Barkstrom, 2003). Assessed: N/A.
• Ability to Name File Elements: To work with data based on manipulating the element names instead of binary offsets, or other references (Folk & Barkstrom, 2003). Assessed: N/A.
• Rigorous Definition: To be defined in a sufficiently rigorous way (Folk & Barkstrom, 2003). Assessed: N/A.
• Multilanguage Implementation of Library Software: To have multiple implementations of readers for a single format (Folk & Barkstrom, 2003). Assessed: N/A.
• Memory: Some formats emphasize the presence or absence of memory (Frey, 2000). Assessed: TIFF (Yes).
• Accuracy: In some cases, the accuracy of the data can be decreased to save memory, e.g., through compression. In the case of a digital master, however, accuracy is very important (Frey, 2000). Assessed: N/A.
• Speed: The ability to access or display a data set at a certain speed is critical to certain applications (Frey, 2000). Assessed: N/A.
• Extendibility: A data format can be modified to allow for new types of data and features in the future (Frey, 2000). Assessed: N/A.
• Modularity: A modular data set definition is designed to allow some of its functionality to be upgraded or enhanced without having to propagate changes through all parts of the data set (Frey, 2000). Assessed: N/A.
• Plugability: Related to modularity, this permits the user of an implementation of a data set reader or writer to replace a module with private code (Frey, 2000). Assessed: N/A.
• Interpretability: Not binary formats (Barnes, 2006). Assessed: RTF (Yes), MS Word (No), XML (Yes).
• Interpretability: The standard should be written in characters that people can read (Lesk, 1995). Assessed: N/A.
• Complexity: Human readability, compression, variety of features (Rog & van Wijk, 2008; Wijk & Rog, 2007). Assessed: N/A.
• Complexity: Simple raster formats are preferred (Puglia et al., 2004). Assessed: N/A.
• Compression Algorithms: The format uses standard algorithms (Puglia et al., 2004). Assessed: N/A.
• Accessibility: To prohibit encryption in the file trailer (Sullivan, 2006). Assessed: PDF/A (Yes).
• Component Reuse: Not defined (Sahu, 2006). Assessed: PDF (No), HTML (Limited), SGML (Excellent), XML (Excellent).
• Repurposing: Not defined (Sahu, 1999). Assessed: PDF (Limited), HTML (Limited), SGML (Excellent), XML (Excellent).
• Packaging Formats: In general, packaging formats should be acceptable as transfer mechanisms for image file formats (Puglia et al., 2004). Assessed: Zip (Yes).
• Significant Properties: The format accommodates high-bit, high-resolution (detail), color accuracy, and multiple compression options (Puglia et al., 2004). Assessed: N/A.
• Processability: The requirement to maintain a processable version of the record to have any reuse value (Brown, 2003). Assessed: conversion of a word-processed document into PDF format (No).
• Searching: Not defined (Sahu, 2006). Assessed: PDF (Limited), HTML (Good), SGML (Excellent), XML (Excellent).
• No Definite Term: To support the automatic validation of document conversions and the evaluation of conversion quality by hierarchically decomposing documents from different sources and representing them in an abstract XML language (Becker et al., 2008a; Becker et al., 2008b). Assessed: N/A; XCL (Yes).
• No Definite Term: To make transferring data easy (Johnson, 1999). Assessed: N/A; XML (Yes).
• No Definite Term: A format that is easy to restore and understand by both humans and machines (Müller et al., 2003). Assessed: N/A; XML (Yes).
• No Definite Term: Inability to be backed out into a usable format (Potter, 2006). Assessed: PDFs (No).

2. METADATA
• Self-Documentation: Self-documenting digital objects that contain basic descriptive, technical, and other administrative metadata (CENDI, 2007; Hodge & Anderson, 2007). Assessed: PDF (Yes), PDF/A (Yes), TIFF_G4 (Yes), XML (Yes).
• Self-Documentation: Metadata and technical description of format embedded (Rog & van Wijk, 2008; Wijk & Rog, 2007). Assessed: PDF/A-1 (Limited), Microsoft Word (Limited).
• Self-Documentation: The ability of a digital format to hold (in a transparent form) metadata beyond that needed for basic rendering of the content (Arms & Fleischhauer, 2006). Assessed: N/A.
• Self-Documenting: To contain its own description (Abrams et al., 2005). Assessed: N/A.
• Documentation: Deep technical documentation is publicly and fully available. It is maintained for older versions of the format (Puglia et al., 2004). Assessed: N/A.
• Metadata Support: File formats making provision for the inclusion of metadata (Brown, 2003). Assessed: TIFF (Yes), Microsoft Word 2000 (Yes).
• Metadata Support: Not defined (Kenney, 2001). Assessed: TIFF 6.0 (Yes), GIF 89a (Yes), JPEG (Yes), Flashpix 1.0.2 (Yes), ImagePac/Photo CD (No), PNG 1.2 (Yes), PDF (Yes).
• Metadata Support: Not defined (Sahu, 2006). Assessed: N/A.
• Metadata: The format allows for self-documentation (Puglia et al., 2004). Assessed: N/A.
• Content-Level Description: Not presentation-level description; structural markup, not formatting (Barnes, 2006). Assessed: PDF (No), DocBook (Yes), TEI (Yes), XHTML (Yes), XML (Yes).
• Content-Level, Not Presentation-Level, Descriptions: Where possible, the labeling of items should reflect their meaning, not their appearance (Lesk, 1995). Assessed: SGML (Yes).
• Self-Describing: Many different types of metadata are required to decipher the contents of a file (Folk & Barkstrom, 2003). Assessed: N/A.
• Self-Describing Files: Embed metadata in PDF files (Sullivan, 2006). Assessed: PDF/A (Adobe Extensible Metadata Platform required).
• Formal (BNF- or XML-Like) Description of Format: To create new readers solely on the basis of formal descriptions of the file content (Folk & Barkstrom, 2003). Assessed: N/A.
• No Definite Term: Its self-describing tags identify what your content is all about (Johnson, 1999). Assessed: N/A; XML (Yes).
• No Definite Term: A format for strong descriptive and administrative metadata and the complete content of the document (Müller et al., 2003). Assessed: N/A; XML (Yes).

3. OPENNESS
• Disclosure: Authoritative specification publicly available (Abrams et al., 2005). Assessed: PDF/A (Yes), Microsoft Word (No).
• Disclosure: The degree to which complete specifications and tools for validating technical integrity exist and are accessible to those creating and sustaining digital content (CENDI, 2007; Hodge & Anderson, 2007; Arms & Fleischhauer, 2006). Assessed: PDF (Yes), PDF/A (Yes), TIFF_G4 (Yes), XML (Yes).
• Disclosure: Authoritative specification is publicly available (Sullivan, 2006). Assessed: PDF/A (Yes).
• Open Availability: No proprietary formats (Barnes, 2006). Assessed: ODF (Yes), GIF (No), PDF (No), RTF (No), Microsoft Word (No).
• Open Availability: Any manufacturer or researcher should have the ability to use the standard, rather than having it under the control of only one company (Lesk, 1995). Assessed: Kodak PhotoCD (No), GIF (No).
• Openness: Standardization, restrictions on the interpretation of the file format, reader with freely available source (Rog & van Wijk, 2008; Wijk & Rog, 2007). Assessed: PDF/A-1 (Yes), MS Word (No).
• Openness: A standard is designed to be implemented by multiple providers and employed by a large number of users (Frey, 2000). Assessed: N/A.
• Openness: Formats that are described by publicly available specifications or open-source source code can, with some effort, be reconstructed later: (1) open publicly available specification, (2) specification in public domain, (3) viewer with freely available source, (4) viewer with GPL'ed source, (5) not encrypted (Clausen, 2004). Assessed: N/A.
• Open-Source Software or Equivalent: To move toward obtaining open-source arrangements for all parts of the file format and associated libraries (Folk & Barkstrom, 2003). Assessed: N/A.
• Open Standard: Formats for which the technical specification has been made available in the public domain (Brown, 2003). Assessed: JPEG (Yes), PDF (Limited), ASCII (Limited).
• Open Standard: Not defined (Sahu, 2006). Assessed: N/A.
• Standard/Proprietary: Not defined (Kenney, 2001). Assessed: TIFF 6.0 (Yes), GIF 89a (Yes), JPEG (Yes), Flashpix 1.0.2 (Yes), ImagePac/Photo CD (No), PNG 1.2 (Yes), PDF (Yes).
• Nonproprietary Formats: The specification is independent of a particular vendor (Public Records Office of Victoria, 2004). Assessed: N/A.
• No Definite Term: To avoid vendor lock-in (Potter, 2006). Assessed: ODF (Yes).

4. INTEROPERABILITY
• Interoperability: Is the format supported by many software applications/OS platforms, or is it linked closely with a specific application (Puglia et al., 2004)? Assessed: N/A.
• Interoperability: The ability to exchange electronic records with other users and IT systems (Brown, 2003). Assessed: N/A.
• Interoperability: Not defined (Sahu, 2006). Assessed: N/A.
• Data Interchange: Not defined (Sahu, 2006). Assessed: PDF (No), HTML (Limited), SGML (Excellent), XML (Excellent).
• Compatibility: Compatibility with prior versions of data set definitions often is needed for access and migration considerations (Frey, 2000). Assessed: N/A.
• Stability: Compatibility between versions (Folk & Barkstrom, 2003). Assessed: N/A.
• Stability: Stable, not subject to constant or major changes over time (Brown, 2003). Assessed: N/A.
• Stability: The format is supported by current applications and backward compatible, and there are frequent updates to the format or the specification (Puglia et al., 2004). Assessed: N/A.
• Stability: Not defined (Sahu, 2006). Assessed: N/A.
• Scalability: The design should be applicable both to small and large data sets and to small and large hardware systems (Frey, 2000). Assessed: N/A.
• Markup Compatibility and Extensibility: To support a much broader range of applications (ECMA, 2008). Assessed: N/A; XML (Yes).
• Suitability for a Variety of Storage Technologies: The format should not be geared toward any particular technology (Folk & Barkstrom, 2003). Assessed: N/A.
• No Definite Term: To allow data to be shared across information systems and remain impervious to many proprietary software revisions (Potter, 2006). Assessed: OpenOffice (Yes).

5. INDEPENDENCE
• Device Independencies: Can be reliably and consistently rendered without regard to the hardware/software platform (Abrams et al., 2005). Assessed: PDF/A (Yes), TIFF (No).
• Device Independencies: Static visual appearance can be reliably and consistently rendered and printed without regard to the hardware or software platform used (Sullivan, 2006). Assessed: PDF/A (Yes), PDF/X (Yes).
• Device Independencies: This is a very important aspect for master files because they will most likely be used on various systems (Frey, 2000). Assessed: N/A.
• Independent Implementations: Independent implementations help ensure that vendors accurately implement the specification (Public Records Office of Victoria, 2004). Assessed: N/A.
• External Dependency: Degree to which the format is dependent on specific hardware, operating system, or software for rendering or use, and the complexity of dealing with those dependencies in future technical environments (Arms & Fleischhauer, 2006). Assessed: N/A.
• External Dependencies: The degree to which a particular format depends on particular hardware, operating system, or software for rendering or use, and the predicted complexity of dealing with those dependencies in future technical environments (CENDI, 2007; Hodge & Anderson, 2007). Assessed: PDF (Limited), PDF/A (No), TIFF_G4 (No), XML (No).
• Portability: A format that makes extensive use of specific hardware or operating system features is likely to be unusable when that hardware or operating system falls into disuse. A format that is defined in an independent way will be much easier to use in the future: (1) independent of hardware; (2) independent of operating system; (3) independent of other software; (4) independent of particular institutions, groups, or events; (5) widespread current use; (6) little built-in functionality; and (7) single version or well-defined versions (Clausen, 2004). Assessed: N/A.
• Monitoring Obsolescence: Information gathered through regular web harvesting can give us some information about what file types are approaching obsolescence, at least for the more frequently used types (Clausen, 2004). Assessed: N/A.
• No Definite Term: A human-readable text format and internationalized character sets are supported (Müller et al., 2003). Assessed: N/A; XML (Yes).
• No Definite Term: Not dependent on specific hardware, not dependent on specific operating systems, not dependent on one specific reader, not dependent on other external resources (Rog & van Wijk, 2008; Wijk & Rog, 2007). Assessed: PDF/A-1 (Limited), Microsoft Word (Little).
• No Definite Term: The format requires a plug-in for viewing if appropriate software is not available, or relies on external programs to function (Puglia et al., 2004). Assessed: N/A.

6. PRESENTATION
• Distributing Page Image: Not defined (Sahu, 2006). Assessed: PDF (Excellent), HTML (Good), SGML (Good), XML (Good).
• Normal Rendering: Not defined (CENDI, 2007; Hodge & Anderson, 2007). Assessed: PDF (Yes), PDF/A (Limited), TIFF_G4 (Yes), XML (Yes).
• Presentation: Preservation of its original look and feel (Brown, 2003). Assessed: N/A.
• Self-Containment: Everything that is necessary to render or print a PDF/A file must be contained within the file (Sullivan, 2006). Assessed: PDF/A (Yes).
• Self-Contained: To contain all resources necessary for rendering (Abrams et al., 2005). Assessed: N/A.
• Beyond Normal Rendering: Not defined (CENDI, 2007; Hodge & Anderson, 2007). Assessed: PDF (Yes), PDF/A (Yes), TIFF_G4 (Yes), XML (Limited).

7. AUTHENTICITY
• Authenticity: The format must preserve the content (data and structure) of the record and any inherent contextual, provenance, referencing and fixity information (Brown, 2003). Assessed: N/A.
• Provenance Traceability: Ability to trace the entire configuration of data production (Folk & Barkstrom, 2003). Assessed: N/A.
• Integrity of Layout: Not defined (CENDI, 2007; Hodge & Anderson, 2007). Assessed: PDF (Yes), PDF/A (Yes), TIFF_G4 (N/A), XML (Yes).
• Integrity of Rendering of Equations: Not defined (CENDI, 2007; Hodge & Anderson, 2007). Assessed: PDF (Yes), PDF/A (Yes), TIFF_G4 (N/A), XML (Limited).
• Integrity of Structure: Not defined (CENDI, 2007; Hodge & Anderson, 2007). Assessed: PDF (Limited), PDF/A (Limited), TIFF_G4 (N/A), XML (Yes).

8. ADOPTION
• Adoption: Degree to which the format is already used by the primary creators, disseminators, or users of information resources (CENDI, 2007; Hodge & Anderson, 2007). Assessed: PDF (Yes), PDF/A (Yes), TIFF_G4 (Yes), XML (Yes).
• Adoption: Worldwide usage, usage in the cultural heritage sector as archival format (Rog & van Wijk, 2008; Wijk & Rog, 2007). Assessed: PDF/A-1 (Yes), Microsoft Word (Limited).
• Adoption: The degree to which the format is already used by the primary creators, disseminators, or users of information resources (Arms & Fleischhauer, 2006). Assessed: N/A.
• Adoption: Widespread use may be the best deterrent against preservation risk (Abrams et al., 2005). Assessed: TIFF (Yes).
• Adoption: The format is widely used by the imaging community in cultural institutions (Puglia et al., 2004). Assessed: N/A.
• Adoption: Flexibility of implementation to promote its wide adoption (Sullivan, 2006). Assessed: PDF/A (Yes).
• Popularity: A format that is widely used (Folk & Barkstrom, 2003). Assessed: N/A.
• Widely Used Formats: It is far more likely that software will continue to be available to render the format (Public Records Office of Victoria, 2004). Assessed: N/A.
• Ubiquity: Popular formats supported by as much software as possible (Brown, 2003). Assessed: N/A.
• Ubiquity: Not defined (Sahu, 2006). Assessed: N/A.
• Continuity: The file format is mature (Puglia et al., 2004). Assessed: N/A.

9. PROTECTION
• Technical Protection Mechanism: Password protection, copy protection, digital signature, printing protection and content extraction protection (Rog & van Wijk, 2008; Wijk & Rog, 2007). Assessed: PDF/A-1 (Limited), Microsoft Word (Limited).
• Technical Protection Mechanism: Implementation of a mechanism such as encryption that prevents the preservation of content by a trusted repository (CENDI, 2007; Hodge & Anderson, 2007). Assessed: PDF (Yes), PDF/A (No), TIFF_G4 (No), XML (No).
• Technical Protection Mechanism: It must be able to replicate the content on new media, migrate and normalize it in the face of changing technology, and disseminate it to users at a resolution consistent with network bandwidth constraints (Arms & Fleischhauer, 2006). Assessed: N/A.
• Technical Protection Mechanism: No encryption, passwords, etc. (Abrams et al., 2005). Assessed: N/A.
• Protection: The format accommodates error detection, correction mechanisms, and encryption options (Puglia et al., 2004). Assessed: N/A.
• Source Verification: Cryptographic encoding of files or digital watermarks without overburdening the data centers or archives (Folk & Barkstrom, 2003). Assessed: N/A.

10. PRESERVATION
• Preservation: The format contains embedded objects (e.g., fonts, raster images) or links to external objects (Puglia et al., 2004). Assessed: N/A.
• Long-Term Institutional Support: To ensure the long-term maintenance and support of a data format by placing responsibility for these operations on institutions (Folk & Barkstrom, 2003). Assessed: N/A.
• Ease of Transformation/Preservation: The format will be supported for fully functional preservation in a repository setting, or the format guarantee can currently only be made at the bitstream (content data) level (Puglia et al., 2004). Assessed: N/A.
• No Definite Term: To create files with either a very high or very low preservation value (Becker et al., 2008a; Becker et al., 2008b). Assessed: PDF (No), TIFF (No).

11. REFERENCE
• Citability: A machine-independent ability to reference or "cite" the individual data element in a stable way (Folk & Barkstrom, 2003). Assessed: N/A.
• Referential Extensibility: Ability to build annotations about new interpretations of the data (Folk & Barkstrom, 2003). Assessed: N/A.
• No Definite Term: An open and established notation (Müller et al., 2003). Assessed: N/A; XML (Yes).
• No Definite Term: Data is easily repurposed via tags or translated to any medium (Johnson, 1999). Assessed: N/A; XML (Yes).
• No Definite Term: Creating, using, and reusing tags is easy, making it highly extensible (Johnson, 1999). Assessed: N/A; XML (Yes).

12. OTHERS
• Transparency: Degree to which the digital representation is open to direct analysis with basic tools, such as human readability using a text-only editor (CENDI, 2007; Hodge & Anderson, 2007). Assessed: PDF (Limited), PDF/A (Limited), TIFF_G4 (Limited), XML (Yes).
• Transparency: In natural reading order (Sullivan, 2006). Assessed: PDF/A (Yes), Microsoft Notepad (Yes).
• Transparency: The degree to which the format is already used by the primary creators, disseminators, or users of information resources (Arms & Fleischhauer, 2006). Assessed: N/A.
• Transparency: Amenable to direct analysis with basic tools (Abrams et al., 2005). Assessed: N/A.
• Ample Comment Space: To allow rich metadata (Barnes, 2006). Assessed: N/A.
• Items should be labeled, as far as possible, with enough information to serve for searching or cataloging (Lesk, 1995). Assessed: TIFF (Yes).
• A digital format may inhibit the ability of archival institutions to sustain content in that format (Arms & Fleischhauer, 2006). Assessed: N/A.

Table Bibliography

Abrams, Stephen et al. 2005. "PDF-A: The Development of a Digital Preservation Standard." Paper presented at the 69th Annual Meeting for the Society of American Archivists, New Orleans, Louisiana, August 14–21. http://www.aiim.org/documents/standards/PDF-A.ppt (accessed November 21, 2011).

Arms, Caroline R., and Carl Fleischhauer. 2006. "Sustainability of Digital Formats: Planning for Library of Congress Collections." http://www.digitalpreservation.gov/formats/sustain/sustain.shtml (accessed November 21, 2011).

Barnes, Ian. 2006. "Preservation of Word Processing Documents." http://apsr.anu.edu.au/publications/word_processing_preservation.pdf (accessed November 21, 2011).

Becker, Christoph et al. 2008. "A Generic XML Language for Characterising Objects to Support Digital Preservation." In Proceedings of the 2008 ACM Symposium on Applied Computing, Fortaleza, Ceara, Brazil, March 16–20.

Becker, Christoph et al. 2008. "Systematic Characterization of Objects in Digital Preservation: The eXtensible Characterization Language." Journal of Universal Computer Science 14, no. 18: 2936–2952.

Brown, Adrian. 2003. "The National Archives. Digital Preservation Guidance Note: Selecting File Formats for Long-Term Preservation." http://www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf (accessed November 21, 2011).

CENDI Digital Preservation Task Group. 2007. "Formats for Digital Preservation: A Review of Alternatives and Issues." http://www.cendi.gov/publications/CENDI_PresFormats_WhitePaper_03092007.pdf (accessed November 21, 2011).

Clausen, Lars R. 2004. "Handling File Formats." http://netarchive.dk/publikationer/FileFormats-2004.pdf (accessed November 21, 2011).

ECMA. 2008. "Office Open XML File Formats—Part 1." 2nd ed. http://www.ecma-international.org/publications/standards/Ecma-376.htm (accessed November 21, 2011).
Folk, Mike, and Bruce Barkstrom. 2003. "Attributes of File Formats for Long-Term Preservation of Scientific and Engineering Data in Digital Libraries." Paper presented at the Joint Conference on Digital Libraries, Houston, TX, May 27–31. http://www.hdfgroup.org/projects/nara/Sci_Formats_and_Archiving.pdf (accessed November 21, 2011).

Frey, Franziska. 2000. "5. File Formats for Digital Masters." In Guides to Quality in Visual Resource Imaging, Research Libraries Group and Digital Library Federation. http://imagendigital.esteticas.unam.mx/PDF/Guides.pdf (accessed November 21, 2011).

Hodge, Gail, and Nikkia Anderson. 2007. "Formats for Digital Preservation: A Review of Alternatives and Issues." Information Services & Use 27: 45–63.

Johnson, Amy Helen. 1999. "XML Xtends its Reach: XML Finds Favor in Many IT Shops, but It's Still Not Right for Everyone." Computerworld 33, no. 42: 76–81.

Lesk, Michael E. 1995. "Preserving Digital Objects: Recurrent Needs and Challenges." In Proceedings of the 2nd NPO Conference on Multimedia Preservation, Brisbane, Australia. http://www.lesk.com/mlesk/auspres/aus.html (accessed November 21, 2011).

Müller, Eva et al. 2003. "Using XML for Long-Term Preservation: Experiences from the DiVA Project." In Proceedings of the Sixth International Symposium on Electronic Theses and Dissertations, Berlin, May: 109–116. https://edoc.hu-berlin.de/conferences/etd2003/hansson-peter/PDF/index.pdf (accessed December 8, 2012).

Potter, John Michael. 2006. "Formats Conversion Technologies Set to Benefit Institutional Repositories." http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.7881&rep=rep1&type=pdf (accessed November 21, 2011).

Public Records Office of Victoria (Australia). 2006. "Advice on VERS Long-Term Preservation Formats PROS 99/007 (Version 2) Specification 4." Department for Victorian Communities. http://prov.vic.gov.au/wp-content/uploads/2012/01/VERS_Advice13.pdf (accessed November 21, 2011).

Puglia, Steven, Jeffrey Reed, and Erin Rhodes. 2004. "Technical Guidelines for Digitizing Archival Materials for Electronic Access: Creation of Production Master Files—Raster Images." US National Archives and Records Administration. http://www.archives.gov/preservation/technical/guidelines.pdf (accessed November 21, 2011).

Rog, Judith, and Caroline van Wijk. 2008. "Evaluating File Formats for Long-term Preservation." National Library of the Netherlands. http://www.kb.nl/hrd/dd/dd_links_en_publicaties/publicaties/KB_file_format_evaluation_method_27022008.pdf (accessed November 21, 2011).

Sahu, D. K. 2004. "Long Term Preservation: Which File Format to Use." Presentation at Workshops on Open Access & Institutional Repository, Chennai, India, May 2–8. http://openmed.nic.in/1363/01/Long_term_preservation.pdf (accessed November 21, 2011).

Sullivan, Susan J. 2006. "An Archival/Records Management Perspective on PDF/A." Records Management Journal 16, no. 1: 51–56.
van Wijk, Caroline, and Judith Rog. 2007. "Evaluating File Formats for Long-Term Preservation." Presentation at International Conference on Digital Preservation, Beijing, China, Oct 11–12. http://ipres.las.ac.cn/pdf/Caroline-iPRES2007-11-12oct_CW.pdf (accessed November 21, 2011).

2161 ---- Editorial Board Thoughts: Doesn't Work
Mark Cyzyk

The proof of the pudding's in the eating.
Miguel de Cervantes Saavedra. The ingenious hidalgo Don Quixote de la Mancha. Part I, Chapter XXXVII, John Rutherford, trans.

About fifteen years ago I had two students from Germany working for me, Jens and Andreas. Those guys were great. They were smart and funny and interesting and always did their best. I would send them out to fix things around the library, and they would dutifully report back with success or failure. I told them that, particularly if there was a problem with a staff workstation, "If it breaks in the morning, it must be fixed by lunchtime; if it breaks in the afternoon, it must be fixed by 5:00." They understood that if a staff workstation was down, then that probably also meant a staff member was just sitting there, waiting for it to be fixed. If we had to we could slap a sign on a broken public workstation and get back to it later—there were plenty of other working public stations after all—but staff workstations must be working at all times. Insofar as we had an aged fleet of PCs whose CMOS batteries were rapidly giving out, I kept Jens and Andreas running around the building quite a bit.

On occasion, though, they would report back with the dreaded, "Hey boss, doesn't work." This was the one thing that would raise my ire. "Of course it doesn't work, that's why I sent you down there!" I would think. The phrase "doesn't work" became for me a Pavlovian signal that I was about to drop everything and go take a look myself.

It now occurs to me, though, that this notion of "work" is precisely the point of technology, and that sometimes this gets lost for those of us employed fulltime as technologists in libraries. Let me explain: In my opinion and for the most part, the proper role of the technologist in a library is that of a consultant on loan to the departments to work on projects there, embedded.1 Two of the best bosses I ever had said essentially the same thing to me in our introductory first-day-on-the-job chit-chat: "You report to me, but you work for them." Such is the proper attitude in any service-oriented profession.
Does this not frequently get inverted, subverted, lost? What happens is that technology starts to take on an importance undeserved. It becomes self-referential and insular; a technology-for-technology's-sake attitude arises.

Mark Cyzyk (mcyzyk@jhu.edu) is Scholarly Communication Architect in The Sheridan Libraries, Johns Hopkins University.

But technology-for-technology's-sake is just wrong. Technology is merely a means to an end, not an end in itself. The word itself derives from the ancient Greek technê, most frequently translated into English as "craft" and frequently distinguished in the Greek philosophical literature from epistêmê or (certain) knowledge.2 So it is here that the crucial distinction in the Western world between practical and theoretical activities is made, and technology is clearly a practical, not theoretical activity. As such, it has by its very nature practical outcomes in the world: technology works in the world. Technology is instrumental in achieving certain practical outcomes; its value is as a tool, instrumentally valuable, not inherently valuable. It is not for its own sake that we implement technology; we implement technology to get some sort of work accomplished in the world.

Our programming languages, application servers, Web application frameworks, AJAX libraries, integrated development environments, source-code repositories, build tools, testing harnesses, switches, routers, single-signon utilities, proxy servers, link resolvers, repositories, bibliographic management utilities, help-desk ticketing applications, and elaborate project-management protocols are all for naught if the final product of our labor, at the end of the day, doesn't work. Our product is not only literally useless, it is worse than useless because the library in which we labor has devoted precious resources to it only to result in a service or product that does not properly function, and those are precious resources that could have been spent elsewhere.

Hey there fellow technologists, why am I being so dismal? I would prefer the term "grave" to "dismal." Significant portions of the library budget are put toward technology each year, and as those whose duty it is to carry our local technology strategies into the future, we need to always be mindful of the fact that each and every dollar spent on technology is a dollar not available for building our collections—surely the direct center of the mission of anyone who calls himself a librarian, A.K.A., a cultural conservationist. (Shouldn't we be wearing badges that read, "To Collect and Preserve"?) Making it work is Job One for the technologist in the library.

…

A colleague and friend of mine once told me, a decade ago, that our fellow colleague made a snippy comment about an important and major Web application I had written, "Just because it works doesn't mean it's right." Now, admittedly, I was a very sloppy Code Formatter, and yet I certainly would never say that the applications I wrote were steaming plates of spaghetti. On the contrary, I think the code I wrote consisted of good, solid procedural programming. What my disgruntled colleague meant, I think, was that I failed to follow a framework, and by "framework" he naturally meant the same framework to which he'd recently hitched his own coding wagon.
My response to his snippiness was, "Ah, pretty-it-up all you want, organize it any-which-way, but functional code--code that works--is actually the Number One criterion for being Good Code." Just ask your clients.

That app I wrote has been in production, happily working away as a key piece of the enterprise network infrastructure at a prominent, multi-campus, East Coast university since 2002.3

REFERENCES

1. And here I heartily agree with my fellow Editorial Board member, Michael Witt, when he notes that "[p]art of this process is attempting to feel our users' pain…", and I even extend this to the point of us technologists actively working with our users toward a common goal, literally sitting with them, among them, not merely being present to offer occasional support, not merely feeling their pain but being so invested in our common project that their pain is our pain. [Did I really just suggest we take on more pain?! Yep.] See: Michael Witt. "Eating Our Own Dogfood." Information Technology and Libraries 30, no. 3 (September 2011): 90. http://www.ala.org/lita/ital/sites/ala.org.lita.ital/files/content/30/3/pdf/witt.pdf

2. I'm no classics scholar, but this is my recollection from taking a graduate seminar many years ago on this very topic. So while I'm not pulling this entirely out of thin air, I am pulling it from the musty mists of middle-aged memory – that, and a quick scan of Professor Richard Parry's fine article on this topic in The Stanford Encyclopedia of Philosophy, particularly the section on Aristotle's views. Regarding my comments below on technology being instrumentally valuable, I cite Parry's words: "Presumably, then, the craftsman does not choose his activity for itself but for the end; thus the value of the activity is in what is made". See: Richard Parry. "Episteme and Techne," The Stanford Encyclopedia of Philosophy, Fall 2008 Edition, Edward N. Zalta, editor. http://plato.stanford.edu/archives/fall2008/entries/episteme-techne/

3. Mark Cyzyk, "The Johns Hopkins Address Registration System (JHARS): Anatomy of an Application," Educause Quarterly 26, no. 3 (2003). https://jscholarship.library.jhu.edu/handle/1774.2/32800

2163 ---- Reference Information Extraction and Processing Using Conditional Random Fields
Tudor Groza, Gunnar AAstrand Grimnes, and Siegfried Handschuh

ABSTRACT

Fostering both the creation and the linking of data with the aim of supporting the growth of the Linked Data Web requires us to improve the acquisition and extraction mechanisms of the underlying semantic metadata. This is particularly important for the scientific publishing domain, where currently most of the datasets are being created in an author-driven, manual manner. In addition, such datasets capture only fragments of the complete metadata, usually omitting important elements such as the references, although they represent valuable information. In this paper we present an approach that aims at dealing with this aspect of extraction and processing of reference information. The experimental evaluation shows that, currently, our solution handles very well diverse types of reference format, thus making it usable for, or adaptable to, any area of scientific publishing.
1. INTRODUCTION

The progressive adoption of Semantic Web1 techniques resulted in the creation of a series of datasets connected by the Linked Data2 initiative, and via the Linked Data principles, into a universal Web of Linked Data. In order to foster the continuous growth of this Linked Data Web, we need to improve the acquisition and extraction mechanisms of the underlying semantic metadata. Unfortunately, the scientific publishing domain, a domain with an enormous potential for generating large amounts of Linked Data, still promotes trivial mechanisms for producing semantic metadata.3 As an illustration, the metadata acquisition process of the Semantic Web Dog Food Server,4 the main Linked Data publication repository available on the Web, consists of two steps:

• the authors manually fill in submission forms corresponding to different publishing venues (e.g., conferences or workshops), with the resulting (usually XML) information being transformed via scripts into semantic metadata, and
• the entity URIs (i.e., authors and publications) present in this semantic metadata are then manually mapped to existing Web URIs for linking/consolidation purposes.

Tudor Groza (tudor.groza@uq.edu.au) is Postdoctoral Research Fellow, School of Information Technology and Electrical Engineering, University of Queensland. Gunnar AAstrand Grimnes (grimnes@dfki.uni-kl.de) is Researcher, German Research Center for Artificial Intelligence (DFKI) GmbH, Kaiserslautern, Germany. Siegfried Handschuh (msiegfried.handschuh@deri.org) is Senior Lecturer/Associate Professor, National University of Ireland, Galway, Ireland.

Moreover, independent of the creation/acquisition process, one particular component of the publication metadata, i.e., the reference information, is almost constantly neglected. The reason is mainly the amount of work required to manually create it, or the complexity of the task, in the case of automatic extraction. As a result, currently, there are no datasets in the Linked Data Web exposing reference information, while the number of digital libraries providing search and link functionality over references is rather limited. This is quite a problematic gap if we consider the amount of information provided by references and their foundational support for other application techniques that bring value to researchers and librarians, such as citation analysis and citation metrics, tracking temporal author-topic evolution,5 or co-authorship graph analysis.6,7

In this paper we focus on the first of the above-mentioned steps, i.e., providing the underlying mechanisms for automatic extraction of reference metadata. We devise a solution that enables extraction and chunking of references using Conditional Random Fields (CRF).8 The resulting metadata can then be easily transformed into semantic metadata adhering to particular schemas via scripts, the added value being the exclusion of the manual author-driven creation step from the process. From the domain perspective, we focus on computer science and health sciences only because these domains have representative datasets that can be used for evaluation and hence enable comparison against similar approaches.
However, we believe that our model can be applied also in domains such as digital humanities or social sciences, and we intend, in the near future, to build a corresponding corpus that would allow us to test and adapt (if necessary) our solution to these domains.

Figure 1. Examples of Chunked and Labeled Reference Strings

Reference chunking represents the process of label sequencing a reference string, i.e., tagging the parts of the reference containing the authors, the title, the publication venue, etc. The main issue associated with this task is the lack of uniformity in the reference representation. Figure 1 presents three examples of chunked and labeled reference strings. One cannot infer generic patterns for all types of references. For example, the year (or date) of some of the references of this paper are similar to example 2 from the figure, i.e., they are located at the very end of the reference string. Unfortunately, this does not hold for some journal reference formats, such as the one presented in example 1. And at the same time, the actual date might not comprise only the year, but also the month (and even day).

In addition to the placement of the particular types of tokens within the reference string, one of the major concerns when labeling these types of tokens is disambiguation. Generally, there are three categories of ambiguous elements:
Example Linear CRF—Showing Dependencies Between Features X and Classes Y INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 9 Conditional Random Fields (CRF) is a probabilistic graphical model for classification. CRF, in general, can represent many different types of graphical models, however in the scope of this paper, we use the so-called linear-chain CRFs. A simple example of a linear dependency graph is shown in Figure 2, here only the features X of the previous item influences the class of the current item Y. The conditional probability is defined as: ( | ) ( ) (∑ ( ) ) where ( ) ∑ ( ) and ( ) ∑ (∑ ( ) ) . The model is usually trained by maximizing the log-likelihood of the training data by gradient methods. A dynamic algorithm is used to compute all the required probabilities p⍬(yi, yi+1) for calculating the gradient of the likelihood. This means that in contrast to traditional classification algorithms in Machine Learning (e.g., Support Vector Machines 9 ), it not only considers the attributes of the current element when determining the class, but also attributes of preceding and succeeding items. This makes it ideal for tagging sequences, such as chunking of parts of speech or parts of references, which is what we require for our chunking task. 2.2 Related Work In recent years, extensive research has been performed in the area of automatic metadata extraction from scientific publications. Most of the approaches focus on one of the two main metadata components, i.e., on the heading/bibliographic metadata or on the reference metadata, but there are also cases when the entire set is targeted. As this paper focuses only on the second component, within this section we present and discuss those applications that deal strictly with reference chunking. The ParsCit framework is the closest technique mapping to our goals and methodology. 10 ParsCit is an open-source reference-parsing package. While its first version used a Maximum Entropy model to perform reference chunking, 11 currently, inspired by the work of Peng et al. , 12 it uses a trained CRF model for label sequencing. The model was obtained based on a set of twenty-three token-oriented features tailored towards correcting the errors that Peng's CRF model produced. Our CRF chunker builds on the work of ParsCit. However, as we aimed at improving the chunking performance, we altered some of the existing features and introduced additional ones. Moreover, we have compiled significantly larger gazetteers required for detecting different aspects, such as names, places, organizations, journals, or publishers. One of the first attempts to extract and index reference information led to the currently well- known system, CiteSeer. 13 Around the same period, Seymore et al. developed one of the first reference chunking approaches that used Machine Learning techniques. 14 The authors trained a Hidden Markov Model (HMM) to build a reference sequence labeler using internal states for different parts of the fields. As it represented pioneering work, it also resulted in the first gold standard set, the CORA dataset. At a later stage, the same group applied CRF for the first time to perform reference chunking, which later inspired ParsCit. 15 REFERENCE INFORMATION EXTRACTION AND PROCESSING |GROZA, GRIMNES, AND HANDSCHUH 10 In the same learning-driven category is the work of Han et al. 
16 The authors proposed an effective word clustering approach with the goal of reducing feature dimensionality when compared to HMM, while at the same time improving the overall chunking performance. The resultant domain, rule-based word clustering method for cluster feature representation used clusters formed from various domain databases and word orthographic properties. Consequently, they achieved an 8.5 percent improvement on the overall accuracy of reference fields classification combined with a significant dimensionality reduction. FLUX-CIM 17 is the only unsupervised 18 approach that targets reference chunking. The system uses automatically constructed knowledge bases from an existing set of sample references for recognizing the component fields of a reference. The chunking process features two steps:  a probability estimation of a given term within a reference which is a value for a given reference field based on the information encoded in their knowledge bases, and  the use of generic structural properties of references. Similarly to Seymore et al., 19 the authors have also created two datasets (specifically for the computer science and health science areas) to be used for comparing the achieved accuracies. A completely different, and novel, direction was developed by Poon and Domingos. 20 Unlike all the other approaches, they propose a solution where the segmentation (chunking) of the reference fields is performed together with the entity resolution in a single integrated inference process. They, thus, help in disambiguating the boundaries of less-clear chunked fields, using the already well-segmented ones. Although the results achieved are similar to, and even better than some of, the above-mentioned approaches, this is suboptimal from the computational perspective: the chunking/resolution time reported by the authors measured around thirty minutes. In addition to the previously described works, which were specifically tailored for bibliographic metadata extraction, there are a series of other approaches that could be used for the same purpose. For example, Cesario et al. propose an innovative recursive boosting strategy, with progressive classification, to reconcile textual elements to an existing attribute schema. 21 In the case of bibliographic metadata segmentation, the metadata fields would correspond to the textual elements, while an ontology describing them (e.g., DublinCore 22 or SWRC 23 ) would have the schema role. The authors even describe an evaluation of the method using the DBLP citation dataset, however, without giving precise details on the fields considered for segmentation. Some other approaches include, in general, any sequence labeling techniques, e.g., SLF, 24 named entity recognition techniques, 25 or even Field Association (FA) terms extraction, 26 the latter working on bibliographic metadata fields in a quasi-similar manner as the recursive boosting strategy. In conclusion, it is worth mentioning that retrieving citation contexts is an interesting research area especially in the context of digital libraries. Our current work does not feature this aspect, but we regard it as one of the key next steps to be tackled. Consequently, we mention the research performed by Schwartz et al. 27 Teufel et al., 28 or Wu et al. 29 that deal with using citation contexts for discerning a citation's function and analyzing how this influences or is influenced by the work it points to. INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 11 3. 
3. METHOD

This section presents the CRF chunker model. We start by defining the preprocessing steps that deal with the extraction of the references block, dividing the block into actual reference entries and cleaning the reference strings, and then detail the CRF reference chunker features.

3.1 Prerequisites

Most of the features used by the CRF chunker require some form of vocabulary entries. Therefore, we have manually compiled a comprehensive list of gazetteers (only for English, except for the names), explained as follows:

• FirstName—25,155 entries gazetteer of the most common first names (independent of gender);
• LastName—48,378 entries list of the most common surnames;
• Month—month names gazetteer and associated abbreviations;
• VenueType—a structured gazetteer with five categories: Conference, Workshop, Journal, TechReport, and Website. Each category has attached its own gazetteer, containing specific keywords and not actual titles. For example, the Conference gazetteer features ten unigrams signaling conferences, such as Conference, Conf, or Symposium;
• Location—places, cities, and countries gazetteer comprising 17,336 entries;
• Organization—150 entries gazetteer listing organization prefixes and suffixes (e.g., e.V. or KGaA);
• Proceedings—simple list of all possible appearances of the Proceedings marker;
• Publisher—564 entries gazetteer comprising publisher unigrams (produced from around 150 publisher names);
• JTitle—12,101 entries list of journal title unigrams (produced from around 1600 journal titles);
• Connection—a 42 entries stop-word gazetteer (e.g., to, and, as).

3.2 Preprocessing

In the preprocessing stage we deal with three aspects:

• cleaning the provided input,
• extracting the reference block, and
• the division of the reference block into reference entries.

The first step aims to clean the raw textual input received by the chunker of unwanted spacing characters while at the same time ensuring proper spacing where necessary. Since the source of the textual input is unknown to the chunker, we make no assumptions with regard to its structure or content.30 Thus, in order to avoid inherent errors that might appear as a result of extracting the raw text from the original document, we perform the following cleaning steps:

• we compress the text by eliminating unnecessary carriage returns, such that lines containing fewer than 15 characters are merged with previous ones,31
• we introduce spaces after some punctuation characters, such as ",", ".", or "-", and finally,
• we split the camel-cased strings, such as JohnDoe.

The result will be a compact and clean version of the input. Also, if the raw input is already compact and clean, this preprocessing step will not affect it.

The extraction of the reference block is done using regular expressions. Generally, we search in the compacted and cleaned input for specific markers, like References or Bibliography, located mainly at the beginning of a line. If these are not directly found, we try different variations, such as looking for the markers at the end of a line, or looking for split markers onto two lines (e.g., Ref – erences, or Refer – ences). This latter case is a typical consequence of the above-described compacting step if the initial input was erroneously extracted. The text following the markers is considered for division, although it may contain unwanted parts such as appendices or tables.
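A rough sketch of the marker-based extraction just described is given below. It is an approximation under stated assumptions rather than the authors' actual implementation: the regular expressions, the "Appendix" cut-off heuristic, and the function name are illustrative choices only.

```python
# Rough sketch of locating a reference block by its section marker. Illustrative only;
# the markers and fallbacks here are a simplified subset of what the paper describes.
import re

MARKER = re.compile(r"^(References|Bibliography)\b", re.IGNORECASE | re.MULTILINE)
# Fallback for markers split across a (compacted) line, e.g. "Ref - erences" or "Refer - ences".
SPLIT_MARKER = re.compile(r"^(Ref\s*[-–]\s*erences|Refer\s*[-–]\s*ences)\b",
                          re.IGNORECASE | re.MULTILINE)

def extract_reference_block(text: str) -> str:
    """Return the text following the References/Bibliography marker, or '' if none is found."""
    match = MARKER.search(text) or SPLIT_MARKER.search(text)
    if not match:
        return ""
    block = text[match.end():]
    # Crude guard against trailing appendices/tables (hypothetical heuristic).
    cut = re.search(r"^\s*Appendix\b", block, re.IGNORECASE | re.MULTILINE)
    return block[:cut.start()] if cut else block

sample = "body text...\nReferences\n[1] J. Doe. A title. 2010.\n[2] A. Roe. Another. 2011.\n"
print(extract_reference_block(sample))
```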
The division into individual reference entries is performed on a case basis. After splitting the reference block based on new lines, we look for prefix patterns at the beginning of each line. As an example, we analyze which lines start with "[", "(", or a number followed by "." or space, and we record the positions of these lines in the list of all lines. To ensure that we don't consider any false positives when merging the adjacent lines into a reference entry, we compute a global average of the differences between positions. Assuming that a reference does not span more than four lines, if this average is between one and four, a reference entry is created. The same average is also used to extract the last reference in the list, thus detaching it from eventual appendices or tables.

3.3 The Reference Chunking Model

We have built the CRF learning model based on a series of features used in principle also by the other CRF reference chunking approaches such as ParsCit32 or Peng and McCallum.33 A set of feature values is used to characterize each token present in the reference string, where the reference's token list is obtained by dividing the reference string into space-separated pieces. The complete list of features is detailed as follows (a simplified feature-extraction sketch appears after the list). We use example 1 from figure 1 to exemplify the feature values.

• Token—the original reference token: Bronzwaer,
• Clean token—the original token, stripped of any punctuation and lower cased: bronzwaer
• Token ending—a flag signaling the type of ending (possible values: lower cap – c / upper cap – C / digit – 0 / punctuation character): ,
• Token decomposition–start—five individual values corresponding to the token's first five characters, taken gradually: B, Br, Bro, Bron, Bronz
• Token decomposition–end—five individual values corresponding to the token's last five characters, taken gradually: r, er, aer, waer, zwaer
• POS tag—the token's part of speech tag (possible values: proper noun phrase – NNP, noun phrase – NP, adjective – JJ, cardinal number – CD, etc.): NNP
• Orthographic case—a flag signaling the token's orthographic case (possible values: initialCap, singleCap, lowercase, mixedCaps, allCaps): singleCap
• Punctuation type—a flag signaling the presence and type of a trailing punctuation character (possible values: cont, stop, other): cont
• Number type—a flag signaling the presence and type of a number in the token (possible values: year, ordinal, 1dig, 2dig, 3dig, 4dig, 4dig+, noNumber): noNumber
• Dictionary entries—a set of ten flags signaling the presence of the token in the set of individual gazetteers listed in section 3.1. For our example the dictionary feature set would be: no LastName no no no no no no no no
• Date check—a flag checking whether the token may contain a date in the form of a period of days, e.g., 12-14 (possible values: possDate, no): no
• Pages check—a flag checking whether the token may contain pages, e.g., 234–238 (possible values: possPages, no): no
• Token placement—the token placement in the reference string, based on its division into nine equal consecutive buckets. This feature indicates the bucket number: 0
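A minimal sketch of how such per-token feature values might be assembled is shown below. It is an approximation for illustration only: it covers a handful of the features listed above, the helper names are invented, the value labels do not exactly match the paper's, and the gazetteer lookups are reduced to toy Python sets rather than the full gazetteers described in section 3.1.

```python
# Illustrative sketch of a per-token feature extractor (subset of the features in section 3.3).
# The gazetteers below are toy placeholders, not the 25,155- and 48,378-entry lists in the paper.
LAST_NAMES = {"bronzwaer", "doe", "smith"}
MONTHS = {"january", "jan", "feb", "march"}

def orthographic_case(token: str) -> str:
    letters = [c for c in token if c.isalpha()]
    if not letters:
        return "other"
    if all(c.isupper() for c in letters):
        return "allCaps" if len(letters) > 1 else "singleCap"
    if letters[0].isupper() and all(c.islower() for c in letters[1:]):
        return "initialCap"
    if any(c.isupper() for c in letters[1:]):
        return "mixedCaps"
    return "lowercase"

def token_features(token: str, index: int, total: int) -> dict:
    clean = token.strip(".,;:()[]").lower()
    last = token[-1] if token else ""
    return {
        "token": token,
        "clean": clean,
        "prefixes": [clean[:i] for i in range(1, 6) if clean[:i]],   # decomposition-start
        "suffixes": [clean[-i:] for i in range(1, 6) if clean[-i:]], # decomposition-end
        "case": orthographic_case(token),
        "ends_with": "punct" if last and not last.isalnum() else ("digit" if last.isdigit() else "alpha"),
        "in_lastname_gazetteer": clean in LAST_NAMES,
        "in_month_gazetteer": clean in MONTHS,
        "placement_bucket": (index * 9) // max(total, 1),
    }

print(token_features("Bronzwaer,", 0, 24))
```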
For training purposes we compiled and manually tagged a set of 830 randomly chosen references. These were extracted from random publications from diverse conferences and journals from the computer science field (collected from IEEE Explorer, Springer Link, or the ACM Portal), manually cleaned, tagged, and categorized according to their type of publication venue.34 To achieve an increased versatility, instead of performing cross-validation,35 which would result in a dataset-tailored model with limited or no versatility, we opted for sampling the test data. Hence, we included in the training corpus some samples from the testing datasets as follows: 10 percent of the CORA dataset (i.e., 20 entries),36 10 percent of the FLUX-CIM CS dataset (i.e., 30 entries),37 and 1 percent of the FLUX-CIM HS dataset (i.e., 20 entries). Consequently, the final training corpus consisted of a total of 900 reference strings. To clarify, this is, to some extent, similar to the dataset-specific cross-validation, but instead of considering, for example, a 60–40 ratio for training/testing, we used only 10 percent for training, while the testing (described in section 4) was performed as a direct application of the chunker on the entire dataset. As already mentioned, our focus on computer science and health sciences is strictly due to evaluation purposes. Our proposed model is domain-agnostic, and hence, the steps described here can be easily performed on datasets emerging from other domains, if at all necessary. In reality, the chunker's performance on references from a domain not covered above can be easily boosted simply by including a sample of references in the training set and then retraining the chunker.

The list of labels used for training and then testing consists of Author, Title, Journal, Conference, Workshop, Website, Technicalrep, Date, Publisher, Location, Volnum, Pages, Etal, Note, Editors, Organization. As we will see in the evaluation, not all labels were actually used for testing (e.g., Note or Editors), some of them being present in the model for the sake of disambiguation. Also, as opposed to the other approaches, we made a clear distinction between Workshop and Conference, which adds an extra degree to the complexity of the disambiguation.

The CRF model was trained using the MALLET (A Machine Learning for Language Toolkit) implementation.38 The output of the chunker is post-processed to expose a series of fine-grained details. As shown in figure 1 in all the examples, the chunking provides a blocked partition of the reference string, but we require for the Author field an even deeper partition. Consequently, following a rule-based approach, we extract the individual author names from the Author block, making use of the punctuation marks, the orthographic case, and the alternation between initials and actual names. When no initials are present, subject to the existing punctuation marks, we consider as a rule-of-thumb that each name generally comprises one first name and one surname (in this order, i.e., John Doe). The result of the post-processing is used in the linking process.
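As an illustration of the rule-based author splitting described above, a rough sketch follows. It is not the authors' implementation: the heuristics are simplified down to the two cues mentioned in the text (punctuation and the initials/surname alternation), and the function name is invented.

```python
# Rough sketch of splitting a chunked Author block into individual names. Illustrative only.
import re

def split_authors(author_block: str) -> list[str]:
    # Normalize separators such as " and " or "&" into commas first.
    text = re.sub(r"\s+(?:and|&)\s+", ", ", author_block)
    parts = [p.strip() for p in text.split(",") if p.strip()]
    names, current = [], []
    for part in parts:
        current.append(part)
        # Heuristic: trailing initials ("Groza, T.") close one author; so does a part
        # that already carries both given name and surname ("Tudor Groza").
        if re.fullmatch(r"(?:[A-Z]\.\s*)+", part) or len(part.split()) >= 2:
            names.append(", ".join(current) if len(current) > 1 else current[0])
            current = []
    if current:
        names.append(" ".join(current))
    return names

print(split_authors("Groza, T., Grimnes, G. A., and Handschuh, S."))
# -> ['Groza, T.', 'Grimnes, G. A.', 'Handschuh, S.']
```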
Field          ParsCit               Peng    Han et al.            Our approach
               P     R     F1        F1      P     R     F1        P      R      F1
Author         98.7  99.3  98.99     99.4    92.6  99.1  97.6      99.08  99.6   99.30
Title          96.0  98.4  97.18     98.3    92.2  93.0  92.6      95.64  95.64  95.64
Date           100   98.4  99.19     98.9    98.5  95.9  97.2      99.33  98.67  98.99
Pages          97.7  98.4  98.04     98.6    95.6  96.9  96.2      99.28  99.22  99.24
Location       95.6  90.0  92.71     87.2    77.7  71.5  74.5      93.45  92.59  93.01
Organization   90.9  87.9  89.37     94.0    76.5  77.3  76.9      100    87.87  93.54
Journal        90.8  91.2  90.99     91.3    77.1  78.7  77.9      94.02  97.42  95.68
Booktitle      92.7  94.2  93.44     93.7    88.7  88.9  88.88     97.77  98.44  98.10
Publisher      95.2  88.7  91.83     76.1    56.0  64.1  59.9      94.84  95.83  95.33
Tech. rep.     94.0  79.6  86.2      86.7    56.2  64.1  59.9      100    90.90  95.23
Website        -     -     -         -       -     -     -         100    100    100

Table 1. Evaluation Results on the CORA Dataset

An additional observation we need to make is related to the reference fields taken into account. Most of the fields we have focused on coincide with the fields considered by all the existing relevant approaches. Nevertheless, there are also some discrepancies, listed as follows:

• the fields Volume, Number, Editors, and Note were used in the chunking process but are not considered for evaluation;
• unlike all the other approaches, we make the distinction between Conference and Workshop as publication venues. However, for alignment purposes (i.e., to be able to compare our results with the other approaches), these are merged into the Booktitle field in the evaluation results.

The actual tests were performed on four different datasets, three of them also used for evaluating the other approaches, and a fourth one compiled by us. In the case of the three existing datasets, we did not make use of the preprocessing step during the experimental evaluation, as they were already clean. As evaluation metric, we used the F1 score,39 i.e., the harmonic mean of precision (P) and recall (R), computed with the following formula: F1 = (2 × P × R) / (P + R). In the following, we iterate over each dataset, providing a short description and the experimental results. It is worth mentioning that our CRF reference chunker was trained only once, as described earlier, and not specifically for each dataset.

4.1 Dataset: CORA

The CORA dataset is the first gold standard created for automatic reference chunking.40 It comprises two hundred reference strings and focuses on the computer science area. Each entry is segmented into thirteen different fields: Author, Editor, Title, Booktitle, Journal, Volume, Publisher, Date, Pages, Location, Tech, Institution, and Note. Table 1 shows the comparative evaluation results on the CORA dataset for ParsCit, Peng et al.,41 Han et al.,42 and our approach. We observe that our chunker outperforms the other chunkers on most of the fields, with some of them presenting a significant increase in performance (looking at the F1 score): Journal from 91.3 percent to 95.68 percent, Booktitle from 93.44 percent to 98.10 percent, Publisher from 91.83 percent to 95.33 percent, and especially Tech. rep. from 86.7 percent to 95.23 percent. In the case of the fields where our chunker was outperformed, the F1 score is very close to the best of the approaches and includes an increase in one of its two components (i.e., precision or recall). For example, on the Organization field, we scored 93.54 percent, the best being Peng's 94 percent. However, we achieved a gain of almost 10 percent in precision when compared with ParsCit (100 percent vs. 90.9 percent precision). Similarly, on the Date field, our F1 was 98.99 percent, as opposed to ParsCit's 99.19 percent, but with a better recall of 98.67 percent.
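The per-field scores reported in tables 1 through 4 can be reproduced from token-level predictions with a short computation. The sketch below is a hypothetical illustration, not the authors' evaluation script; it counts true positives, false positives, and false negatives per label over aligned gold and predicted label sequences and applies the F1 formula above.

    from collections import defaultdict

    def per_field_scores(gold_sequences, predicted_sequences):
        """Compute precision, recall, and F1 per field label.

        gold_sequences / predicted_sequences: lists of equal-length label lists,
        one per reference string (e.g., ["Author", "Author", "Title", ...]).
        """
        tp = defaultdict(int)  # token labeled X in both gold data and prediction
        fp = defaultdict(int)  # token predicted X but gold data says otherwise
        fn = defaultdict(int)  # token gold-labeled X but predicted otherwise

        for gold, pred in zip(gold_sequences, predicted_sequences):
            for g, p in zip(gold, pred):
                if g == p:
                    tp[g] += 1
                else:
                    fp[p] += 1
                    fn[g] += 1

        scores = {}
        for label in set(tp) | set(fp) | set(fn):
            precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
            recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
            f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
            scores[label] = (100 * precision, 100 * recall, 100 * f1)
        return scores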
Field       ParsCit               FLUX-CIM               Our approach
            P     R     F1        P      R      F1       P      R      F1
Author      98.8  99.0  98.89     93.59  95.58  94.57    99.08  99.08  99.08
Title       98.8  98.3  98.54     93.0   93.0   93.0     99.65  99.65  99.65
Date        99.8  94.5  97.07     97.75  97.44  97.59    98.55  98.19  98.36
Pages       94.7  99.3  96.94     97.0   97.84  97.41    97.28  97.72  97.49
Location    96.9  88.4  92.45     96.83  97.6   97.21    95.55  94.5   95.02
Journal     97.1  82.9  89.43     95.71  97.81  96.75    94.0   97.91  95.91
Booktitle   95.7  99.3  97.46     97.47  95.45  96.45    99.13  99.13  99.13
Publisher   98.8  75.9  85.84     100    100    100      98.59  98.59  98.59

Table 2. Evaluation Results on the FLUX-CIM Dataset—CS Domain

Field     FLUX-CIM               Our approach
          P      R      F1       P      R      F1
Author    98.57  99.04  98.81    99.8   99.36  99.57
Title     84.88  85.14  85.01    91.39  91.39  97.39
Date      99.85  99.5   99.61    99.89  99.69  99.78
Pages     99.1   99.2   99.45    99.94  99.59  99.76
Journal   97.23  89.35  93.13    99.42  99.16  99.28

Table 3. Evaluation Results on the FLUX-CIM Dataset—HS Domain

4.2 Dataset: FLUX-CIM

FLUX-CIM43 is an unsupervised44 reference extraction and chunking system. In order to evaluate its performance, the authors of FLUX-CIM created two separate datasets:

• the FLUX-CIM CS dataset, composed of a collection of heterogeneous references from the computer science field, and
• the FLUX-CIM HS dataset, comprising an organized and controlled collection of references from PubMed.

The FLUX-CIM CS dataset contains three hundred reference strings randomly selected from the ACM Digital Library. Each string is segmented into ten fields: Author, Title, Conf, Journal, Volume, Number, Pub, Date, Pages, and Place. The FLUX-CIM HS dataset contains 2,000 entries, with each entry segmented into six fields: Author, Title, Journal, Volume, Date, and Pages. Table 2 presents the comparative test results achieved by ParsCit, FLUX-CIM, and our approach on the CS dataset. Similar to the CORA dataset, our chunker outperformed the other chunkers on the majority of the fields, the exceptions being the Location, Journal, and Publisher fields. The test results on the HS dataset are presented in table 3. Here we can observe a clear performance improvement on all fields, in some cases the difference being significant, e.g., the Title field, from 85.01 percent to 97.39 percent, or the Journal field, from 93.13 percent to 99.28 percent. This increase is even more relevant considering the size of the dataset, each 1 percent representing twenty references.

4.3 Dataset: CS-SW

While the CORA and FLUX-CIM CS datasets do focus on the computer science field, they do not cover the slight differences in reference format that can be found nowadays in the Semantic Web community. Consequently, to show the broader applicability of our approach, we have compiled a dataset named CS-SW comprising 576 reference strings randomly selected from publications in the Semantic Web area, from conferences such as the International Semantic Web Conference (ISWC), the European Semantic Web Conference (ESWC), the World Wide Web Conference (WWW), and the European Conference on Knowledge Acquisition (and co-located workshops).45 Each reference entry is segmented into twelve fields: Author, Title, Conference, Workshop, Journal, Techrep, Organization, Publisher, Date, Pages, Website, and Location. Table 4 shows the results of the tests carried out on this dataset.
One can easily observe that the chunker performed in a manner similar to that on the CORA dataset, with particularly good results on the Author, Date, Pages, and Publisher fields.

Field         Our approach
              P      R      F1
Author        98.61  99.27  98.93
Title         94.91  93.29  94.09
Date          98.89  98.34  98.61
Pages         98.94  97.24  98.08
Location      93.9   92.77  93.33
Organization  85.71  80.00  82.75
Journal       94.59  93.33  93.95
Conference    96.66  95.08  95.86
Workshop      83.33  88.23  85.71
Publisher     96.61  97.43  97.01
Tech. rep.    100    80     88.88
Website       98.14  94.64  96.35

Table 4. Evaluation Results on the CS-SW Dataset

5. CONCLUSION

In this paper we presented a novel approach for extracting and chunking reference information from scientific publications. The solution, realized using a CRF-trained chunker, achieved good results in the experimental evaluation, in addition to increased versatility, shown by applying the once-trained chunker to multiple testing datasets. This enables a straightforward adoption and reuse of our solution for generating semantic metadata in any digital library or publication repository focused on scientific publishing.

As next steps, we plan to create a comprehensive dataset covering multiple heterogeneous domains (e.g., social sciences or digital humanities) and evaluate the chunker's performance on it. Then we will focus on developing an accurate reference consolidation and linking technique, to address the second step mentioned in section 1, i.e., aligning the resulting metadata to the existing Linked Data on the Web. We plan to develop a flexible consolidation mechanism by dynamically generating and executing SPARQL queries from chunked reference fields and filtering the results via two string approximation metrics (a combination of the Monge-Elkan and Chapman Soundex algorithms). The SPARQL query generation will be implemented in an extensible manner, via customizable query modules, to accommodate the heterogeneous nature of the diverse Linked Data sources. Finally, we intend to develop an overlay interface for arbitrary online publication repositories, to enable on-the-fly creation, visualization, and linking of semantic metadata from repositories that currently do not expose their datasets in a semantic/linked manner.

ACKNOWLEDGEMENTS

The work presented in this paper has been funded by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2).

REFERENCES AND NOTES

1. Tim Berners-Lee et al., "The Semantic Web," Scientific American 284 (2001): 35–43. 2. Christian Bizer et al., "Linked Data—The Story So Far," International Journal on Semantic Web and Information Systems 5 (2009): 1–22. 3. Generating computer-understandable metadata represents an issue, in general, in the publishing domain, and not necessarily only in its scientific area. However, the relevant literature dealing with metadata extraction/generation has focused on scientific publishing because of its accelerated growth rate, especially with the increasing use of the World Wide Web as a dissemination mechanism. 4. Knud Moeller et al., "Recipes for Semantic Web Dog Food – The ESWC and ISWC Metadata Projects," Proceedings of the 6th International Semantic Web Conference (Busan, Korea, 2007). 5. Wei Peng and Tao Li, "Temporal relation co-clustering on directional social network and author-topic evolution," Knowledge and Information Systems 26 (2011): 467–86. 6.
Laszlo Barabasi et al., “Evolution of the social network of scientific collaborations,” Physica A: Statistical Mechanics and its Applications 311 (2002): 590–614. 7. Xiaoming Liu et al., “Co-authorship networks in the digital library research community,” Information Processing & Management 41 (2005): 1462–80. 8. John D. Lafferty et al., “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data,” Proceedings of the 18th International Conference on Machine Learning (San Francisco, CA, USA, 2001): 282–89. 9. Vladimir Vapnik, The Nature of Statistical Learning Theory (New York: Springer, 1995). 10. Isaac G. Councill et al., “ParsCit: An Open-source CRF Reference String Parsing Package,” Proceedings of the Sixth International Language Resources and Evaluation (Marrakech, Morocco, 2008). 11. Yong Kiat Ng, “Citation Parsing Using Maximum Entropy and Repairs” (master's thesis, National University of Singapore, 2004). 12. Fuchun Peng and Andrew McCallum, “Information Extraction from Research Papers Using Conditional Random Fields,” Information Processing & Management 42 (2006): 963–79. 13. C. Lee Giles et al., “CiteSeer: An Automatic Citation Indexing System,” Proceedings of the Third AMC Conference on Digital Libraries (Pittsburgh, PA, 1998): 89–98. 14. Kristie Seymore et al., “Learning Hidden Markov Model Structure for Information Extraction,” Proceedings of the AAAI Workshop on Machine Learning for Information Extraction (1999): 37– 42. 15. Isaac G. Councill et al., “ParsCit: An Open-source CRF Reference String Parsing Package,” Proceedings of the Sixth International Language Resources and Evaluation (Marrakech, Morocco, 2008). 16. Hui Han et al., “Rule-based Word Clustering for Document Metadata Extraction,” Proceedings of the Symposium on Applied Computing (Santa Fe, New Mexico, 2005). 17. Eli Cortez et al., “FLUX-CIM: Flexible Unsupervised Extraction of Citation Metadata,” Proceedings of the 2007 Conference on Digital Libraries (New York, 2007): 215–24. 18. Machine Learning methods can be broadly classified into two categories: supervised and unsupervised. Supervised methods require training on specific datasets that exhibit the characteristics of the target domain. To achieve high accuracy levels, the training dataset needs to be reasonably large, and more importantly, it has to cover most of the possible INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 19 exceptions from the intrinsic data patterns. Unlike supervised methods, unsupervised methods do not require training, and in principle, use generic rules to encode both the expected patterns and the possible exceptions of the target data. 19. Peng and McCallum, “Information Extraction from Research Papers Using Conditional Random Fields.” 20. Hoifung Poon and Pedro Domingos, “Joint inference in information extraction,” Proceedings of the 22nd National Conference on Artificial Intelligence (Vancouver, British Columbia, Canada, 2007): 913–18. 21. Ariel Schwartz et al., “Multiple Alignment of Citation Sentences with Conditional Random Fields and Posterior Decoding,” Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (Prague, Czech Republic, 2007): 847–57. 22. Simone Teufel et al., “Automatic Classification of Citation Function,” Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (Sydney, Australia, 2006): 103–10. 23. 
Jien-Chen Wu et al., “Computational Analysis of Move Structures in Academic Abstracts,” COLING/ACL Interactive Presentation Sessions (Sydney, Australia, 2006): 41–44. 24. Eugenio Cesario et al., “Boosting text segmentation via progressive classification,” Knowledge and Information Systems 15 (2008): 285–320. 25. Dublin Core website, http://dublincore.org (accessed May 4, 2011). 26. York Sure et al., “The SWRC ontology – Semantic Web for research communities,” Proceedings of the 12th Portuguese Conference on Artificial Intelligence (Covilha, Portugal, 2005). 27. Yanjun Qi et al., “Semi-Supervised Sequence Labeling with Self-Learned Features,” Proceedings of IEEE International Conference on Data Mining (Miami, FL, USA, 2009). 28. David Sanchez et al., “Content Annotation for the Semantic Web: An Automatic Web-Based Approach,” Knowledge and Information Systems 27 (2011): 393-418. 29. Tshering Cigay Dorji et al., “Extraction, selection and ranking of Field Association (FA) Terms from domain-specific corpora for building a comprehensive FA terms dictionary,” Knowledge and Information Systems 27 (2011): 141–61. 30. Please note that the chunker is document-format agnostic and takes as input only raw text. The actual extraction of this raw text from the original document (PDF, DOC or some other format) is the user’s responsibility. 31. As a note, we chose this length of fifteen characters empirically, and based on the assumption that in any format the publication content lines usually have more than fifteen characters. REFERENCE INFORMATION EXTRACTION AND PROCESSING |GROZA, GRIMNES, AND HANDSCHUH 20 32. Lafferty et al., “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.” 33. Councill et al., “ParsCit: An Open-source CRF Reference String Parsing Package.” 34. The manual tagging was performed by a single person and since the reference chunks have no ambiguity attached, we did not see the need for running any data reliability tests. 35. Ron Kohavi, “A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection,” Proceedings of the 14th International Joint Conference on Artificial Intelligence (Montreal, Quebec, 1995): 1137–43. 36. Peng and McCallum, “Information Extraction from Research Papers Using Conditional Random Fields.” 37. Councill et al., “ParsCit: An Open-source CRF Reference String Parsing Package.” 38. Mallet: MAchine Learning for LanguagE Toolkit, http://mallet.cs.umass.edu (accessed May 4, 2011). 39. William M. Shaw et al., “Performance standards and evaluations in IR test collections: Cluster- based retrieval models,” Information Processing & Management 33 (1997): 1–14. 40. Peng and McCallum, “Information Extraction from Research Papers Using Conditional Random Fields.” 41. Councill et al., “ParsCit: An Open-source CRF Reference String Parsing Package.” 42. Seymore et al., “Learning Hidden Markov Model Structure for Information Extraction.” 43. Han et al., “Rule-based Word Clustering for Document Metadata Extraction.” 44. Cortez et al., “FLUX-CIM: Flexible Unsupervised Extraction of Citation Metadata.” 45. The CS-SW dataset is available at http://resources.smile.deri.ie/corpora/cs-sw (accessed May 4, 2011). 
http://resources.smile.deri.ie/corpora/cs-sw 2164 ---- Public Library Computer Waiting Queues: Alternatives to the First -Come-First-Served Strategy Stuart Williamson PUBLIC LIBRARY COMPUTER WAITING QUEUES | WILLIAMSON 72 ABSTRACT This paper summarizes the results of a simulation of alternative queuing strategies for a public library computer sign-up system. Using computer usage data gathered from a public library, the performance of these various queuing strategies is compared in terms of the distribution of user wait times. The consequences of partitioning a pool of public computers are illustrated as are the potential benefits of prioritizing users in the waiting queue according to the amount of computer time they desire. INTRODUCTION Many of us at public libraries are all too familiar with the scene: a crowd of customers huddled around the library entrance in the morning, anxiously waiting for the doors to open to begin a race for the computers. From this point on, the wait for a computer at some libraries, such as the one we will examine, can hover near thirty minutes on busy days and peak at an hour or more. Such long waiting times are a common source of frustration for both customers and staff. By far the most effective solution to this problem is to install more public computers at your library. Of course, when the space or money run out, this may no longer be possible. Another approach is to reduce the length or number of sessions each customer is allowed. Unfortunately, reducing session length can make completion of many important tasks difficult; whereas, restricting the number of sessions per day can result in customers upset over being unable to use idle computers.1 Finally, faced with daunting wait times, libraries eager to make their computers accessible to more people may be tempted to partition their waiting queue by installing separate fifteen-minute “express” computers. A primary focus of this paper is to illustrate how partitioning the pool of public computers can significantly increase waiting times. Additionally, several alternative queuing strategies are presented for providing express-like computer access without increasing overall waiting times. We often take for granted the notion that first-come-first-served (FCFS) is a basic principle of fairness. “I was here first,” is an intuitive claim that we understand from an early age. However, Stuart Williamson (swilliamson@metrolibrary.org) is Researcher, Metropolitan Library System, Oklahoma City, Oklahoma. mailto:swilliamson@metrolibrary.org INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 73 the inefficiency present in a strictly FCFS queue is implicitly acknowledged when we courteously invite a person with only a few items to bypass our overflowing grocery cart to proceed ahead in the check-out line. Most of us would agree to wait an additional few minutes rather than delay someone else for a much greater length of time. When express lanes are present, they formalize this process by essentially allowing customers needing help for only a short period of time to cut in line. These line cuts are masked by the establishment of separate dedicated lines, i.e., the queue is partitioned into express and non-express lines. One question addressed by this article is “is there a middle ground?” In other words, how might a library system set up its computer waiting queue to achieve express-lane type service without splitting the set of public internet computers into partitions that operate separately and in parallel? 
Several such strategies are presented here along with the results of how each performed in a computer simulation using actual customer usage data from a public library. STRATEGIES Queuing systems are heavily researched in a number of disciplines, particularly computer science and operations research. The complexity and sheer number of different queuing models can present a formidable barrier to library professionals. This is because, in the absence of real-world data, it is often necessary to analyze a queuing system mathematically by approximating its key features with an applicable probability distribution. Unfortunately, applying these distributions entails adopting their underlying assumptions as well as any additional assumptions involved in calculating the input parameters. For instance, the Poisson distribution (used to approximate customer arrival rates) requires that the expected arrival rate be uniform across all time intervals, an assumption which is clearly violated when school lets out and teenagers suddenly swarm the computers.2 Even if we can account for such discrepancies, there remains the difficulty of estimating the correct arrival rate parameter for each discrete time interval being analyzed. Fortunately, many libraries now use automated computer sign-up systems which provide access to vast amounts of real-world data. With realistic data, it is possible to simulate various queuing strategies, a few of which will be analyzed in this article. A computer simulation using real-world data provides a good picture of the practical implications of any queuing strategy we care to devise without the need for complex models. As is often the case, designing a waiting queue strategy involves striking a balance among competing factors. For instance, one way of reducing waiting times involves breaking with the FCFS rule and allowing users in one category to cut in front of other users. How many cuts are acceptable? Does the shorter wait time for users in one category justify the longer waits in another? There are no right answers to these questions. While simulating a strategy can provide a realistic picture of its results in terms of waiting times, evaluating which strategy’s results are preferable for a particular library must be done on a case-by-case basis. In addition to the standard FCFS strategy with a single pool of computers and the same FCFS strategy implemented with one computer removed from the pool to serve as a dedicated fifteen- PUBLIC LIBRARY COMPUTER WAITING QUEUES | WILLIAMSON 74 minute express computer (referred to as FCFS-15), we will consider for comparison three other well-known alternative queuing strategies: Shortest-Job-First (SJF), Highest-Response-Ratio-Next (HRRN), and a variant of Shortest-Job-First (SJF-FB) which employs a feedback mechanism to restrict the number of times a given user may be bypassed in the queue.3 The three alternative strategies all require advance knowledge or estimation of how long each particular computer session will last. In our case, this means customers would need to indicate how long of a session they desire upon first signing up for a computer. Any number of minutes is acceptable so we will limit the sign-up options to four categories in fifteen-minute intervals: fifteen minutes, thirty minutes, forty-five minutes, and sixty minutes. Each session will then be initially categorized into one of four priority classes (P1, P2, P3, and P4) accordingly. 
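A one-line sketch of this mapping is shown below; the helper name is hypothetical, and it simply assigns each of the four fifteen-minute sign-up categories to its initial priority class.

    def priority_class(requested_minutes):
        """Map a requested session length to its initial priority class: 15 -> P1 ... 60 -> P4."""
        return min(4, max(1, -(-requested_minutes // 15)))   # ceiling division into 15-minute buckets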
As the data will show, customers selecting shorter sessions are given a higher priority in the queue and will thus have a shorter expected waiting time. It should be noted that relying on users to choose their own session length presents its own set of problems. It is often difficult to estimate how much time will be required to accomplish a given set of tasks online. However, users face a similar difficulty in deciding whether to opt for a dedicated fifteen-minute computer under the FCFS-15 system. The trade-off between use time and wait time should provide an incentive for some users to self-ration their computer use, placing an additional downward pressure on wait times. However, user adaptations in response to various queuing strategies are outside the scope of this analysis and will not be considered further.

The Shortest-Job-First (SJF) strategy functions by simply selecting from the queue the user in the highest priority class. The amount of time spent waiting by each user is considered only as a tiebreaker among users occupying the same priority class. Our results demonstrate that the SJF strategy is generally best for minimizing overall average waiting time as well as for getting customers needing the least amount of computer time online the fastest. The main drawbacks of this strategy are that these gains come at the expense of more line cuts and higher average and maximum waiting times for the lowest priority users—those needing the longest sessions (sixty minutes). There is no limit to how many times a user can be passed over in the queue. In theory, this means that such a user could be continually bypassed and never be assigned a computer during the day.

The SJF-FB strategy is a variant of SJF with the addition of a feedback mechanism that increases the priority of users each time they are cut in line. For instance, if a user signs up for a sixty-minute session, he or she is initially assigned a priority of 4. Suppose that shortly after, another user signs up for a thirty-minute session and is assigned a priority of 2. The next available computer will be assigned to the user with priority 2. The bypassed user's priority will then be bumped up by a set interval. In this simulation an interval of 0.5 is used, so the bypassed user's new priority becomes 3.5. As a result, users beginning with a priority of 4 will reach the highest priority of 1 after being bypassed six times and will not be bypassed further. This effectively caps at six the number of times a user can be cut in front of.

The final alternative strategy, Highest-Response-Ratio-Next (HRRN), is a balance between FCFS and SJF. It considers both the arrival time and the requested session length when assigning a priority to each user in the queue. Each time a user is selected from the queue, the response ratio is recalculated for all users. The user with the highest response ratio is selected and assigned the open computer. The formula for the response ratio is

response ratio = (waiting time + requested session length) / requested session length

This allows users with a shorter session request to cut in line, but only up to a point. Even customers requesting the longest possible session move up in priority as they wait, just at a slower pace. This method produces the same benefits and drawbacks as the SJF strategy, but the effects of both are moderated, and the possibility of unbounded waiting is eliminated. Still, although the expected number of cuts will be lower using HRRN than with SJF, there is no limit on how many times a user may be passed over in the queue.
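To make the selection rules concrete, here is a minimal Python sketch of how the SJF, SJF-FB, and HRRN selection steps could be implemented. It is an illustrative sketch only; the class and function names are assumptions of this example, not the sign-up software's actual code, and lower numbers represent higher priority.

    from dataclasses import dataclass, field

    @dataclass
    class Request:
        arrival: int            # sign-up time, in seconds
        minutes: int            # requested session length (15, 30, 45, or 60)
        priority: float = field(init=False)

        def __post_init__(self):
            self.priority = self.minutes / 15   # 15 min -> P1 ... 60 min -> P4

    def select_sjf(queue, now):
        """Shortest-Job-First: highest priority class wins; waiting time only breaks ties."""
        return min(queue, key=lambda r: (r.priority, r.arrival))

    def select_sjf_fb(queue, now, bump=0.5):
        """SJF with feedback: every bypassed user moves 0.5 closer to the top priority of 1."""
        chosen = min(queue, key=lambda r: (r.priority, r.arrival))
        for r in queue:
            if r is not chosen and r.arrival < chosen.arrival:
                r.priority = max(1.0, r.priority - bump)   # capped after six bumps for a P4 user
        return chosen

    def select_hrrn(queue, now):
        """Highest-Response-Ratio-Next: (wait + requested length) / requested length."""
        def ratio(r):
            wait = (now - r.arrival) / 60          # waiting time in minutes
            return (wait + r.minutes) / r.minutes
        return max(queue, key=ratio)

In each case the caller removes the returned request from the queue and assigns it to the open computer; plain FCFS would simply be min(queue, key=lambda r: r.arrival).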
The response ratio formula can be generalized by scaling the importance of the waiting-time factor, for example by raising the waiting-time term to a power x. Increasing values of x > 1 cause the strategy to more closely resemble FCFS, and decreasing values of 0 < x < 1 to more closely resemble SJF. One could experiment with different values of x to find a desired balance between the number of line cuts and the impact on average waiting times for customers in the various priority classes. This won't be pursued here, and x will be assumed to be 1.

METHODOLOGY

The data used in this simulation come from the Metropolitan Library System's Southern Oaks Library in Oklahoma City. This library has eighteen public Internet computers that customers can sign up for using proprietary software developed by Jimmy Welch, Deputy Executive Director/Technology for the Metropolitan Library System. The waiting queue employs the first-come-first-served (FCFS) strategy. Customers are allotted an initial session of up to sixty minutes but may extend their session in thirty-minute increments so long as the waiting queue is empty. Repeat customers are also allowed to sign up for additional thirty-minute sessions during the day, provided that no user currently in the queue has been waiting for more than ten minutes (an indication that demand for computers is currently high). Anonymous usage data gathered by the system in August 2010 was compiled to produce the information about each customer session shown in table 1.

Table 1. Session Data (units in minutes)

The information about each session required for the simulation includes the time at which the user arrived to sign up for a computer, the number of minutes it took the user to log in once assigned a computer, how many minutes of computer time were used, whether this was the user's first or a subsequent session for the day, and, finally, whether the user gave up waiting and abandoned his or her place in the queue. Users are given eight minutes to log in once a computer station is assigned to them before they are considered to have abandoned the queue.

Once this data has been gathered, the computer simulation runs by iterating through each second the library is open. As user sign-up times are encountered in the data, they are added to the waiting queue. When a computer becomes available, a user is selected from the queue using the strategy being simulated and assigned to the open computer. The customer occupies the computer for the length of time given by their associated log-in delay and session length. When this time expires, customers are removed from their computer and the information recorded during their time spent in the waiting queue is logged.

RESULTS

There were 7,403 sign-ups for the computers at the Southern Oaks Library in August 2010. Each of these requests is assigned a priority class based on the length of the session as detailed in table 2. The intended session length of users choosing to abandon the queue is unknown. Abandoned sign-ups are assigned a priority class randomly in proportion to the overall distribution of priority classes in the data so as not to introduce any systematic bias into the results.
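The per-second simulation just described can be sketched as follows. This is a hypothetical reconstruction for illustration only: the record fields (arrival, login_delay, used_minutes, abandoned) mirror the session data described above, and select would be one of the strategy functions sketched earlier, but none of this is the code actually used in the study.

    def simulate(sessions, select, num_computers=18, day_seconds=24 * 3600):
        """Replay one day of logged sign-ups under a given queue-selection strategy.

        Each session record is assumed to carry: arrival (sign-up time, in seconds),
        login_delay and used_minutes (from the usage logs, in minutes), priority,
        and an abandoned flag. Returns (session, wait_seconds) pairs.
        """
        pending = sorted(sessions, key=lambda s: s.arrival)
        queue, busy_until, waits = [], [0] * num_computers, []

        for now in range(day_seconds):
            # Users enter the waiting queue at their recorded sign-up times.
            while pending and pending[0].arrival <= now:
                queue.append(pending.pop(0))

            # Assign each idle computer to the user chosen by the current strategy.
            for pc in range(num_computers):
                if busy_until[pc] <= now and queue:
                    user = select(queue, now)
                    queue.remove(user)
                    waits.append((user, now - user.arrival))
                    if user.abandoned:
                        # Abandoned sign-ups tie the machine up for the eight-minute
                        # login window before it is reassigned.
                        occupied_minutes = 8
                    else:
                        occupied_minutes = user.login_delay + user.used_minutes
                    busy_until[pc] = now + occupied_minutes * 60

        return waits

Line cuts (discussed with figure 3 below) can be tallied inside the same loop: whenever a user is selected, every queued user with an earlier arrival time has just been bypassed and increments a counter.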
Even though their actual session length is zero, these users participate in the queue and cause the computer eventually assigned to them to sit idle for eight minutes until it is re-assigned. Customers signing up for a subsequent session during the day are always assigned the lowest priority class (P-4) regardless of their requested session length. This is a policy decision to not give priority to users who have already received a computer session for the day. INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 77 Table 2. Assignment of Priority Classes Figure 1 displays the average waiting time for each priority class during the simulation (bars) along with the total number of sessions initially assigned to each class (line). It is immediately obvious from the chart that each alternative strategy excels at reducing the average wait for high priority (P1) users. Also observe how removing one computer from the pool to serve exclusively as a fifteen-minute computer drastically increases the FCFS-15 average wait times in the other priority classes. Clearly, removing one (or more) computer from the pool to serve as a dedicated fifteen-minute station is a poor strategy here for all but the 519 users in class P-1. Losing just one of the eighteen available computers nearly doubles the average wait for the remaining 6,884 users in the other priority classes. Figure 1. Average User Wait Minutes by Priority Class PUBLIC LIBRARY COMPUTER WAITING QUEUES | WILLIAMSON 78 By contrast, note that the reduced average wait times for the highest priority users in class P-1 persist in classes P-2 and P-3 for the non-FCSC strategies. The SJF strategy produces the most dramatic reductions for the 2,164 users not in class P-4. However, for the 5,239 users in class P-4, the SJF strategy produced an average wait time that was 2.1 minutes longer than the purely FCFS strategy. The HRRN strategy achieves lesser wait time reductions than SJF in the higher priority classes, but HRRN increased the average wait for users in class P-4 by only 0.7 minutes relative to FCFS. The average wait using the SJF-FB strategy falls in between that of SJF and HRRN for each priority class while guaranteeing users will be cut at most six times. An examination of the maximum wait times for each priority class in figure 2 illustrates how the express lane itself can be a bottleneck. Even with a dedicated fifteen-minute express computer under the FCFS-15 strategy, at least one user would have waited over half an hour to use a computer for fifteen minutes or less. In all but the highest priority class (P-2 through P-4), the FCFS-15 strategy again performs poorly with at least one user in each of these classes waiting over ninety minutes for a computer. Figure 2. Maximum User Wait Minutes by Priority Class Capping the number of times a user may be passed over in the queue under the SFJ-FB strategy makes it less likely that members of classes P-2 and P-3 will be able to take advantage of their higher priority to cut in front of users in class P-4 during periods of peak demand. As a result, the SJF-FB maximum wait times for classes P-2 and P-3 are similar to those under the FCFS strategy. This was not the case in the breakdown of SJF-FB average waiting times across priority classes in figure 1. INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 79 Table 3 breaks down waiting times for each queuing strategy according to the overall percentage of users waiting no more than the given number of minutes. 
Here we see the effects of each strategy on the system as a whole, instead of by priority class. Notice that the overall average wait times for the non-FCFS strategies are lower than those of FCFS. This indicates that the total reduction in waiting times for high-priority users exceeds the additional time spent waiting by users in class P-4. In other words, these strategies are globally more efficient than FCFS. Notice, too, in table 3 that the non-FCFS strategies achieve significant reductions in the median wait time compared with FCFS. Table 3. Distribution of Wait Times by Strategy After demonstrating the impact that breaking the first-come-first-served rule can have on waiting times, it is important to examine the line cuts that are associated with each of these strategies. Line cuts are recorded by each user in the simulation while waiting in the queue. Each time a user is selected from the queue and assigned a computer, remaining users who arrived prior to the one just selected note having been skipped over. By the time they are assigned a computer, users have recorded the total number of times they were passed over in the queue. PUBLIC LIBRARY COMPUTER WAITING QUEUES | WILLIAMSON 80 Figure 3. Cumulative Distribution of Line Cuts by Queuing Strategy Figure 3 displays the cumulative percentage of users experiencing no more than the listed number of cuts for each non-FCFS strategy. The majority of users are not passed over at all under these strategies. However, there is a small minority of users that will be repeatedly cut in line. For instance, in our simulation, one unfortunate individual was passed over in the queue sixteen times under the SJF strategy. This user waited ninety-one minutes using this strategy as opposed to only fifty-nine minutes under the familiar FCFS waiting queue. Most customers would become upset upon seeing a string of sixteen people jump over them in the queue and get on a computer while they are enduring such a long wait. The HRRN strategy caused a maximum of nine cuts to an individual in this simulation. This user waited seventy-three minutes under HRRN versus only fifty-five minutes using FCFS. Extreme examples such as those above are the exception. Under the HRRN and SJF-FB strategies, 99% of users were passed over at most four times while waiting in the queue. CONCLUSION We have examined the simulation of several queuing strategies using a single month of computer usage data from the Southern Oaks Library. The relative performance difference between queuing strategies will depend on the supply and demand of computers at any given location. Clearly, at libraries with plenty of public computers for which customers seldom have to wait, the choice of queuing strategy is inconsequential. However, for libraries struggling with waiting times on par with those examined here, the choice can have a substantial impact. INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 81 In general, however, these simulation results demonstrate the ability of non-FCFS queuing strategies to significantly lower waiting times for certain classes of users without partitioning the pool of computers. These reductions in waiting times come at the cost of allowing high priority users to essentially cut in line. This causes slightly longer wait times for low priority users; but, overall average and median wait times see a small reduction. Of course, for some customers, being passed over in line even once is intolerable. 
Furthermore, creating a system to implement an alternative queuing strategy may present obstacles of its own. However, if the need to provide for quick, short-term computer access is pressing enough for a library to create a separate pool of “express” computers; then, one of the non-FCFS queuing strategies discussed in this paper may be a viable alternative. At the very least, the FCFS-15 simulation results should give one pause before resorting to designated “express” and “non- express” computers in an attempt to remedy unacceptable customer waiting times. ACKNOWLEDGMENTS The author would like to thank the Metropolitan Library System, Kay Bauman, Jimmy Welch, Sudarshan Dhall, and Bo Kinney for their support and assistance with this paper as well as Tracey Thompson and Tim Spindle for their excellent review and recommendations. REFERENCES 1. J. D. Slone, “The Impact of Time Constraints on Internet and Web Use,” Journal of the American Society for Information Science and Technology 58 (2007): 508–17. 2. William Mendenhall and Terry Sincich, Statistics for Engineering and the Sciences (Upper Saddle River, NJ: Prentice-Hall, 2006), 151–54. 3. Abraham Silberschatz, Peter Baer Galvin, and Greg Gagne, Operating System Concepts (Hoboken, NJ: Wiley, 2009), 188–200. 2165 ---- Resource Discovery: Comparative Survey Results on Two Catalog Interfaces Heather Hessel and Janet Fransen RESOURCE DISCOVERY: COMPARATIVE SURVEY RESULTS | HESSEL AND FRANSEN 21 ABSTRACT Like many libraries, the University of Minnesota Libraries-Twin Cities now offers a next-generation catalog alongside a traditional online public access catalog (OPAC). One year after the launch of its new platform as the default catalog, usage data for the OPAC remained relatively high, and anecdotal comments raised questions. In response, the libraries conducted surveys that covered topics such as perceptions of success, known-item searching, preferred search environments, and desirable resource types. Results show distinct differences in the behavior of faculty, graduate student, and undergraduate survey respondents, and between library staff and non-library staff respondents. Both quantitative and qualitative data inform the analysis and conclusions. INTRODUCTION The growing level of searching expertise at large research institutions and the increasingly complex array of available discovery tools present unique challenges to librarians as they try to provide authoritative and clear searching options to their communities. Many libraries have introduced next-generation catalogs to satisfy the needs and expectations of a new generation of library searchers. These catalogs incorporate some of the features that make the current web environment appealing: relevancy ranking, recommendations, tagging, and intuitive user interfaces. Traditional OPACs are generally viewed as more complex systems, catering to advanced users and requiring explicit training in order to extract useful data. Some librarians and users also see them as more effective tools for conducting research than next-generation catalogs. Academic libraries are frequently caught in the middle of conflicting requirements and expectations for discovery from diverse sets of searchers. In 2002, the University of Minnesota-Twin Cities Libraries migrated from the NOTIS library system to the ALEPH500™ system and launched a new web interface based on the ALEPH online catalog, originally branded as MNCAT. 
In 2006, the libraries contracted with the Ex Libris Group as one of three development partners in the creation of a new next-generation search environment called Primo. During the development process, the libraries conducted multiple usability studies that provided data to inform the direction of the product. Participants in the usability studies generally characterized the Primo interface as “clear” and “efficient.”1 A year later the University Heather Hessel (heatherhessel@yahoo.com) was Interim Director of Enterprise Technology and Systems, Janet Fransen (fransen@umn.edu) is the librarian for Aerospace Engineering, Electrical Engineering, Computer Science, and History of Science & Technology, University of Minnesota, Minneapolis, MN. mailto:heatherhessel@yahoo.com mailto:fransen@umn.edu INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 22 Libraries branded Primo as MNCAT Plus, rebranded the ALEPH OPAC as MNCAT Classic, and introduced MNCAT Plus to the Twin Cities user community as a beta service. In August 2008, MNCAT Plus was configured as the default search for the Twin Cities catalog on the libraries’ main website, with the libraries continuing to keep a separate link active to the ALEPH OPAC. A new organizational body called the Primo Management Group was created in December 2008 to coordinate support, feedback, and enhancements of the local Primo installation. This committee’s charge includes evaluating user input and satisfaction, coordinating communication to users and staff, and prioritizing enhancements to the software and the normalization process. When the Primo Management Group began planning its first user satisfaction survey, the group noted that a significant number of library users seemed to prefer MNCAT Classic. Therefore, two surveys were developed in response to the group’s charge. These two surveys were identical in scope and questions, except that one survey referenced MNCAT Classic and was targeted to MNCAT Classic searchers (appendix A), while the other survey referenced MNCAT Plus and was targeted to MNCAT Plus searchers (appendix B). These surveys were designed to produce statistics that could be used as internal benchmarks to gauge library progress in areas of user experience, as well as to assist with ongoing and future planning with regard to discovery tools and features. RESEARCH QUESTIONS In addition to evaluating user satisfaction and requesting user input, the Primo Management Group also chose to question users about searching behaviors in order to set the direction of future interface work. Questions directed toward searching behaviors were informed by the findings from a 2009 University of Minnesota Libraries report on making resources discoverable.2 The group surveyed respondents about types of items they expect to find in their searches, their interest in online resources, and the entry point for their discovery experience. The Primo Management Group crafted the surveys to get answers to the following research questions:  How often do users view their searching activity as successful?  How often do users know the title of the item that they are looking for, as opposed to finding any resource relevant to their topic?  What search environments do users choose when looking for a book? A journal? Anything relevant to a topic?  How interested are users in finding items that are not physically located at the University of Minnesota?  Are there other types of resources that users would find helpful to discover in a catalog search? 
RESOURCE DISCOVERY: COMPARATIVE SURVEY RESULTS | HESSEL AND FRANSEN 23 Although it can be tempting to think of the people using the catalog interfaces as a homogeneous group of “users,” large academic libraries serve many types of users. As Wakimoto states in “Scope of the Library Catalog in Times of Transition,” On the one hand, we have ‘Net-generation users who are accustomed to the simplicity of the Google interface, are content to enter a string of keywords, and want only the results that are available online. On the other hand, we have sophisticated, experienced catalog users who understand the purpose of uniform titles and Library of Congress classifications and take full advantage of advanced search functions. We need to accommodate both of these user groups effectively.3 The Primo Management Group planned to use the demographic information to look for differences among user communities; therefore the surveys requested demographic information such as role (e.g., student) and college of affiliation (e.g., School of Dentistry). In designing the surveys, the group took into account the limitations of this type of survey as well as the availability of other sources of information. For example, the Primo Management Group chose not to include questions about specific interface features because such questions could be answered by analyzing data from system logs. The group was also interested in finding out about users’ strategies for discovering information, but members felt that this information was better obtained through focus groups or usability studies rather than through a survey instrument. RESEARCH METHOD The Primo Management Group positioned links to the user surveys in several online locations, with the libraries’ home page providing one primary entry point. Clicking on the link from the home page presented users with an intermediate page, where they were given a choice of which survey to complete: one based on MNCAT Plus, and the other on MNCAT Classic. If desired, users could choose to complete a separate survey for each of the two systems. Links were also provided from within the MNCAT Plus and MNCAT Classic environments, and these links directed users to the relevant version of the survey without the intermediary page. In addition to the survey links in the online environment, announcements were made to staff about the surveys, and librarians were encouraged to publicize the surveys to their constituents around campus. The survey period lasted from October 1 through November 25, 2009. At the time of the surveys, the University of Minnesota Libraries was running Primo version 2 and ALEPH version 19. Because participants were self-selected, the survey results represent a biased sample, are more extreme than the norm, and are not generalizable to the whole university population. Participants were not likely to click the survey link or respond to e-mailed requests unless they had sufficient incentive, such as strong feelings about one interface or the other. Thirty percent of respondents provided an e-mail address to indicate that they would be willing to be contacted for focus groups or further surveys, indicating a high level of interest in the public-facing interfaces the libraries employ. In considering a process for repeating this project, more attention would be paid to methodology to address validity concerns. FINDINGS AND ANALYSIS INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 24 Findings relevant to each research question are discussed here. 
Six hundred twenty-nine surveys contained at least one response—476 for MNCAT Plus and 153 for MNCAT Classic. Responses by Demographics As shown in table 1, graduate students were the primary respondents for both MNCAT Plus and MNCAT Classic, followed by undergraduates and faculty members. Library staff made up 13 percent of MNCAT Classic respondents and 4 percent of MNCAT Plus respondents, although the actual number of library staff responding was nearly identical (twenty-one for MNCAT Plus, twenty for MNCAT Classic). Library staff members were disproportionately represented in these survey responses and the group analyzed the results to identify categories in which library staff members differed from overall trends in the responses. Questions about affiliation appeared at the end of the surveys, which may account for the high number of respondents in the “Unspecified” category. MNCAT Classic Respondents Frequency MNCAT Plus Respondents Frequency Graduate student 50 33% Graduate student 176 37% Undergraduate student 31 20% Undergraduate student 110 23% Library staff 20 13% Faculty 40 8% Faculty 21 14% Staff (non-library) 28 6% Staff (non-library) 10 7% Library staff 21 4% Community member 2 1% Community member 11 2% (Unspecified) 19 12% (Unspecified) 90 19% Total 153 100% Total 476 100% Table 1. Respondents by User Population A comparison of the student survey responses shows that graduate students were overrepresented, while undergraduates were underrepresented, at close to a reverse ratio. Of the total number of graduate and undergraduate students, 62 percent of the respondents were graduate students, even though they accounted for only 32 percent in the larger population. Conversely, undergraduates represented only 38 percent of the student respondents, even though they accounted for 68 percent of the graduate and undergraduate total. Regrettably, the surveys did not include options for identifying oneself as a non-degree-seeking or professional student, so the analysis of students compared with overall population in this section includes only graduate students and undergraduates. Differences were also apparent in the representation of all four categories of students within a particular college unit. At least two college units were underrepresented in the survey responses: RESOURCE DISCOVERY: COMPARATIVE SURVEY RESULTS | HESSEL AND FRANSEN 25 Carlson School of Management and the College of Continuing Education. One college unit was overrepresented in the survey results; 59 percent of the overall student respondents to the MNCAT Classic survey, and 47 percent of the MNCAT Plus students indicated that they were housed in the College of Liberal Arts (CLA), and yet CLA students only represent 32 percent of the total number of students on campus. Table 2 shows the breakdown of percentages by college or unit and the corresponding breakdown by survey respondent, highlighting where significant discrepancies are evident. 
Twin Cities Overall Percentage of Students MNCAT Classic Student Survey Respondents +/- MNCAT Plus Student Survey Respondents +/- Carlson School of Management 9% 0% -9% 2% -7% Center for Allied Health 0% 2% +1% 1% 0% Col of Educ/Human Development 10% 9% -1% 14% +3% Col of Food, Agr & Nat Res Sci 5% 4% 0% 7% +2% Coll of Continuing Education 8% 1% -7% 1% -7% College of Biological Sciences 4% 6% +2% 5% 0% College of Design 3% 3% 0% 3% 0% College of Liberal Arts 32% 59% +27% 47% +15% College of Pharmacy 1% 1% 0% 0% -1% College of Veterinary Medicine 1% 1% 0% 1% 0% Graduate School 0% 0% 0% 0% 0% Humphrey Inst of Publ Affairs 1% 1% 0% 1% 0% Institute of Technology (now College of Science & Engineering) 14% 9% -5% 10% -4% Law School 2% 1% -1% 1% 0% Medical School 4% 2% -3% 5% 0% School of Dentistry 1% 1% 0% 0% -1% School of Nursing 1% 0% -1% 0% -1% School of Public Health 2% 1% -1% 3% +1% Table 2. Student Responses by Affiliation INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 26 Faculty and staff together totaled only eighty-nine respondents on the MNCAT Plus survey and fifty-one respondents on the MNCAT Classic survey. In keeping with graduate and undergraduate student trends, the College of Liberal Arts (CLA) was clearly over-represented in terms of faculty responses. The CLA faculty group represents about 17 percent of the faculty at the University of Minnesota. Yet over half the faculty respondents on the MNCAT Plus survey were from CLA; over 80 percent of the MNCAT Classic faculty respondents identified themselves as affiliated with CLA. Faculty groups that were underrepresented include the Medical School and the Institute of Technology. Perceptions of Success A critical area of inquiry for the surveys was user satisfaction and perceptions of success: “Do users perceive their searching activity as successful?” Asked in both surveys, the question’s responses allowed the Primo Management Group to compare respondents’ perceived success between the two interfaces. Results show a marked difference: While 86 percent of the MNCAT Classic respondents reported that they are “usually” or “very often” successful at finding what they are looking for, only 62 percent of the MNCAT Plus respondents reported the same perception of success. Respondents reported very similar rates of success regardless of school, type of affiliation, or student status. Figure 1. Perceptions of Success: MNCAT Plus and MNCAT Classic These results should be interpreted cautiously. Because MNCAT Plus is the libraries’ default catalog interface, MNCAT Classic users are a self-selecting group whose members make a conscious decision to bookmark or click the extra link to use the MNCAT Classic interface. One cannot assume that MNCAT users in general also would have an 86 percent perception of success were they to use MNCAT Classic; familiarity with the tool could play a part in MNCAT Classic users’ success. 14% 24% 44% 18% 4% 11% 32% 54% 0% 10% 20% 30% 40% 50% 60% Rarely Sometimes Usually Very often MNCAT Classic MNCAT Plus RESOURCE DISCOVERY: COMPARATIVE SURVEY RESULTS | HESSEL AND FRANSEN 27 Another possible factor in the reported difference in user success is the higher proportion of known-item searching—finding a book by title—occurring in MNCAT Classic. A user’s criteria for success differ when searching for a known item versus conducting a general topical search. It is easier for a searcher to determine that they have been successful in a situation where they are looking for a specific item. 
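The +/- columns in table 2 are the difference, in percentage points, between each college's share of survey respondents and its share of the overall student population. The short sketch below shows that calculation with hypothetical count inputs; it is an illustration, not the libraries' analysis code.

    def representation_gaps(population_counts, respondent_counts):
        """Return each group's respondent share minus its population share, in percentage points."""
        pop_total = sum(population_counts.values())
        resp_total = sum(respondent_counts.values())
        gaps = {}
        for group in population_counts:
            pop_share = 100 * population_counts[group] / pop_total
            resp_share = 100 * respondent_counts.get(group, 0) / resp_total
            gaps[group] = round(resp_share - pop_share)
        return gaps

    # Example with made-up counts: a group holding 32% of students but 59% of respondents
    # yields a gap of about +27, matching the College of Liberal Arts row in table 2.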
Some features of MNCAT Classic, such as the start-of-title and other browse indexes, are well suited to known-item searching and had no direct equivalent in MNCAT Plus, which defaults to relevance-ranked results. (Primo version 3 has implemented new features to enhance known-item searching.) Comments received from users suggest that several factors played a role. One MNCAT Classic respondent praised the “precision of the search...not just lots of random hits” and noted that MNCAT Classic supports a “[m]ore focused search since I usually already know the title or author.” In contrast, a MNCAT Plus respondent commented that the next-generation interface was “great for browsing topics when you do not have a specific title in mind.” This comment is consonant with the results from other usability testing done on next-generation catalogs. In "Next Generation Catalogs: What Do They Do and Why Should We Care?", Emanuel describes observed differences between topical and known-item searching: “During the testing, users were generally happy with the results when they searched for a broad term, but they were not happy with results for more specific searches because often they had to further limit to find what they wanted in the first screen of results.”4 A common characteristic of next-generation catalogs is that they return a large result set that can then be limited using facets. Training and experience may also explain some of the differences in success. MNCAT Plus also enables functionality associated with the Functional Requirements for Bibliographic Records (FRBR), which is intended to group items with the same core intellectual content in a way that is more intuitive to searchers. However, this feature is unfamiliar to traditional catalog searchers and requires an extra step to discover very specific known-items in Primo. One MNCAT Plus user expressed dissatisfaction and added, “I'm not sure if it's my lack of training/practice or that the system is not user-friendly.” In focus group analyses conducted in 2008, OCLC found that “when participants conducted general searches on a topic (i.e., searches for unknown items) that they expressed dissatisfaction when items unrelated to what they were looking for were returned in the results list. End users may not understand how to best craft an appropriate search strategy for topic searches.”5 How Often do Users Know the Title of the Item that They are Looking For? Users come to the library with different goals in mind. In “Chang's Browsing,” available in Theories of Information Behavior, Chang identified five general browsing themes,6 adapted to discovery by Carter.7 For the purposes of the survey, the Primo Management Group grouped those themes into two goals: finding an item when the title is known, and finding anything on a given topic. The Primo Management Group had heard concerns from faculty and staff that they have more difficulty finding an item when they know the title when using MNCAT Plus than they did with MNCAT Classic. The group was interested in knowing how often users search for known items. To explore this topic and its impact on perceptions of success, the surveys included two questions on known-item and topical searching. 
The survey results shown in table 3 indicate that a significantly higher proportion of MNCAT Classic respondents (30 percent plus 43 percent = 73 percent) than MNCAT Plus respondents (24 percent plus 29 percent = 53 percent) were “very often” or “usually” searching for known items. It may be that users in search of known items have learned to go to MNCAT Classic rather than MNCAT Plus.

Survey | Rarely | Sometimes | Usually | Very often | Total
I already know the title of the item I am looking for
MNCAT Classic | 7% (11) | 19% (29) | 30% (46) | 43% (66) | 152
MNCAT Plus | 15% (69) | 33% (151) | 24% (111) | 29% (132) | 463
I am looking for any resource relevant to my topic
MNCAT Classic | 14% (21) | 32% (47) | 20% (29) | 34% (51) | 148
MNCAT Plus | 14% (62) | 29% (133) | 29% (133) | 28% (127) | 455

Table 3. Responses to “I already know the title of the item I am looking for”

When the Primo Management Group considered how often researchers in different user roles searched for known items versus anything on a topic, clear patterns emerged as shown in figure 2. In the MNCAT Plus survey, only 34 percent of undergraduate MNCAT Plus searchers “usually” or “very often” search for a particular item, versus 74 percent of faculty. Conversely, 75 percent of undergraduate respondents “usually” or “very often” search for any resource relevant to a topic, versus 37 percent of faculty. Graduate student respondents showed interest in both kinds of use. If successful browsing by topic is best achieved using post-search filtering, it may help to explain differences between undergraduate students and faculty. The analysis of usability testing done on other next generation catalogs described in “Next Generation Catalogs: What Do They Do and Why Should We Care?” states that “users that did not have extensive searching skills were more likely to appreciate the search first, limit later approach, while faculty members were faster to get frustrated with this technique.”8 Results for all MNCAT Classic respondents showed a preference for known item searching, but undergraduate students still indicated that they search more for anything on the topic and less for known items than faculty respondents. No significant differences were identified by discipline.

Figure 2. Searching for a Known Item vs. Any Relevant Resource

Some qualitative comments from survey takers suggest that respondents view the library interface as a place to go to find something already known to exist, e.g., “I never want to search by topic. Library catalogs are for looking up specific items.” However, with respect to discovering resources for a subject in general, both MNCAT Classic and MNCAT Plus respondents showed that they would also like to find items relevant to their topic (figure 2). There was no significant difference between MNCAT Classic and MNCAT Plus respondents on this question; in both environments, only 14 percent of the users said that they would “rarely” be interested in general results relevant to their topic.

Perceptions of Success by Specific Characteristics

For MNCAT Plus, the majority of respondents “somewhat agree” or “strongly agree” that items available online or in a particular collection are easy to find. One-third of the MNCAT Plus respondents had never tried to find an item in a particular format. Over 40 percent had never tried to find an item with a particular ISBN/ISSN.
Interface features may be a factor here: ISBN/ISSN searching is not a choice in the MNCAT Plus drop down menu, so users may not know that they can do such a search. A higher percentage of MNCAT Classic respondents “strongly agree” that it is easy to find items by collection, available online, or in a particular format, than MNCAT Plus respondents. Figure 3 shows results based on particular characteristics. INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 30 Figure 3. Perception of Success by Characteristic Although the surveys were primarily intended to gather reactions from end users, some interesting data emerged about usage by library staff. As demonstrated in figure 4, library staff respondents were much more likely to have performed the specific types of searches listed in this section than users generally, and reported a much higher rate of perceived success with MNCAT Classic. Figure 4. Perception of Success by Characteristic: Library Staff RESOURCE DISCOVERY: COMPARATIVE SURVEY RESULTS | HESSEL AND FRANSEN 31 Searching by Location: Local Collections and Other Resources In a large research institution with several physical library locations and many distinct collections, users need the ability to quickly narrow a search to a particular collection. But even the largest institution cannot collect everything a researcher might need. The Primo Management Group wondered not only whether users felt successful when they looked for an item in a particular collection but also wanted to explore whether users want to see items not owned by the institution as part of their search results. Finding items among the many library locations was not a problem for either MNCAT Plus or MNCAT Classic respondents: 72 percent either somewhat or strongly agreed that it is easy to find items in a particular collection using MNCAT. Furthermore, survey respondents of both interfaces agreed that they are interested in items no matter where the items are, which underlines the value of a service such as WorldCat; 73 percent of MNCAT Plus respondents and 78 percent of MNCAT Classic respondents expressed a preference for seeing items held by other libraries, knowing they could request items using an interlibrary loan service if necessary. Preferred Search Environments Three of the survey questions asked users about their preferred search environments for different searching needs:  When looking for a particular book  When looking for a particular journal article  When searching without a particular title in mind Each survey presented respondents with a list of choices and space to specify other sources not listed. Respondents were encouraged to mark as many sources as they regularly use. When searching for a specific book, users of the two catalog environments identified a number of other sources. The top five sources in each survey are listed in table 4. When I am looking for a specific book, I usually search (check all that apply): MNCAT Classic Respondents (Frequency) MNCAT Plus Respondents (Frequency) 1. MNCAT Classic (116) 1. MNCAT Plus (217) 2. WorldCat (50) 2. Google (165) 3. Amazon (50) 3. MNCAT Classic (163) 4. Google (49) 4. Amazon (160) 5. Google Books (31) 5. Google Books (108) Table 4. Search Environment for Books INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 32 Qualitative comments indicated that users like being able to connect to Amazon and Google Books in order to look at tables of contents and reviews. They also specifically mentioned Barnes and Noble, as well as other local libraries. 
These results show that MNCAT Plus respondents were more likely to also use MNCAT Classic than vice-versa. The data do not suggest why this would be the case, but familiarity with the older interface may play a role. MNCAT Classic respondents were more likely than MNCAT Plus users to return to their search environment when searching for a particular book (82 percent versus 53 percent). One MNCAT Plus respondent commented, “I didn't know I could still get to MNCAT Classic.”

When searching for a specific journal article, users of both systems chose “Other databases (JSTOR, PubMed, etc.)” above all the other choices. Even more respondents would likely have marked this choice if not for confusion over the term “Other databases.” Most of the comments mentioned specific databases, even when the respondent had not selected the “Other databases” choice. One user commented, “Most of these choices would be illogical. You don't list article indexes, that's where I go first.” Table 5 lists the five responses marked most often for each survey.

When I am looking for a specific journal article, I usually search (check all that apply):
Rank | MNCAT Classic Respondents (Frequency) | MNCAT Plus Respondents (Frequency)
1 | Other databases (JSTOR, PubMed, etc.) (92) | Other databases (JSTOR, PubMed, etc.) (232)
2 | MNCAT Classic (53) | Google Scholar (131)
3 | Google Scholar (40) | E-Journals List (130)
4 | E-Journals List (34) | MNCAT Plus (110)
5 | Google (29) | MNCAT Plus article search (101)

Table 5. Search Environment for Articles.

Qualitative comments from respondents indicated that interfaces would be more useful if they helped users find online journal articles. This raised some questions with regard to MNCAT Plus, which includes a tab labeled “Articles” for conducting federated article searches. However, MNCAT Plus respondents noted that they used the Plus “Articles” search almost as much as they did MNCAT Plus. Other Plus comments included:

“I tried to use this for journal articles but it only has some in the database I guess and when I did my search it only found books and no articles. I don't understand it.”

“I tried this new one and it came up with wierd [sic] stuff in terms of articles. My professor said to give up and use the regular indexes because I wasn't getting what I needed to do the paper. It wasted my time.”

This desire for federated search coupled with the expressions of dissatisfaction with the existing federated search platform is consistent with the mixed opinions expressed in other studies, such as Sam Houston State University’s assessment of use of and satisfaction with the WebFeat federated search tool. That study found “[f]ederated search use was highest among lower-level undergraduates, and both use and satisfaction declined as student classification rose.”9 The new search tools that contain preindexed articles, such as Primo Central, Summon, WorldCat Local, and EBSCO Discovery Service, may address the frustrations that more experienced searchers express regarding federated search technology.

When researching a topic without a specific title in mind, “Google” and “Other databases” were nearly equal and ranked first for MNCAT Plus respondents, while “Other databases” ranked first for MNCAT Classic respondents. Table 6 lists the five responses marked most often for each survey.
When I am researching a topic without a specific title in mind, I usually search (check all that apply): MNCAT Classic Respondents (Frequency) MNCAT Plus Respondents (Frequency) 1. Other databases (JSTOR, PubMed, etc.) (84) 1. Google (197) 2. MNCAT Classic (76) 2. Other databases (JSTOR, PubMed, etc.) (192) 3. Google (63) 3. Google Scholar (155) 4. Google Scholar (47) 4. MNCAT Plus (145) 5. WorldCat (32) 5. MNCAT Classic (101) Table 6. Search Environment for Topics Significant differences based on school affiliation were evident in the area of preferred search environments for topical research. For example, Institute of Technology respondents reported using Google much more often when researching without a specific title in mind than respondents in other areas. Evidence from the health sciences is limited in that only seven percent of respondents in total identified themselves as being from this area. However, these limited results show that health sciences respondents relied more on library databases than on Google. Respondents in the liberal arts relied more on MNCAT, in either version, than did respondents in the other fields. Desired Resource Types One feature of the Primo discovery interface is its ability to aggregate records from more than one source. University Libraries maintains several internal data sources that are not included in the catalog, and the possibility of including some of these in the MNCAT Plus catalog has been considered many times since Primo’s release. The Primo Management Group was interested to hear from users whether they would find three types of internal sources useful: research reports and preprints, online media, and archival finding aids. The group also asked users to mark “Online journal articles” if they would find article results helpful. The question did not specify whether journal articles would appear integrated with other search results in a MNCAT “Books” search or INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 34 in a separate search such as that already provided through a metasearch on the MNCAT Plus Articles tab. The surveys asked users what kinds of resources would make MNCAT more useful. The results for both MNCAT Plus and MNCAT Classic were similar and response counts for both surveys were ordered as shown in table 7. Respondents could mark more than one of the choices. I would find MNCAT more useful if it helped me find: MNCAT Classic Frequency MNCAT Plus Frequency Online journal articles 65 255 U of M research materials (e.g., research reports, preprints) 34 149 Online media (e.g., digital images, streaming audio/visual) 27 134 Archival finding aids 27 90 Table 7. Desired Resource Types The Primo Management Group noted that more MNCAT Plus respondents chose “Online Journal Articles” more frequently than the other categories even though the MNCAT Plus interface includes an “Articles” tab for federated searching. It is unclear whether the respondents were not seeing the “Articles” tab in MNCAT Plus because they would like to see search results integrated, or if they were using the “Articles” tab and were not satisfied with the results. Comments from respondents generally supported the inclusion of a wider range of resources in MNCAT. However, several respondents also expressed concerns about the trade-offs that might be involved in providing wider coverage. 
One user liked the idea of having the databases “all … in one place,” but added that “it would have to just give you the stuff that you need.” Several users cited the varying quality of the material discovered through library sources. One user supported the inclusion of articles “if it included GOOD articles and not the ones I got.” A MNCAT Classic respondent gave the variable quality of the material he or she had found through a database search as a reason for leaving the coverage of MNCAT as it is: “I use the best sources depending on my needs.” Another MNCAT Classic user expressed doubt that coverage of all disciplines was feasible. In commenting on the content of MNCAT, respondents also mentioned specific types of material that they wanted to see (e.g. archives of various countries), as well as difficulties with particular classes of material (“the confusing world of government documents”). One MNCAT Plus user related his or her interest in public domain items to a specific item of functionality that would enhance their discovery, namely a date sort. In general, the interest in University of Minnesota research material was fairly high. However, faculty members ranked University of Minnesota research materials last in terms of preference: Only twelve faculty respondents chose the option, out of sixty-one total faculty respondents. RESOURCE DISCOVERY: COMPARATIVE SURVEY RESULTS | HESSEL AND FRANSEN 35 CONCLUSIONS The data from two surveys, conducted concurrently in 2009 on a traditional OPAC (MNCAT Classic) and next-generation catalog (MNCAT Plus), point to differences in the use and perceptions of both systems. There appeared to be fairly strong “brand loyalty” with MNCAT Classic, given that this interface is no longer the default search for the libraries. Surveys for both systems suggest a perception of success that is lower than desirable and that there is room to improve the quality of the discovery experience. It is unclear from the data if the reported perceptions of success were the result of the systems not finding what the user wants, or if the systems did not contain what the user wanted to find. MNCAT Classic respondents were more likely to use WorldCat to find a specific book than MNCAT Plus respondents. MNCAT Plus respondents indicated a use of MNCAT Classic, but not vice versa. Both sets of surveys described use of Amazon and Google for discovery. MNCAT Plus respondents reported lower rates of success at finding known items than MNCAT Classic respondents. MNCAT Classic respondents were far more likely to have a specific title in mind that they wanted to obtain; half of the MNCAT Plus respondents reported having a specific title in mind. The team that examined the survey responses found that the data suggested several key attributes that should be present in the libraries discovery environment. Further discussion of the results and suggested attributes was conducted with library staff members in open sessions. Results also informed local work on improving discovery interfaces. The results suggested:  The environment should support multiple discovery tasks, including known-item searching and topical research.  Support for discovery activity should be provided to all primary constituent groups, noting the significant survey response by graduate student searchers.  Users want to discover materials that are not owned by the libraries, in addition to local holdings. 
 A discovery environment should make it easy for users to find and access resources in vendor-provided resources, such as JSTOR and PubMed. While the results of the 2009 surveys provided a valuable description of usage, the survey team recognized that methodological choices limit the usefulness in applying results to a larger population. The team also recognized that there were a number of questions yet unanswered. Some of these outstanding questions present opportunities for future research and suggest that a variety of formats might be useful, including surveys, focus groups, and targeted interviews.  To what extent do users expect to find integrated search results among different kinds of content, such as articles, databases, indexes, and even large scale data sets?  What general search strategies do users use to navigate the complex discovery environment that is available to them, and where are the failure points?  How much of the current environment requires training and how much is truly intuitive to users? INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 36  How can the University Libraries identify and serve users who did not complete the surveys?  How useful would users find targeted results based on a particular characteristic such as role, student status, or discipline? Since the surveys were conducted, the University Libraries upgraded to Primo version 3, which included features to address some of the concerns respondents identified in the surveys, such as known-item searching. Primo version 3 allows users to conduct a left-justified title search (“Title begins with…”), as well as sort by fields such as title and author. Once the new version has been in place long enough for users to develop some comfort with the interface, the Primo Management Group intends to resolve methodological issues and repeat its surveys, measuring users’ reactions against the baseline data set in the 2009 surveys. ACKNOWLEDGEMENTS We would like to thank the other members of the Primo Management Group, who helped to design and implement the surveys, as well as analyze and communicate the results: Chew Chiat Naun (chair), Susan Gangl, Connie Hendrick, Lois Hendrickson, Kristen Mastel, R. Arvid Nelsen, and Jeff Peterson. We also want to acknowledge the helpful feedback and guidance of the group’s sponsor, John Butler. REFERENCES 1 Tamar Sadeh, “User Experience in the Library: A Case Study.” New Library World 109, no. 1/2 (2008): 7–24. 2 Cody Hanson et al., Discoverability Phase 1 Final Report (Minneapolis: University of Minnesota, 2009), http://purl.umn.edu/48258/ (accessed Dec. 20, 2010). 3 Jina Choi Wakimoto, “Scope of the Library Catalog in Times of Transition.” Cataloging & Classification Quarterly 47, no. 5 (2009): 409–26. 4 Jenny Emanuel, “Next Generation Catalogs: What Do They Do and Why Should We Care?” Reference & User Services Quarterly 49, no. 2 (Winter, 2009): 117–20. 5 Karen Calhoun, Diane Cellentani, and OCLC, Online Catalogs : What Users and Librarians Want: An OCLC Report (Dublin, Ohio: OCLC, 2009). 6 Shan-ju Chang, “Chang's Browsing,” In Theories of Information Behavior, ed. Karen E. Fisher, Sandra Erdelez and Lynne McKechnie, 69-74 (Medford, N.J.: Information Today, 2005). 7 Judith Carter, “Discovery: What do You Mean by that?” Information Technology & Libraries 28, no. 4 (December 2009): 161–63. 8 Jenny Emanuel, “Next Generation Catalogs: What Do They Do and Why Should We Care?” Reference & User Services Quarterly 49, no. 2 (Winter, 2009): 117–20. 
9 Abe Korah and Erin Dorris Cassidy. “Students and Federated Searching: A Survey of Use and Satisfaction,” Reference & User Services Quarterly 49, no. 4 (Summer 2010): 325–32. https://purl.umn.edu/48258 RESOURCE DISCOVERY: COMPARATIVE SURVEY RESULTS | HESSEL AND FRANSEN 37 APPENDIX A. MNCAT Classic Survey The library catalog is intended to help you find an item when you know its title, as well as suggest items that are relevant to a given topic. We’d like to know how often you use MNCAT Classic for these different purposes. 1. When I visit MNCAT Classic… Very often Usually Sometimes Rarely I already know the title of the item I am looking for     I am looking for any resource relevant to my topic     Many people use tools other than the library catalog to find books, articles, and other resources. For the different situations below, please tell us what other tools you find helpful. 2. When I am looking for a specific book, I usually search (check all that apply):  Amazon  MNCAT Classic  Other databases (JSTOR, PubMed, etc.)  Google  MNCAT Plus  WorldCat  Google Books  MNCAT Plus article search  Google Scholar  Libraries OneSearch Other (please specify) _______________________________________________________ 3. When I am looking for a specific journal article, I usually search (check all that apply):  Amazon  Google Books  MNCAT Plus article search  Citation Linker  Google Scholar  Libraries OneSearch  E-Journals List  MNCAT Classic  Other databases (JSTOR, PubMed, etc.)  Google  MNCAT Plus  WorldCat Other (please specify) ___________________________________________________ INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 38 4. When I am researching a topic without a specific title in mind, I usually search (check all that apply):  Amazon  Google Scholar  Libraries OneSearch  E-Journals List  MNCAT Classic  Other databases (JSTOR, PubMed, etc.)  Google  MNCAT Plus  WorldCat  Google Books  MNCAT Plus article search Other (please specify) ___________________________________________________ Now we’d like to know what you think of MNCAT Classic and what new features (if any) you’d like to see. 5. When I use MNCAT Classic Very often Usually Sometimes Rarely I succeed in finding what I’m looking for     6. It is easy to find the following kinds of items in MNCAT Classic Strongly agree Somewhat agree Somewhat disagree Strongly disagree I haven’t looked for this with MNCAT Classic An item that is available online      An item within a particular collection (e.g., Wilson Library, University Archives, etc.)      An item in a particular physical format (e.g., DVD, map, etc.)      An item with a specific ISBN or ISSN      RESOURCE DISCOVERY: COMPARATIVE SURVEY RESULTS | HESSEL AND FRANSEN 39 7. I would find MNCAT Classic more useful if it helped me find (check all that apply):  Online journal articles  Online media (e.g., digital images, streaming audio/visual)  Archival finding aids  U of M research material (e.g., research reports, preprints) Other (please specify) ___________________________________________________ 8. The WorldCat catalog allows you to search the contents of many library collections in addition to the University of Minnesota. Which of the following best describes your level of interest in this type of catalog? 
 Yes, I am interested in what other libraries have regardless of where they are, knowing I could request it through interlibrary loan if I want it  Yes, I am interested, but only if I can get the items from a nearby library  No, I am interested only in what is available at the University of Minnesota Libraries Please share anything you particularly like or dislike about MNCAT Classic. 9. What I like most about MNCAT Classic is: ___________________________________________ ___________________________________________________________________________________ ___________________________________________________________________________________ 10. What I like least about MNCAT Classic is: ___________________________________________ ___________________________________________________________________________________ ___________________________________________________________________________________ We want to understand how different groups of people use MNCAT Classic, as well as other tools, for finding information. Please answer the following questions to give us an idea of who you are. 11. How are you affiliated with the University of Minnesota?  Faculty  Graduate student  Undergraduate student  Staff (non-library) INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 40  Library staff  Community member 12. With which University of Minnesota college or school are you most closely affiliated?  Allied Health Programs  Food, Agricultural and Natural Resource Sciences  Pharmacy  Biological Sciences  Law School  Public Affairs  Continuing Education  Liberal Arts  Public Health  Dentistry  Libraries  Technology (engineering, physical sciences & mathematics)  Design  Management  Veterinary Medicine  Education & Human Development  Medical School  None of these  Extension  Nursing 13. We are interested in learning more about how you find the materials you need. If you would be willing to be contacted for further surveys or focus groups, please provide your e-mail address: _______________________________________________ RESOURCE DISCOVERY: COMPARATIVE SURVEY RESULTS | HESSEL AND FRANSEN 41 APPENDIX B. MNCAT Plus Survey The library catalog is intended to help you find an item when you know its title, as well as suggest items that are relevant to a given topic. We’d like to know how often you use MNCAT Plus for these different purposes. 1. When I visit MNCAT Plus… Very often Usually Sometimes Rarely I already know the title of the item I am looking for     I am looking for any resource relevant to my topic     Many people use tools other than the library catalog to find books, articles, and other resources. For the different situations below, please tell us what other tools you find helpful. 2. When I am looking for a specific book, I usually search (check all that apply):  Amazon  MNCAT Classic  Other databases (JSTOR, PubMed, etc.)  Google  MNCAT Plus  WorldCat  Google Books  MNCAT Plus article search  Google Scholar  Libraries OneSearch Other (please specify) _______________________________________________________ 3. When I am looking for a specific journal article, I usually search (check all that apply):  Amazon  Google Books  MNCAT Plus article search  Citation Linker  Google Scholar  Libraries OneSearch  E-Journals List  MNCAT Classic  Other databases (JSTOR, PubMed, etc.)  Google  MNCAT Plus  WorldCat Other (please specify) ___________________________________________________ INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 42 4. 
When I am researching a topic without a specific title in mind, I usually search (check all that apply):  Amazon  Google Scholar  Libraries OneSearch  E-Journals List  MNCAT Classic  Other databases (JSTOR, PubMed, etc.)  Google  MNCAT Plus  WorldCat  Google Books  MNCAT Plus article search Other (please specify) ___________________________________________________ Now we’d like to know what you think of MNCAT Plus and what new features (if any) you’d like to see. 5. When I use MNCAT Plus Very often Usually Sometimes Rarely I succeed in finding what I’m looking for     6. It is easy to find the following kinds of items in MNCAT Plus Strongly agree Somewhat agree Somewhat disagree Strongly disagree I haven’t looked for this with MNCAT Plus An item that is available online      An item within a particular collection (e.g., Wilson Library, University Archives, etc.)      An item in a particular physical format (e.g., DVD, map, etc.)      An item with a specific ISBN or ISSN      RESOURCE DISCOVERY: COMPARATIVE SURVEY RESULTS | HESSEL AND FRANSEN 43 7. I would find MNCAT Plus more useful if it helped me find (check all that apply):  Online journal articles  Online media (e.g., digital images, streaming audio/visual)  Archival finding aids  U of M research material (e.g., research reports, preprints) Other (please specify) ___________________________________________________ 8. The WorldCat catalog allows you to search the contents of many library collections in addition to the University of Minnesota. Which of the following best describes your level of interest in this type of catalog?  Yes, I am interested in what other libraries have regardless of where they are, knowing I could request it through interlibrary loan if I want it  Yes, I am interested, but only if I can get the items from a nearby library  No, I am interested only in what is available at the University of Minnesota Libraries Please share anything you particularly like or dislike about MNCAT Plus. 9. What I like most about MNCAT Plus is: ___________________________________________ ___________________________________________________________________________________ ___________________________________________________________________________________ 10. What I like least about MNCAT Plus is: ___________________________________________ ___________________________________________________________________________________ ___________________________________________________________________________________ We want to understand how different groups of people use MNCAT Plus, as well as other tools, for finding information. Please answer the following questions to give us an idea of who you are. 11. How are you affiliated with the University of Minnesota?  Faculty  Graduate student  Undergraduate student  Staff (non-library) INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 44  Library staff  Community member 12. With which University of Minnesota college or school are you most closely affiliated?  Allied Health Programs  Food, Agricultural and Natural Resource Sciences  Pharmacy  Biological Sciences  Law School  Public Affairs  Continuing Education  Liberal Arts  Public Health  Dentistry  Libraries  Technology (engineering, physical sciences & mathematics)  Design  Management  Veterinary Medicine  Education & Human Development  Medical School  None of these  Extension  Nursing 13. We are interested in learning more about how you find the materials you need. 
If you would be willing to be contacted for further surveys or focus groups, please provide your e-mail address: _______________________________________________ 2166 ---- Mobile Technologies & Academics: Do Students Use Mobile Technologies in Their Academic Lives and are Librarians Ready to Meet this Challenge? Angela Dresselhaus and Flora Shrode MOBILE TECHNOLOGIES & ACADEMICS | DRESSELHAUS AND SHRODE 82 ABSTRACT In this paper we report on two surveys and offer an introductory plan that librarians may use to begin implementing mobile access to selected library databases and services. Results from the first survey helped us to gain insight into where students at Utah State University (USU) in Logan, Utah, stand regarding their use of mobile devices for academic activities in general and their desire for access to library services and resources in particular. A second survey, conducted with librarians, gave us an idea of the extent to which responding libraries offer mobile access, their future plans for mobile implementation, and their opinions about whether and how mobile technologies may be useful to library patrons. In the last segment of the paper, we outline steps librarians can take as they “go mobile.” PURPOSE OF THE STUDY Similar to colleagues in all types of libraries around the world, librarians at Utah State University (USU) want to take advantage of opportunities to provide information resources and library services via mobile devices. Observing growing popularity of mobile, Internet- capable telephones and computing devices, USU librarians assume that at least some users would welcome the ability to use such devices to connect to library resources. To find out what mobile services or vendors’ applications USU students would be likely to use, we conducted a needs assessment. The lessons learned will provide important guidance to management decisions about how librarians and staff members devote time and effort toward implementing and developing mobile access. We conducted a survey of USU’s students (approximately 25,000 undergraduates and graduates) to determine the degree of handheld device usage in the student population, the purposes for which students use such devices, and students’ interests in mobile access to the library. In addition, we surveyed librarians to learn about libraries’ current and future plans to launch mobile services. This survey was administered to an opportunistic population Angela Dresselhaus (aldresselhaus@gmail.com) was Electronic Resources Librarian, Flora Shrode (flora.shrode@usu.edu) is Head, Reference & Instruction Services, Utah State University, Logan, Utah. mailto:aldresselhaus@gmail.com mailto:flora.shrode@usu.edu INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 83 comprised of subscribers to seven e-mail lists whom we invited to offer feedback. Our goal was to develop an action plan that would be responsive to students’ interests. At the same time, we aim to take advantage of the growing awareness of and demand for mobile access and to balance workloads among the library information technology professionals who would implement these services. USU is Utah’s land-grant university and the Merrill-Cazier Library is its primary library facility on the home campus in Logan, Utah. While USU has had satellite branches for some time, a growing emphasis on expanding online and distance education courses and degree programs has resulted in a considerable growth of its distance education programs in the last five years. 
Mobile access to university resources makes especially good sense for the distance education population and for students who may reside close to the main USU campus but who also enroll in online courses. The library has an information technology staff of 4.5 FTE professionals who support the library catalog, maintain roughly 250 computer workstations in cooperation with the director of campus student computer labs, and oversee the computing needs of library staff and faculty members. LITERATURE REVIEW Mobile access to library resources is not a new concept; in fact, the first project designed to deliver handheld mobile access to library patrons began eighteen years ago, in 1993, the time of mainframe computers and Gopher. The “Library Without A Roof” project partners included the University of Southern Alabama, AT&T, BellSouth Cellular, and Notable Technologies, Inc. 1 Library patrons at participating institutions could search and read electronic texts on their personal digital assistants (PDAs) and search the library catalog while browsing in physical collections. As reflected in the literature, interest in PDA applications for libraries started to pick up around the turn of the twenty-first century. Medical librarians were among the first to widely recognize the potential impact of mobile technologies on librarianship. A 2002 article in the Journal of the Medical Library Association and a monograph by Colleen Cuddy are among the first publications that focus on PDAs. 2 A quick perusal of the medical category on the iTunes store reveals several professional applications, ranging from New England Journal of Medicine tools to remote patient vital-sign monitors. As an example of the depth of mobile-device penetration in the medical field, in 2010 the Food and Drug Administration approved the marketing of the AirStrip suite of mobile-device applications. These apps work in conjunction with vital-sign monitoring equipment to allow instant remote access to a patient’s vital signs. 3 These examples illustrate the increasing pervasiveness of mobile technology in everyday life. Mobile learning in academic areas outside of medicine has increased recently as more universities have adopted mobile technologies. 4 A sampling of current projects at academic MOBILE TECHNOLOGIES & ACADEMICS | DRESSELHAUS AND SHRODE 84 institutions is provided in the 2010 Horizon Report. 5 According to the 2010 Educause Center for Applied Research (ECAR) study, 49 percent of undergraduates consider themselves mainstream adopters of technology. 6 Locally, Utah State University students have adopted smartphones at the rate of 39.3 percent and other handheld Internet devices at the rate of 31.5 percent. These statistics indicate that skills are increasing and the technological landscape is changing quickly. The ECAR study reports that student computing is rapidly moving to the cloud, another indication of the rapid change in the use of technology. “USB may one day go the way of the eight-track tape as laptops, netbooks, smartphones and other portable devices enable students to access their content from anywhere. They may or may not be aware of it, but many of today’s undergraduates are already cloud-savvy information consumers, and higher education is slowly but surely following their lead.” 7 Similarly, USU students show interest in adopting new technology. 
While USU students are less likely to own mobile devices, 70.2 percent of respondents indicated that they would be likely or very likely to use library resources on smartphones if they owned capable devices and if the library provided easy access to materials.

Bridges, Gascho Rempel, and Griggs published a comprehensive article, “Making the case for a fully mobile library web site: from floor maps to the catalog,” detailing their efforts to implement mobile services on the Oregon State University campus. 8 Their paper highlights the popularity of mobile phones and smartphones/web-enabled phones. The authors discuss mobile phone use, library mobile websites, and mobile catalogs, and they describe the process they used to develop their mobile library site. They note that mobile services will certainly be expected in the coming years, and we have learned that USU students share this expectation.

SURVEY RESEARCH

In recent years librarians have conducted surveys on mobile technology in libraries. In a 2007 study, Cummings, Merrill, and Borrelli surveyed library patrons to find out if they are likely to access the library catalog via small-screen devices. 9 They discovered that 45.2 percent of respondents, regardless of whether they owned a device, would access the library catalog on a small-screen device. Mobile access to the library catalog was the most requested service in the USU student survey, although it accounted for only 16 percent of the responses. Cummings, et al. also discovered that the most frequent users of the catalog were also the least willing to access the catalog via mobile devices, an interesting observation that merits further research. Their survey was completed in June of 2007, just five months after the January 9th release of the original iPhone. The release of the iPhone is significant as the point where the market demographics of mobile device users began to shift to people under thirty, the primary age group of undergraduate students. 10

Librarians Wilson and McCarthy at Ryerson University conducted two surveys to measure the usage of their catalog’s feature to send a call number via text or email (initiated in 2007) and their “fledgling mobile web site” (launched in 2008). 11 The first survey indicated that 20 percent of respondents owned Internet-capable cell phones, and over half said they intended to buy this type of phone when their current contracts expired. The survey respondents indicated they wanted the following services: “booking group study rooms, checking hours and schedules, checking their borrower records and checking the catalogue.” 12 The second survey was conducted a year after the library had implemented a group study room reservation system, catalog and borrower record services, and a computer/laptop availability service. Results of the follow-up survey show a drastic increase in ownership of Internet-capable cell phones (from 20% to 65%). Respondents desired two new services: article searches and e-book access. Wilson and McCarthy found that very few library patrons were accessing the mobile services, but “60% of the survey respondents were unaware that the library provided mobile services.” 13 The authors conclude that advertising should be a central part of mobile technology implementation. They also detail how the library contributed expertise and leadership to their campus-wide mobile initiatives.
Seeholzer and Salem conducted a series of focus groups in the spring of 2009 to determine the extent of mobile device use among students at Kent State University. 14 Notable among their findings are that students are willing to conduct research with mobile devices, and they desire to have a feature-rich interactive experience via handheld devices. Students expressed interest in customizing interactions with the library’s mobile site and completing common tasks such as placing holds or renewing library materials. NATIONWIDE SURVEY OF LIBRARIANS We asked colleagues who subscribe to e-mail distribution lists to respond to a survey about their libraries’ implementation of mobile applications for access to library collections and services. Invitations to take the survey were sent to seven lists (ACRL Science & Technology Section, ERIL, Information Literacy Instruction, Liblicense-L, NASIG, Ref-L, and Serialist), and 289 librarians and library staff members responded to the survey. The population of subscribers to the e-mail lists we used to solicit survey responses is dynamic and includes librarians and staff who work in academic and other types of settings. While our findings cannot be generalized in a statistically reliable manner, we nonetheless believe that the survey responses merit thorough analysis. We chose to conduct two surveys to avoid some of the problems we noted in a 2007 study conducted by Todd Spires. 15 Spires’ survey questions focused on librarians’ perceptions rather than on empirical data. We developed separate surveys for librarians and students in hopes of avoiding problems that could arise from basing assumptions on perceived behavior or from the complexity of interpreting and generalizing from perceptions. A survey of library patrons should provide more accurate insight into the ways that patrons are using the library MOBILE TECHNOLOGIES & ACADEMICS | DRESSELHAUS AND SHRODE 86 via handheld devices. In the libraries that currently provide mobile access to resources, the library catalog is most commonly offered. Article databases and assistance from a librarian tie as the second most frequently provided services. Figure 1 shows a snapshot of the resources and services librarians reported that they provide. We also asked how long libraries have provided mobile access, and the time periods ranged from a few weeks to more than ten years. Five librarians indicated that they have provided mobile access for six to ten years, and it is possible that these respondents may work in medical or health science libraries, as our literature review indicated that access to medical information and journal articles via PDAs has been a reality for several years. Figure 1. Librarians’ Responses: Does Your Library Provide Mobile Access to the Following Library Resources? Librarians were also asked what services and resources they believe libraries should provide via mobile devices. Of one hundred seventy-eight responses, 71 percent indicated that “everything” or a variety of library resources should be made available. A few of the more interesting suggestions include a library café webcam (similar to a popular link from North Carolina State University), locker reservations, a virtual suggestion box, alerts about database trials, an app that lists new books, and using iPads or other mobile devices for roving reference. Roving reference with tablet PCs was evaluated by Smith and Pietraszewski at the west campus branch library of Texas A&M. 
16 As tablet computers become increasingly popular with the release of the iPad and other tablets, 17 roving reference should be reconsidered. Smith and Pietraszewski note that "the tablet PC proved to be an extremely useful device as well as a novelty that drew student interest (anything to make reference librarians look cool!)" 18 Using the latest technology in libraries will help raise awareness that libraries are relevant and adapting to changing user preferences. We asked librarians to indicate who had responsibility for implementing mobile access in their library. The 184 responses are summarized here:  63 percent answered that a library systems or computing professional does this work;  26.1 percent indicated that the electronic resources librarian has this role;  17.9 percent rely on an information professional from outside of the library;  22.8 percent chose “other,” and we unfortunately did not offer a space for comments where survey respondents could tell us the job title of the person in their library who implements mobile access. The results from our sample of librarians are consistent with a larger study by the Library Journal. 19 The LJ study found that the majority of academic libraries have implemented or are INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 87 planning to implement mobile technologies. STUDENT SURVEY In January of 2011 we sent out a thirteen-question survey to students (questions are available in appendix A). USU’s student headcount is 25,767, and 3,074 students responded, representing 11.9 percent of the student population. We asked students to identify with colleges so that we could evaluate the survey sample against the enrollment at USU. The rate of response by college clustered between 12–19 percent with the lowest response rate (8 percent) from the College of Education. The highest response rate came from the College of Humanities and Social Sciences. We examined survey response rates from USU undergraduate and graduate populations; 54 percent of undergraduates and 50 percent of graduate students use mobile technology for academic purposes. We believe that our sample is sufficiently representative of the overall population of USU. Figure 2. Student Response Rates by College In order to understand the context of survey questions that specifically address mobile access, we asked students how often they used library electronic resources. The majority of students used electronic books, the library catalog, and electronic journals/articles a few times each semester. Only 34.4 percent of students never use electronic books, 19.6 percent never use the library catalog, and 17.6 percent never use electronic journals/articles. We made comparisons between disciplines and found no significant difference in electronic resource use between fields in the sciences and those in humanities. Further data will be collected in fall 2011 about use of print and electronic materials. MOBILE TECHNOLOGIES & ACADEMICS | DRESSELHAUS AND SHRODE 88 Figure 3. Electronic Resource Use Among Students Students were asked how often they use a variety of handheld devices. We decided to emphasize access over ownership in order to allow for a variety of situations. Responses show that 39.3 percent of our students use a smartphone with Internet access on a daily basis. Another 31.5 percent of students use other handheld devices like an iPod touch on a daily basis. Very few students use iPads or e-book readers, with 3.9 percent and 5.4 percent indicating daily use, respectively. 
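The comparison between survey respondents and campus enrollment described above is straightforward to automate once respondent counts are exported from the survey tool. The following minimal Python sketch illustrates the idea; only the overall figures (3,074 responses out of a 25,767 headcount) come from the survey, while the per-college names and counts below are hypothetical placeholders rather than USU's actual data.

```python
# Illustrative sketch: compare each college's share of survey respondents with
# its share of total enrollment to flag over- or under-representation.
# Only the overall figures (3,074 respondents, 25,767 students) come from the
# article; the per-college numbers below are hypothetical placeholders.

enrollment = {            # hypothetical enrollment counts by college
    "Humanities & Social Sciences": 6200,
    "Science": 4100,
    "Education": 5400,
}
respondents = {           # hypothetical respondent counts by college
    "Humanities & Social Sciences": 1180,
    "Science": 510,
    "Education": 430,
}

total_enrolled = 25767    # USU headcount reported in the article
total_responses = 3074    # survey responses reported in the article

print(f"Overall response rate: {total_responses / total_enrolled:.1%}")
for college, enrolled in enrollment.items():
    pop_share = enrolled / total_enrolled
    sample_share = respondents.get(college, 0) / total_responses
    diff = sample_share - pop_share
    print(f"{college}: enrollment {pop_share:.0%}, respondents {sample_share:.0%}, "
          f"difference {diff:+.0%}")
```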
We view the "Other handheld device" category as an important segment of the mobile technology market because of the lower cost barrier, since such devices do not require a subscription to a data plan. The ECAR study also noted the possibility of cost factors influencing the decision of some students not to access the Internet via a handheld device. 20 INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 89 Figure 4. Mobile Device Usage Students were asked if they use their mobile device or phone for academic purposes (e.g., Blackboard, electronic course reserves, etc.). This question was intentionally worded broadly in order to gather general information. We used skip logic to direct respondents to different paths through the survey based on their response to earlier questions. In response to a question about how students use their mobile devices, 54 percent of respondents indicated that they use their mobile devices for academic purposes. We analyzed the results by discipline and noted a few variances. Among students responding from the School of Business, 63 percent said that they use their mobile device for academic purposes, and 59 percent of engineering students use their devices for school work. The respondents from the other colleges reported use under 50 percent, most likely because of more limited adoption of mobile technology by USU faculty in those fields or lack of personal funds (or unwillingness to spend) to acquire devices and data plans. The 2010 ECAR report also noted higher exposure to technology in these fields, indicating that the situation at USU is in line with results from a national study. 21 MOBILE TECHNOLOGIES & ACADEMICS | DRESSELHAUS AND SHRODE 90 Table 1. Device use for Academic Purposes by College We asked the students, “If library resources were easily accessible on your mobile devices, and if you had such a device, how likely would you be to use any of the following for assignments or research?” Responses to this question allowed us to gauge interest without concerns about cost of technology or the current state of mobile readiness in our library. Among the survey respondents, 70.2 percent are likely or very likely to use resources on a smartphone; 46.9 percent are likely or very likely to use resources on an iPad; 45.9 percent are likely or very likely to use resources on an e-book reader; 63.2 percent are likely or very likely to use resources on other devices. We included an option for respondents to select “not applicable” as distinct from “not likely” to allow for those students who may welcome use of a mobile device but who may currently use a device different from the types we specified. INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 91 Figure 5. Likelihood of Using Library Resources on Mobile Device if Easily Available We are unsure how to account for the dramatic difference in interest between smartphone and iPad usage. Survey responses indicated that only a small number of students have access to an iPad, and it is possible that students have had little opportunity to see their classmates or others use iPads in an academic setting. Students were asked in a free-text question to list the services the library should offer. The comments were varied and often used language different from the vocabulary that librarians typically use. In order to gain an understanding of trends and to standardize the language, we coded the survey comments. After coding, trends began to emerge. Access to the library catalog was mentioned by 16 percent of respondents. 
Mobile services in general were specified by 11 percent of survey respondents, 10 percent wanted articles, and 9 percent wanted to reserve study rooms on their mobile device. The phrase “mobile services” represents a catch-all tag designated for comments that indicated that a student desired a variety of services or all services that are possible. For example, only 9 percent of respondents indicated they had used text to contact the library and 15percent had used instant messaging. Several students indicated they might have used these services but did not know they were available, indicating a need for advertising. While we learned much MOBILE TECHNOLOGIES & ACADEMICS | DRESSELHAUS AND SHRODE 92 about students’ desires for mobile services from this important subset of comments in response to the free-text question, they did not prove especially useful to guide librarians’ plans for the next stages of implementing mobile technology. Figure 6. Services Requested by Students As is common at many institutions, funding at USU is limited and any development in the area of mobile access implementation must be strategic. Our survey indicated that USU students are using mobile devices for their academic work and would like to further integrate library resources into their mobile routine. The next section of this paper outlines the steps we are taking toward mobile implementation. Going Mobile The USU Library joins many other academic libraries in the beginning stages of implementing mobile technologies. Survey responses from students indicate that they use mobile devices for academic purposes, and until options to use the library with such devices are available and advertised, we will not have a clear understanding of students’ preferences. Klatt's article, “Going Mobile: Free and Easy,” 22 outlines a way to get started with mobile services with small investments of time and money. Articles by Griggs, 23 Back, 24 and West, 25 and books by Green, et al. 26 and Hanson 27 also provide guidance in this area. Here we offer suggestions to establish an implementation team, conduct an environmental scan, outline steps to begin the process, and shed light on advertising, assessment, and policy issues. INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 93 Implementation Team For a library seeking to provide mobile access to online resources, a diverse and talented implementation team is important. Public services personnel in an academic library staff are on the front lines and often field students’ questions. They may also have the opportunity to observe how students are using mobile devices in the library. If librarians track reference interactions, they may find evidence that students are attempting to use their mobile devices to access library services. The electronic resources/collections specialist will also play a key role in mobile development. These specialists are often in contact with vendors, and their advocacy is important in encouraging mobile web development in the vendor community. A web site coordinator interested in mobile services and knowledgeable in current web standards will bring essential talent to the team. Arguably, a mobile-optimized web site should become a standard level of service. Web sites that are optimized or adapted specifically for mobile access are device agnostic and do not require advanced knowledge of smart phone operating systems. Therefore existing web development staff can apply their current skill set to expand into mobile web design. 
In order to launch advanced interactive access to library resources, a programmer who is interested in developing mobile apps on a number of platforms is needed. Device-specific applications allow for the use of phone features such as GPS and orientation sensing via an accelerometer and provide the basis for augmented reality technologies. Environmental Scan Librarians can learn about mobile usage in their community by gathering information to guide future development. At USU we interpret the numbers of students who use mobile devices for academic purposes as justification for implementing mobile library access, but we have not set a benchmark for a degree of interest that would trigger more development. Some of the mobile implementations described at the end of this paper required minimal time or were investigated because of the electronic resources librarian’s interest for their relevance to her role as music subject librarian. In the survey we administered to students, we considered it important to include a wide range of devices, including iPod touches and similar devices that have many of the same possibilities for academic use as smartphones but which do not require a monthly contract. Laptops are also considered a mobile technology, and while we did not emphasize this class of devices, some student comments referred specifically to laptop computers. We will monitor use of the mobile applications that we implement and likely conduct a follow-up survey to assess students’ satisfaction and to find out if there are other services they would like for the library to provide. While librarians may gather useful information from a user study, there are other ways to determine if students are, in fact, using mobile devices in the library. One approach is to review logs of reference questions to determine if students are inquiring about access to library resources via mobile devices. Recently, a few mobile-related questions have surfaced MOBILE TECHNOLOGIES & ACADEMICS | DRESSELHAUS AND SHRODE 94 at USU in the LibStats program used to track reference interactions. This is also an area where training reference staff to recognize and record questions about mobile access could be helpful to detect demand in the library’s community. If vendors provide statistics about use of their products from mobile devices, this information could also contribute to assessing need. Finally, in libraries that use VPN or other off-campus authentication methods, consulting with IT support staff to see if they field questions on setting up remote access on smartphones or other devices may factor into decisions regarding mobile access. The USU Information Technology website provides a knowledgebase that includes entries on a variety of mobile device queries. This indicates to librarians that people in the university community are using their mobile devices for academic functions. Before we conducted the survey of USU students, we knew little about the exact nature of their mobile use. Getting Started After identifying the needs on campus, the next step is to create a plan for mobile implementation. 
An important aspect of anticipating the needs of a library's user population is to understand the likely use scenarios, goals, tasks, and context, as outlined in "Library/Mobile: Tips on Designing and Developing Mobile Web Sites."28 Building on services that incorporate tasks people already perform in non-academic contexts provides a logical bridge, helping those who are familiar with everyday use of a mobile device recognize how such devices can serve academic purposes.

Gathering information from each vendor that supplies content to the library is an important early step in planning. This information can serve as the basis of a mobile web implementation plan, and in the case of EBSCO, creating a profile is necessary in order to allow access to a mobile-formatted platform. At USU our online catalog provider has developed an application for Apple's iOS platform. If a library's catalog vendor does not offer a dedicated application or mobile site, Samuel Liston's comparisons of three major online catalogs on three popular mobile devices are helpful in gaining an understanding of how OPACs display on smartphones. His article also outlines a procedure for testing OPACs and usability.29 At USU we can also take advantage of Serials Solutions' mobile-optimized search screen and a variety of applications provided by other vendors. Jensen noted that librarians should not rely solely on vendor-created applications because of vendors' tendency to develop applications that are usable by only a segment of the overall mobile device user population.30 He adds that libraries should also avoid developing applications for limited platforms. In addition, Jensen provides a simple step-by-step process for converting articles retrieved from a vendor database to a format that can be downloaded from electronic course reserves and read on a variety of handheld devices. While using vendor-developed applications is an important strategy, most libraries will find that developing a mobile-compatible library website is necessary.

Mobile website development can be accomplished in a variety of ways. At USU we plan to offer a version of our regular website by employing cascading style sheets (CSS). This method is described in the paper by Bridges et al.,31 and standard guidelines can be found in the Mobile Web Best Practices 1.0.32 This method will allow the content to be reformatted at the point of need for a variety of platforms. Results from the USU student survey indicate a desire to use a mobile device to access the library catalog, use services like reference assistance, find articles, and make study room reservations. The library plans to include hours and location information, access to existing reference chat and text features, and links to databases with mobile-friendly websites or vendor-created applications, in addition to the resources requested by students. We are still unsure of the best way to provide links to applications and how to explain the various authentication methods required by each vendor. While VPN and EZproxy are possible methods of authenticating via mobile devices, vendors are content at the moment to allow students to access their resources by setting up an account based on an authorized e-mail domain or through a user account created on the non-mobile version of the resource.
In a few cases at USU, mobile applications from vendors allow access to categories of users, such as alumni, because they have a usu.edu e-mail address, although the library does not typically include these patrons in our authorized remote user group.

Advertising, Assessment, and Policy

Creating a mobile website and offering mobile services are only the beginning of the effort to provide access to library materials for mobile users. As Wilson and McCarthy found, advertising is essential;33 students won't use a service they don't know about. A marketing plan with both online and print materials is needed, and educating library staff members, especially those on the public services front line, is an important part of promoting mobile services. Assessment strategies must be developed in order to focus development strategically. Periodic surveys and focus groups can inform future development of mobile services and gauge the impact of currently offered services. Librarians should encourage vendors to provide usage data for their mobile portals or applications, and libraries can track use data from their own information technology departments.

Implementation of mobile web services creates the need to develop new policies and to educate staff. Privacy concerns and the complexities of digital rights management have the potential to transform the role of the library and its policies.34 Patrons will need to be aware that the library has less control over maintaining privacy when materials are accessed via third-party mobile applications. Libraries will need to consider how new developments in pricing models may affect expanding mobile access; one example is HarperCollins' announcement in early 2011 of a policy requiring libraries to repurchase individual e-book titles after a cap on check-outs is reached.35

Librarians' desire to offer reference services or other assistance via mobile devices follows naturally from their long-standing efforts to enable patrons to ask questions via e-mail, chat, instant messaging, or SMS text. Instant messaging, chat, and text lend themselves to mobile access because they are designed for the relatively short exchanges people typically use when communicating with a handheld device. Offering reference services using SMS text and chat in particular is relatively easy for libraries because there are many free services to support them. In some cases, a systems administrator or IT expert may be helpful in navigating the setup of chat and text services and in integrating them so that, for example, when a text message arrives during a time when no one is monitoring the service, a voicemail message automatically appears in the library's e-mail account. Librarians can find an enormous amount of advice on the web and in the literature about how to begin offering mobile-friendly reference, how to expand the virtual reference services they currently provide, and how to choose among free and fee-based services for their library's needs and budget. Two efficient places to begin are Cody Hanson's special issue of Library Technology Reports, which provides a thorough overview of mobile devices and their capabilities and straightforward suggestions for planning and implementation, and M-Libraries, a section of Library Success: A Best Practices Wiki.36
CONCLUSION

In light of trends toward more widespread use of mobile computing devices and smartphones, it makes sense for libraries to provide access to their collections and services in ways that work well with mobile devices. This case study presents the situation at the Merrill-Cazier Library at Utah State University, where students who responded to a survey indicated they are very interested in mobile access, even if they have not yet purchased a smartphone or currently find data plans too expensive. As is only reasonable for any library, at USU we have begun by implementing mobile applications that are available from vendors of our online catalog and databases because these require minimal effort and no additional cost. We present ideas for establishing an implementation team and advice for academic libraries that wish to "go mobile." We aim to have a concrete plan for the work required to optimize the library's website for mobile access by the fall of 2011. A significant step is hiring a digital services librarian to work closely with the webmaster, electronic resources librarian, and others interested in promoting access to resources and services via mobile devices. Our vision is to be on track to offer an augmented-reality experience to our patrons, a development the 2010 Horizon Report identifies as an important trend in the next two to three years. We aim to create an environment in which students can use their mobile devices to gain entry to a new layer of digital information, enhancing their experience in the physical library.

REFERENCES

1. Clifton Dale Foster, "PDAs and the Library Without a Roof," Journal of Computing in Higher Education 7, no. 1 (1995): 85–93.
2. Russell Smith, "Adapting a New Technology to the Academic Medical Library: Personal Digital Assistants," Journal of the Medical Library Association 90, no. 1 (2002): 93–94; Colleen Cuddy, Using PDAs in Libraries: A How-to-Do-It Manual (New York: Neal-Schuman Publishers, 2005).
3. Andrea Jackson, "Wireless Technology Poised to Transform Health Care," Rady Business Journal 3, no. 1 (2010): 24–26.
4. Alan W. Aldrich, "Universities and Libraries Move to the Mobile Web," EDUCAUSE Quarterly 33, no. 2 (2010), www.educause.edu/EDUCAUSE+Quarterly/EDUCAUSEQuarterlyMagazineVolum/UniversitiesandLibrariesMoveto/206531 (accessed Mar. 30, 2011).
5. Larry Johnson, Alan Levine, R. Smith, and S. Stone, The 2010 Horizon Report (Austin, TX: The New Media Consortium, 2010), www.nmc.org/pdf/2010-Horizon-Report.pdf (accessed Mar. 31, 2011).
6. Shannon D. Smith and Judith Borreson Caruso, with an introduction by Joshua Kim, The ECAR Study of Undergraduate Students and Information Technology, 2010 (Research Study, Vol. 6) (Boulder, CO: EDUCAUSE Center for Applied Research, 2010), www.educause.edu/ecar (accessed Mar. 31, 2011).
7. Smith and Caruso, The ECAR Study of Undergraduate Students and Information Technology, 2010.
8. Laurie Bridges et al., "Making the Case for a Fully Mobile Library Web Site: From Floor Maps to the Catalog," Reference Services Review 38, no. 2 (2010): 309–20.
9. Joel Cummings, Alex Merrill, and Steve Borrelli, "The Use of Handheld Mobile Devices: Their Impact and Implications for Library Services," Library Hi Tech 28, no. 1 (2009): 22–40.
10. Rubicon Consulting, The Apple iPhone: Success and Challenges for the Mobile Industry (Los Gatos, CA: Rubicon Consulting, 2008), http://rubiconconsulting.com/downloads/whitepapers/Rubicon-iPhone_User_Survey.pdf (accessed Mar. 31, 2011).
11. Sally Wilson and Graham McCarthy, "The Mobile University: From the Library to the Campus," Reference Services Review 38, no. 2 (2010): 215.
12. Ibid., 216.
13. Ibid., 223.
14. Jamie Seeholzer and Joseph A. Salem, "Library on the Go: A Focus Group Study of the Mobile Web and the Academic Library," College and Research Libraries 72, no. 1 (2011): 9–20.
15. Todd Spires, "Handheld Librarians: A Survey of Librarian and Library Patron Use of Wireless Handheld Devices," Internet Reference Services Quarterly 13, no. 4 (2008): 287–309.
16. Michael M. Smith and Barbara A. Pietraszewski, "Enabling the Roving Reference Librarian: Wireless Access with Tablet PCs," Reference Services Review 32, no. 3 (2004): 249–55.
17. Kathryn Zickuhr, Generations and Their Gadgets (Washington, D.C.: Pew Internet & American Life Project, 2011), http://pewinternet.org/Reports/2011/Generations-and-gadgets.aspx (accessed Mar. 31, 2011).
18. Smith and Pietraszewski, "Enabling the Roving Reference Librarian," 253.
19. Lisa Carlucci Thomas, "Gone Mobile: Mobile Catalogs, SMS Reference, and QR Codes are on the Rise—How are Libraries Adapting to Mobile Culture?" Library Journal 135, no. 17 (2010): 30–34.
20. Smith and Caruso, The ECAR Study of Undergraduate Students and Information Technology, 2010.
21. Ibid.
22. Carolyn Klatt, "Going Mobile: Free and Easy," Medical Reference Services Quarterly 30, no. 1 (2011): 56–73.
23. Kim Griggs, Laurie M. Bridges, and Hannah Gascho Rempel, "Library/Mobile: Tips on Designing and Developing Mobile Web Sites," Code4Lib Journal, no. 8 (November 23, 2009), http://journal.code4lib.org/articles/2055 (accessed Mar. 30, 2011).
24. Godmar Back and A. Bailey, "Web Services and Widgets for Library Information Systems," Information Technology & Libraries 29, no. 2 (2010): 76–86.
25. Mark Andy West, Arthur W. Hafner, and Bradley D. Faust, "Communications—Expanding Access to Library Collections and Services Using Small-Screen Devices," Information Technology & Libraries 25, no. 2 (2006): 103.
26. Courtney Greene, Missy Roser, and Elizabeth Ruane, The Anywhere Library: A Primer for the Mobile Web (Chicago: Association of College and Research Libraries, 2010).
27. Cody W. Hanson, "Libraries and the Mobile Web," Library Technology Reports 47, no. 2 (February/March 2011).
28. Griggs, Bridges, and Gascho Rempel, "Library/Mobile."
29. Samuel Liston, "OPACs and the Mobile Revolution," Computers in Libraries 29, no. 5 (2009): 6–47.
30. R. Bruce Jensen, "Optimizing Library Content for Mobile Phones," Library Hi Tech News 27, no. 2 (2010): 6–9.
31. Griggs, Bridges, and Gascho Rempel, "Library/Mobile."
32. "Mobile Web Best Practices 1.0," World Wide Web Consortium (W3C), www.w3.org/TR/mobile-bp (accessed Mar. 30, 2011).
33. Wilson and McCarthy, "The Mobile University."
34. Timothy Vollmer, There's an App for That! Libraries and Mobile Technology: An Introduction to Public Policy Considerations (Policy Brief No. 3) (Washington, D.C.: ALA Office for Information Technology Policy, 2010), www.ala.org/ala/aboutala/offices/oitp/publications/policybriefs/mobiledevices.pdf (accessed Mar. 31, 2011).
35. Josh Hadro, "HarperCollins Puts 26 Loan Cap on Ebook Circulations," Library Journal, February 25, 2011, www.libraryjournal.com/lj/home/889452-264/harpercollins_puts_26_loan_cap.html.csp (accessed Mar. 31, 2011).
36. "M-Libraries," Library Success: A Best Practices Wiki, www.libsuccess.org/index.php?title=M-Libraries (accessed Mar. 31, 2011).

APPENDIX A. Student Survey Questions

1. Type of student?
2. Age?
3. Gender?
4. What is your college?
5. How often do you use the following electronic resources provided by your library?
6. Do you use any of the following devices?
7. Do you use your mobile device or phone for academic purposes (e.g., Blackboard, electronic course reserves, etc.)?
8. Please list what you use your device to do.
9. Have you ever used a text message to get help using the library?
10. Have you ever used instant messaging to get help using the library?
11. If library resources were easily accessible on your mobile devices and if you had such a device, how likely would you be to use any of the following for assignments or research?
12. What mobile services would you like the library to offer?
13. Comments?

APPENDIX B. Librarian Survey Questions

1. Type of library?
2. Your job/role in the library?
3. Years working in libraries?
4. Does your library offer mobile device applications for the following electronic resources?
5. Who in your library or on your campus is responsible for implementing or developing mobile device applications?
6. How long has your library provided access via mobile devices to electronic resources or services?
7. If you collect use data for library electronic resources, are patrons using the mobile device applications your library provides?
8. What mobile services do you believe libraries should offer?
9. Comments?
Practical Limits to the Scope of Digital Preservation

Mike Kastellec

Mike Kastellec (makastel@ncsu.edu) is Libraries Fellow, North Carolina State University Libraries, Raleigh, NC.

ABSTRACT

This paper examines factors that limit the ability of institutions to digitally preserve the cultural heritage of the modern era. The author takes a wide-ranging approach to shed light on limitations to the scope of digital preservation. The author finds that technological limitations to digital preservation have been addressed but still exist, and that non-technical aspects—access, selection, law, and finances—move into the foreground as technological limitations recede. The author proposes a nested model of constraints to the scope of digital preservation and concludes that costs are digital preservation's most pervasive limitation.

INTRODUCTION

Imagine for a moment what perfect digital preservation would entail: A perfect archive would capture all the content generated by humanity instantly and continuously. It would catalog that information and make it available to users, yet it would not stifle creativity by undermining creators' right to control their creations. Most of all, it would perfectly safeguard all the information it ingested eternally, at a cost society is willing and able to sustain.

Now return to reality: digital preservation is decidedly imperfect. Today's archives fall far short of the possibilities outlined above. Much previous scholarship debates the quality of different digital preservation strategies; this paper looks past these arguments to shed light on limitations to the scope of digital preservation. What are the factors that limit the ability of libraries, archives, and museums (henceforth collectively referred to as archival institutions) to digitally preserve the cultural heritage of the modern era?1 I first examine the degree to which technological limitations to digital preservation have been addressed. Next, I identify the non-technical factors that limit the archiving of digital objects. Finally, I propose a conceptual model of limitations to digital preservation.

TECHNOLOGY

Any discussion of digital preservation naturally begins with consideration of the limits of digital preservation technology. While all aspects of digital preservation are by definition related to technology, there are two purely technical issues at the core of digital preservation: data loss and technological obsolescence.2

Many things can cause data loss. The constant risk is physical deterioration. A digital file consists, at its most basic level, of binary code written to some form of physical media. Just like analog media (paper, vinyl recordings), digital media (optical discs, hard drives) are subject to degradation at a rate determined by the inherent properties of the medium and the environment in which it is stored.3 When the physical medium of a digital file decays to the point where one or more bits lose their definition, the file becomes partially or wholly unreadable. Other causes of data loss include software bugs, human action (e.g., accidental deletion or purposeful alteration), and environmental dangers (e.g., fire, flood, war). Assuming a digital archive can overcome the problem of physical deterioration, it then faces the issue of technological obsolescence.
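Before turning to obsolescence, note that the bit-level decay just described is typically caught through periodic fixity checking, in which checksums recorded at ingest are recomputed and compared. The sketch below is a minimal, generic illustration of that idea; the file paths, manifest format, and choice of algorithm are assumptions, not details from Kastellec's discussion.

import hashlib
import json
from pathlib import Path

def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Compute a checksum for one file, reading in chunks to handle large objects."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Record a checksum for every file under the archive root at ingest time."""
    return {str(p): file_checksum(p) for p in Path(root).rglob("*") if p.is_file()}

def audit(root, manifest):
    """Return paths whose current checksum no longer matches the stored value."""
    return [path for path, expected in manifest.items()
            if not Path(path).is_file() or file_checksum(path) != expected]

if __name__ == "__main__":
    manifest = build_manifest("archive")                      # run once at ingest
    Path("manifest.json").write_text(json.dumps(manifest))
    damaged = audit("archive", json.loads(Path("manifest.json").read_text()))
    print("files failing fixity check:", damaged)

In a redundant system the same comparison is run across copies, so that a damaged copy can be replaced from an intact one rather than merely flagged.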
Binary code is simply a string of zeroes and ones (sometimes called a bitstream)—like any encoded information, this code is only useful if it can be decoded into an intelligible format. This process depends on hardware, used to access a bitstream from a piece of physical media, and software, which decodes the bitstream into an intelligible object, such as a document or video displayed on a screen, a printout, or an audio output. Technological obsolescence occurs when either the hardware or software needed to render a bitstream usable is no longer available. Given the rapid pace of change in computer hardware and software, technological obsolescence is a constant concern. 4 Most digital preservation strategies involve staying ahead of deterioration and obsolescence by copying data from older to current generations of file formats and storage media (migration) or by keeping many copies that are tested against one another to find and correct errors (data redundancy). 5 Other strategies to overcome obsolescence include pre-emptively converting data to standardized formats (normalization) or avoiding conversion and instead using virtualized hardware and software to simulate the original digital environment needed to access obsolete formats (emulation). As may be expected of a young field, 6 there is a great deal of debate over the merits of each of these strategies. To date, the arguments mostly concern the quality of preservation, which is beyond the scope of this work. What should not be contentious is that each strategy also imposes limitations on the potential scale of digital preservation. Migration and normalization are intensive processes, in the sense that they normally require some level of human interaction. Any human-mediated process limits the scale of an archival institution’s preservation activities, as trained staffs are a limited and expensive resource. Emulation postpones the processing of data until it is later accessed, potentially allowing greater ingest of information. As a strategy, however, it remains at least partly theoretical and untested, increasing the possibility that future access will be limited. Data redundancy deserves closer examination, as it has emerged as the gold standard in recent years. The limitations data redundancy imposes on digital preservation are two-fold. The first is that simple maintenance of multiple copies necessarily increases expenses, therefore—given equal levels of funding—less information can be preserved redundantly than can be preserved without such measures. (Cost considerations are inextricably linked to every other limitation on digital preservation and are examined in greater detail in “Finances,” below.) There are practical, technical limitations on the bandwidth, disk access, and processing speeds needed to perform PRACTICAL LIMITS TO THE SCOPE OF DIGITAL PRESERVATION | KASTELLEC 65 parity checks (tests of each bit’s validity) of large datasets to guard against data loss. Pushing against these limitations incurs dramatic costs, limiting the scale of digital preservation. Current technology and funding are many orders of magnitude short of what is required to archive the amount of information desired by society over the long term. 7 The second way technology limits digital preservation is more complex—it concerns error rates of archived data. Non-redundant storage strategies are also subject to errors, of course. 
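How quickly errors compound, and how many copies it takes to counter them, can be illustrated with a toy calculation before examining redundant systems in particular. The model below assumes independent, identical copies with a fixed annual loss probability, which is a deliberate oversimplification of the reliability questions discussed next; the numbers are placeholders, not measured failure rates.

import math

def copies_needed(per_copy_annual_loss, target_annual_loss):
    """Smallest number of independent copies n such that p**n <= target."""
    # With independent copies, an object is lost only if every copy fails.
    return math.ceil(math.log(target_annual_loss) / math.log(per_copy_annual_loss))

if __name__ == "__main__":
    p = 0.05  # assumed 5 percent chance that a single copy is lost in a year
    for target in (1e-3, 1e-6, 1e-9):
        print(f"target annual loss {target:g}: {copies_needed(p, target)} copies")

The point of the exercise is that the answer is acutely sensitive to the per-copy reliability figure, which, as discussed below, is effectively untestable over archival time spans.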
Only redundant systems have been proposed as a theoretical solution to the technological problem of digital preservation, 8 though, so it is necessary to examine their error rate in particular. On a theoretical level, given sufficient copies, redundant backup is all but infallible. In practice, technological limitations emerge. 9 The number of copies required to ensure perfect bit preservation is a function of the reliability of the hardware storing each copy. Multiple studies have found that hardware failure rates greatly exceed manufacturers’ claims. 10 Rosenthal argues that, given the extreme time spans under consideration, storage reliability is not just unknown but untestable. 11 He therefore concludes that it cannot be known with certainty how many copies are needed to sustain acceptably low error rates. Even today’s best digital preservation technologies are subject to some degree of loss and error. Analog materials are also inevitably subject to deterioration, of course, but the promise of digital media leads many to unrealistic expectations of perfection. Nevertheless, modern digital preservation technology addresses the fundamental needs of archival institutions to a workable degree. Technological limitations to digital preservation still exist but the aspects of digital preservation beyond purely technical considerations—access, selection, law, and finances— should gain greater relative importance than they have in the past. ACCESS With regard to digital preservation, there are two different dimensions of access that are important. At one end of a digital preservation operation, authorized users must be able to access an archival institution’s holdings and unauthorized users restricted from doing so. This is largely a question of technology and rights management—users must be able to access preserved information and permitted to do so. This dimension of access is addressed in the Technology and Law sections of this paper. The other dimension of access occurs at the other end of a digital preservation operation: An archival institution must be able to access a digital object to preserve it. This simple fact leads to serious restrictions on the scope of digital preservation because much of the world’s digital information is inaccessible for the purposes of archiving by libraries and archives. There are a number of reasons why a given digital object may be inaccessible. Large-scale harvesting of webpages requires automated programs that “crawl” the Web, discovering and capturing pages as they go. Web crawlers cannot access password-protected sites (e.g., Facebook) and database-backed sites (all manner of sites, including many blogs, news sites, e-commerce sites, INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 66 and countless collections of data). This inaccessible portion of the Web is estimated to dwarf the readily accessible portion by orders of magnitude. There is also an enormous amount of inaccessible digital information that is not part of the Web at all, such as emails, company intranets, and digital objects created and stored by individuals. 12 Additionally, there is a temporal limit to access. Some digital objects only are accessible (or even exist) for a short window of time, and all require some measure of active preservation to avoid permanent loss. 13 The lifespans of many webpages are vanishingly short. Other pages, like some news items, are publicly accessible for a short window before they are hidden behind paywalls. 
Even long-lasting digital objects are often dynamic: the ads accompanying a webpage may change with each visit; news articles and other documents are revised; blog posts and comments are deleted. If an archival institution cannot access a digital object quickly or frequently enough, the object cannot be archived, at least not completely. Large-scale digital preservation, which in practice necessarily relies on periodic automated harvesting of content, is therefore limited to capturing snapshots of the changes digital objects undergo over their lifespans. LAW Existing copyright law does not translate well to the digital realm. Leaving aside the complexities of international copyright law, in the United States it is not clear, for example, whether an archival institution like the Library of Congress is bound by licensing restrictions and if it can require deposit of digital objects, nor whether content on the Web or in databases should be treated as published or unpublished. 14 “Many of the uncertainties come from applying laws to technologies and methods of distribution they were not designed to address.” 15 A lack of revised laws or even relevant court decisions significantly impacts the potential scale of digital preservation, as few archival institutions will venture to preserve digital objects without legal protection for doing so. Given this unclear legal environment, efforts at large-scale digital preservation are hampered by the need to secure permission to archive from the rights holder of each piece of content. 16 This obviously has enormous impact on preserving the Web, but even scholarly databases and periodical archives may not hold full rights to all of their published content. Additionally, a single digital object can include content owned by any number of authors, each of whose permission is needed for legal archival. Without stronger legal protection for archival institutions, the scope of digital preservation is severely limited by copyright restrictions. Digital preservation is further limited by licensing agreements, which can be even more restrictive than general copyright law. Frequently, purchase of a digital object does not transfer ownership to the end-user, but rather grants limited licensed access to the object. In this case, libraries do not enjoy the customary right of first sale that, among other things, allows for actions related to preservation that would otherwise breach copyright. 17 Preservation of licensed works requires that libraries either cede archival responsibility to rights PRACTICAL LIMITS TO THE SCOPE OF DIGITAL PRESERVATION | KASTELLEC 67 holders, negotiate the right to archive licensed copies, or create dark archives that preserve objects in an inaccessible state until their copyright expires. SELECTION The limitation selection imposes on digital preservation hinges on the act of intellectual appraisal. The total digital content created each year already outstrips the total current storage capacity of the world by a wide margin. 18 It is clear libraries and archives cannot preserve everything so, more than ever, deciding what to preserve is critical. 19 Models of selection for digital objects can be plotted on a scale according to the degree of human mediation they entail. At one end, the selective model is closest to selection in the analog world, with librarians individually identifying digital objects worthy of digital preservation. 
At the other end of the scale, the whole domain model involves minimal human-mediation, with automated harvesting of digital objects. The collaborative model, in which archival institutions negotiate agreements with publishers to deposit content, falls somewhere between these two extremes, as does the thematic model, which can apply either selective- or whole-domain-type approaches to relatively narrow sets of digital objects defined by event, topic, or community. Each of these approaches results in limits to the scope of digital preservation. The human mediation of the selective model limits the scale of what can be preserved, as objects can only be acquired as quickly as staff can appraise them. The collaborative and thematic models offer the potential for thorough coverage of their target but by definition are limited in scope. The whole domain model avoids the bottleneck of human appraisal but, more than any other model, is subject to the access limitations discussed above. Whole domain harvesting is also essentially wasteful, as it is an anti-selection approach—everything found is kept, irrespective of potential value. This wastefulness makes the whole domain model extremely expensive because of the technological resources required to manage information at such a scale. FINANCES The ultimate limiting factor is financial reality. Considerations of funding and cost have both broad and narrow effects. The narrow effects are on each of the other limitations previously identified— financial constraints are intertwined with the constraints imposed by technology, access, law, and selection. The technological model of digital preservation that offers the highest quality and lowest risk, redundant offsite copies, also carries hard-to-sustain costs. While the cost of storage continues to drop, hardware costs actually make up only a small percentage of the total cost of digital preservation. Power, cooling, and—for offsite copy strategies—bandwidth costs are significant and do not decrease as scale increases to the same degree that storage costs do. Cost considerations similarly fuel non-technical limitations: Increased funding can increase the rate at which digital objects are accessed for preservation and can enable development of systems to mine deep Web resources. Selection is limited by the number of staff who can evaluate objects or INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 68 the need to develop systems to automate appraisal. Negotiating perpetual access to objects or arranging to purchase archival copies creates additional costs. The broad financial effect is that any digital preservation requires dedicated funding over an indefinite timespan. Lavoie outlines the problem: Much of the discussion in the digital preservation community focuses on the problem of ensuring that digital materials survive for future generations. In comparison, however, there has been relatively little discussion of how we can ensure that digital preservation activities survive beyond the current availability of soft-money funding; or the transition from a project's first-generation management to the second; or even how they might be supplied with sufficient resources to get underway at all. 20 There are many possible funding models for digital preservation, 21 each with their own limitations. Creators and rights holders can preserve their own content but normally have little incentive to do so over the long-term, as demand for access slackens. 
Publicly funded agencies can preserve content, but they may lack a clear mandate for doing so, and they are chronically underfunded. Preservation may be voluntarily funded, as is the case for Wikipedia, although it is not clear if there is enough potential volunteer funding for more than a few preservation efforts. Fees may support preservation, either through charging users for access or by third-party organizations charging content owners for archival services; in such cases, however, fees may also discourage access or provision of content, respectively. A Nested Model of Limitations These aspects can be seen as a series of nested constraints (see figure 1). PRACTICAL LIMITS TO THE SCOPE OF DIGITAL PRESERVATION | KASTELLEC 69 Figure 1. Nested Model of Limitations At the highest level, there are technical limitations on how much digital information can be preserved at an acceptable quality. Within that constraint, only a limited portion of what could possibly be preserved can be accessed by archival institutions for digital preservation. Next, within that which is accessible, there are legal limitations on what may be archived. The subset defined by technological, access, and legal limitations still holds far more information than archival institutions are capable of archiving, therefore selection is required, entailing either the limited quality of automated gathering or the limited quantity of human-mediated appraisal. Finally, each of these constraints is in turn limited by financial considerations, so finances exert pressure at each level. CONCLUSION It is possible to envision alternative ways to model these series of constraints—the order could be different, or they could all be centered on a single point but not nested within each other. Thus, undue attention should not be given to the specific sequence outlined above. One important conclusion that may be drawn, however, is that the identified limitations are related but distinct. The preponderance of digital preservation research to date has understandably focused on overcoming technological limitations. With the establishment of the redundant backup model, which addresses technological limitations to a workable degree, the field would be well served by greater efforts to push back the non-technical limitations of access, law, and selection. The other conclusion is that costs are digital preservation’s most pervasive limitation. As Rosenthal plainly states it, “Society’s ever-increasing demands for vast amounts of data to be kept for the future are INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012 70 not matched by suitably lavish funds.” 22 If funding cannot be increased, expectations must be tempered. Perhaps it has always been the case, but the scale of the digital landscape makes it clear that preservation is a process of triage. For the foreseeable future, the amount of digital information that could possibly be preserved far outstrips the amount that feasibly can be preserved. It is useful to put the advances in digital preservation technology in perspective and to recognize that non-technical factors also play a large role in determining how much of our cultural heritage may be preserved for the benefit of future generations. REFERENCES AND NOTES 1. Issues specific to digitized objects (i.e., digital versions of analog originals) are not specifically addressed herein. Technological limitations apply equally to digitized and born-digital objects, however, and the remaining limitations overlap greatly in either case. 2. 
Francine Berman et al., Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information (Blue Ribbon Task Force on Sustainable Digital Preservation and Access, 2010), http://brtf.sdsc.edu/biblio/BRTF_Final_Report.pdf (accessed Apr. 23, 2011).
3. Marilyn Deegan and Simon Tanner, "Some Key Issues in Digital Preservation," in Digital Convergence—Libraries of the Future, ed. Rae Earnshaw and John Vince, 219–37 (London: Springer London, 2007), www.springerlink.com.proxy-remote.galib.uga.edu/content/h12631/#section=339742&page=1 (accessed Nov. 18, 2010).
4. Berman et al., Sustainable Economics for a Digital Planet; Deegan and Tanner, "Digital Convergence."
5. Data redundancy normally will also entail hardware migration; it may or may not also incorporate file format migration.
6. The Library of Congress, for instance, only began digital preservation in 2000 (www.digitalpreservation.gov/partners/pioneers/index.html [accessed Apr. 24, 2011]).
7. David S. H. Rosenthal, "Bit Preservation: A Solved Problem?" International Journal of Digital Curation 5, no. 1 (July 21, 2010), www.ijdc.net/index.php/ijdc/article/view/151 (accessed Mar. 14, 2011).
8. H. M. Gladney, "Durable Digital Objects Rather Than Digital Preservation," January 1, 2008, http://eprints.erpanet.org/149 (accessed Mar. 14, 2011).
9. Rosenthal, "Bit Preservation."
10. Ibid. Rosenthal cites studies by Schroeder and Gibson (2007) and Pinheiro (2007).
11. Ibid.
12. Peter Lyman, "Archiving the World Wide Web," in Building a National Strategy for Digital Preservation: Issues in Digital Media Archiving (Washington, DC: Council on Library and Information Resources and Library of Congress, 2002), 38–51, www.clir.org/pubs/reports/pub106/pub106.pdf (accessed Dec. 1, 2010); F. McCown, C. C. Marshall, and M. L. Nelson, "Why Web Sites are Lost (and how they're sometimes found)," Communications of the ACM 52, no. 11 (2009): 141–45; Margaret E. Phillips, "What Should We Preserve? The Question for Heritage Libraries in a Digital World," Library Trends 54, no. 1 (Summer 2005): 57–71.
13. Deegan and Tanner, "Digital Convergence"; McCown, Marshall, and Nelson, "Why Web Sites are Lost (and how they're sometimes found)."
14. June Besek, Copyright Issues Relevant to the Creation of a Digital Archive: A Preliminary Assessment (The Council on Library and Information Resources and the Library of Congress, 2003), www.clir.org/pubs/reports/pub112/contents.html (accessed Mar. 15, 2011).
15. Ibid., 17.
16. Archival institutions that do not pay heed to this restriction, such as the Internet Archive (www.archive.org), claim their actions constitute fair use. The legality of this claim is as yet untested.
17. Berman et al., Sustainable Economics for a Digital Planet.
18. Francine Berman, "Got Data?" Communications of the ACM 51, no. 12 (December 2008): 50, http://portal.acm.org/citation.cfm?id=1409360.1409376&coll=portal&dl=ACM&idx=J79&part=magazine&WantType=Magazines&title=Communications (accessed Nov. 20, 2010).
19. Phillips, "What Should We Preserve?"
20. Brian F. Lavoie, "The Fifth Blackbird," D-Lib Magazine 14, no. 3/4 (March 2008), www.dlib.org/dlib/march08/lavoie/03lavoie.html (accessed Mar. 14, 2011).
21. Berman et al., Sustainable Economics for a Digital Planet.
22. Rosenthal, "Bit Preservation."

President's Message: The Year in Review—Open Everything

Colleen Cuddy

Colleen Cuddy (colleen.cuddy@med.cornell.edu) is LITA President 2011–12 and Director of the Samuel J. Wood Library and C. V. Starr Biomedical Information Center at Weill Cornell Medical College, New York, New York.

As I sit down to write my last president's column, a variety of topics are running through my mind. But as I focus on just one word to sum up the year, "open" rises to the top of the list. For truly it was a year of all things open. My presidential theme is open data/open science, and I am looking forward to hearing Tony Hey and Clifford Lynch speak at the LITA President's Program later this month on this topic. Dr. Lynch is also the recipient of this year's LITA/Library Hi Tech Award for Outstanding Communication in Library and Information Technology, cosponsored by Emerald Group Publishing Limited. The prestigious Frederick G. Kilgour Award for Research in Library and Information Technology, co-sponsored by OCLC, is being given to G. Sayeed Choudhury this year. Dr. Choudhury is a longtime proponent of open data, and the award recognizes his leadership in the field of data curation through the National Science Foundation-supported Data Conservancy project.

As you well know, ITAL is now an open-access journal. Open access continues to be a hot topic, and rightly so. My last column was devoted to the subject of open access, but I do want to remind librarians to advocate for open access in the coming year—please keep up the fight! In addition to seeing our journal to its new platform, the Publications Committee has also been busy with a few new LITA Guides, one of which, "Getting Started with GIS," by Eva Dodsworth, provides some guidance on harnessing data sets to work with geospatial technology. Ms. Dodsworth will be conducting an online course on this topic in August, and the Education Committee has many new courses in the pipeline.

Internally, LITA has been working toward a more open and transparent governance structure. The Board has been relentless in making sure that all of its meetings are open, from in-person meetings at conferences to our monthly phone meetings to conversations on ALA Connect. We have been streaming our board meetings live and now will archive the recordings for a limited amount of time. This move has not been without challenges as board members and the LITA office struggled to build open communication with each other and the membership.
Sometimes the challenges were ideological or legal, and sometimes the very technology that we embrace has caused problems, but I think it is safe to say that LITA leadership is working toward a common goal of a transparent structure with open communication channels.

We opened up communication channels to get feedback on what our membership would like most when Zoe Stewart-Marshall, incoming president, hosted a town hall meeting at the ALA Midwinter Meeting that focused on member feedback. I know that she is working hard to address membership needs during her presidency.

As a medical librarian I often travel in circles outside of ALA, and when my medical colleagues learned that I was LITA President they were really impressed. LITA is a well-known and well-respected brand in the library community. Talking to my non-LITA colleagues reinforced the value that LITA brings to the entire profession, particularly through our programming, education, and the way in which we share and exchange information in open forums such as the LITA blog and listserv. (Of course I hope that we have gained some new members through this outreach!) Clearly we are doing many things right, and we should not lose sight of what is great about LITA as we work on addressing areas that need improvement.

One thing that is consistently great about LITA is its annual sponsorship of ALA Emerging Leaders. This year we sponsored two LITA members who were part of the 2012 ALA Emerging Leaders cohort: Jodie Gambill and Tasha Keagan. Both were assigned to a team working on a LITA project that asked for a recommendation and plan for the implementation of a LITA Experts Profile System. The team was responsible for identifying the software to employ and creating an implementation plan with ontology recommendations. The team has identified VIVO (an open-source, semantic-web application) as the software for the project and will present its findings and implementation plan to the LITA Board and the ALA community at the ALA Annual Conference. The team did an outstanding job on this project and completed the deliverable on time, with very little guidance from LITA leadership—a sure sign of leadership! Yet I was often reminded that as we embrace our upcoming leaders, we should not forget that leadership occurs on all levels. One message that I heard throughout my presidency is that LITA should do more for mid-career librarians—and this sentiment is shared by members of other organizations in which I am active. This is a challenge that LITA leadership is poised to take on as it balances its services to membership.

As I now count eighteen occurrences of the word "open" in this column, I believe I have made my point and it is time to sign off. Although I am finishing up my duties as LITA President, I am not saying goodbye. I look forward to my new role as past-president, particularly in hosting the 2012 LITA National Forum in Columbus, Ohio (October 4–7): New World of Data: Discover. Connect. Remix. The National Forum Planning Committee, led by Susan Sharpless Smith, has done an outstanding job putting together an excellent meeting.
The committee has lined up interesting speakers such as Eric Hellman, Ben Shneiderman, and Sarah Houghton, and it has thoughtfully evaluated many paper and poster submissions. I am sure we will all learn quite a bit from our colleagues as we attend sessions and network. I will be hosting a dinner, and I hope to see some of you there as I enjoy what I hope will be a more relaxed role as past-president. It has been an honor to serve you, and I look forward to working with LITA in the years to come!

Extending IM beyond the Reference Desk: A Case Study on the Integration of Chat Reference and Library-Wide Instant Messaging Network

Ian Chan, Pearl Ly, and Yvonne Meulemans

Ian Chan (ichan@csusm.edu) is Web Development Librarian, California State University San Marcos; Pearl Ly (pmly@pasadena.edu) is Access Services & Emerging Technologies Librarian, Pasadena Community College, Pasadena; and Yvonne Meulemans (ymeulema@csusm.edu) is Information Literacy Program Coordinator, California State University San Marcos, California.

ABSTRACT

Openfire is an open-source instant messaging (IM) network and a single unified application that meets the needs of chat reference and internal communication. In Fall 2009, the California State University San Marcos (CSUSM) Library began using Openfire and other Jive Software IM technologies to simultaneously improve our existing IM-integrated chat reference software and implement an internal IM network. This case study describes the chat reference and internal communications environment at the CSUSM Library and the selection, implementation, and evaluation of Openfire. In addition, the authors discuss the benefits of deploying an integrated IM and chat reference network.

INTRODUCTION

Instant messaging (IM) has become a prevalent contact point for library patrons to get information and reference help, commonly known as chat reference or virtual reference. However, IM can also offer a unique method of communication between library staff. Librarians are able to rapidly exchange information synchronously or asynchronously in an informal way. IM provides another means of building relationships within the library organization and can improve teamwork. Many different chat-reference software packages are widely used by libraries, including QuestionPoint, Meebo, and LibraryH3lp. Less commonly used is Openfire (www.igniterealtime.org/projects/openfire), an open-source IM network and a single unified application that uses the Extensible Messaging and Presence Protocol (XMPP), a widely adopted open protocol for IM. Since 2009, the California State University San Marcos (CSUSM) Kellogg Library has used Openfire for chat reference and internal IM communication. Openfire was relatively easy for the Web Development Librarian to set up and administer, and librarians and library users have found the IM interface to be intuitive. In addition to helpful chat reference features such as statistics capture, queues, transfers, and linking to Meebo widgets, Openfire offers the unique capability to host an internal IM network within the library.

In this article, the authors present a literature review on IM as a workplace communication tool and its successful use in libraries for chat reference services. A case study on the selection, implementation, and evaluation of Openfire for use in chat reference and as an internal network will be discussed.
In addition, survey results on library staff use of the internal IM network and its implications for collaboration and increased communication are shared.

LITERATURE REVIEW

Although there is a great deal of literature on IM for library reference services, publications on the use of IM in libraries for internal communications do not appear in the professional literature. A review of library and information science (LIS) literature has revealed very limited work on this aspect of instant messaging. However, a wider literature review in the fields of communications, computer science, and business indicates there is growing interest in studying the benefits of IM within organizations.

Instant Messaging in the Workplace

In the workplace, IM can offer a cost-effective means of connecting in real time and may increase communication effectiveness between employees. It offers a number of advantages over email, telephone, and face-to-face conversation that we will discuss further in the following section. Within the academic library, IM offers the possibility not only of improving access to librarians for research help but also of enhancing communication and collaboration throughout the entire organization.

Research findings indicate that IM allows coworkers to maintain a sense of connection and context that is different from email, face-to-face (FTF), and phone conversations.1 Each IM conversation is designed to display as a single textual thread with one window per conversation. The contributions from each person in the discussion are clearly indicated, and it is easy to review what has been said. This design supports the intermittent reconnection of conversation, and in contrast to email, "intermittent instant messages were thought to be more immersive and to give more of a sense of a shared space and context than such email exchanges."2 Through the use of IM, coworkers gain a highly interactive channel of communication that is not available via other methods of communication.3

Phone and FTF conversations are two of the most common forms of interruption within the workplace.4 However, Garrett and Danziger found that "instant messaging in the workplace simultaneously promotes more frequent communications and reduces interruptions."5 Participants reported they were better able to manage disruptions using IM and that IM did not increase their communication time. The findings of this study revealed that some communication that otherwise may have occurred over email, by telephone, or in person was instead delivered via IM. This likely contributed to the reduced interruptions because IM does not require full and immediate attention, unlike a phone call or face-to-face communication. In addition, IM study participants reported the ability to negotiate their availability by postponing conversations, and these findings support earlier studies suggesting IM is less intrusive than traditional communication methods for determining the availability of coworkers.6

A number of research studies show that IM improves teamwork and is useful for discussing complex tasks. Huang, Hung, and Chen compared the effectiveness of email and IM and the number of new ideas; they found that groups utilizing IM generated more ideas than the email groups.7 They suggested that the spontaneous and rapid interchanges typical of IM facilitate brainstorming between team members.
The information that is uniquely visible through IM and the ease of sending messages help create opportunities for spontaneous dialog. This is supported by a study by Quan-Haase, Cothrel, and Wellman, which found IM promotes team interaction by indicating the likelihood of a faster response.8 Ou et al. also suggest IM has “potential to empower teamwork by establishing social networks and facilitating knowledge sharing among organizational members.”9 IM can enhance the social connectedness of coworkers through its focus on contact lists and instant, opportunistic interactivity. The informal and personalized nature of IM allows workers to build relationships while promoting the sharing of information. Cho, Trier, and Kim suggest that the use of IM as a communication tool encourages unplanned virtual hallway discussions that may be difficult for those located in different parts of a building, campus, or in remote locations.10 IM can build relationships between teams and organizations where members are in physically separated locations. However, Cho, Trier, and Kim also note that IM is more successful in building relationships between coworkers who already have an existing relationship. Wu et al. argue that by helping to build the social network within the organization, instant messaging can contribute to increased productivity.11 Several studies have cautioned that IM, like other forms of communication, requires organizational guidelines on usage and best practices. Mahatanankoon suggests that productivity or job satisfaction may decrease without policies and workplace norms that guide IM use.12 Other research indicates that personality, employee status, and working style may affect the usefulness of IM for individual employees.13 Some workers may find the multitasking nature of IM to work in their favor while those who prefer sequential task completion may find IM disruptive. The hierarchy of work relationships and the nature of managerial styles are likely to have an impact on the use of IM as well. While there are no research findings associated with the use of IM for internal communication within libraries, there are articles encouraging its use. Breeding writes of the potential for IM to bring about “a level of collaboration that only rarely occurs with the store-and-forward model of traditional e-mail.”14 Fink provides a concise introduction to the advantages of using internal IM for communication between library staff.15 In addition, he provides an overview of the implementation and success of the Openfire-based IM network at McMaster University. EXTENDING IM BEYOND THE REFERENCE DESK | CHAN, LY, AND MEULEMANS 7 Success of Chat Reference in Libraries IM-based chat reference gives libraries the means to more easily offer low-cost delivery of synchronous, real-time research assistance to their users, commonly referred to as “chat reference.” Although libraries have used IM for the last decade and many currently subscribe to QuestionPoint, a collaborative virtual reference service through OCLC, two newer online services helped propel the growth of IM-based chat reference. First available in 2006, the web-based Meebo (www.meebo.com) made it much easier to use IM for localized chat reference because library patrons were no longer required to have accounts on a proprietary network, such as AOL or Yahoo, to communicate with librarians.16 Instead, Meebo provided web widgets that allowed users to chat via the web browser. 
Libraries could easily embed these widgets throughout their website, and, unlike QuestionPoint, Meebo is free and does not require a subscription. Librarians could answer questions either through their account on Meebo's website or by logging in with a locally installed instant messaging client. In comparison to IM-based chat reference, a number of libraries also found QuestionPoint difficult to use due to its complexity and awkward interface.17

In 2008, LibraryH3lp (http://libraryh3lp.com) pushed the growth of IM-based chat reference even further because it offered a low-cost, library-specific service that required little technical expertise to implement and operate. LibraryH3lp improved on the Meebo model by adding features such as queues, multi-user accounts, and assessment tools.18

IM adds a more informal means of interaction that helps librarians build relationships with their users. Several recent studies have shown that users respond positively to the use of IM for chat reference. The Illinois State University Milner Library found that switching from its older chat reference software to IM increased transactions by 161 percent within one year.19 With the introduction of web-based IM widgets, Pennsylvania State University Library's IM-based chat reference grew from 20 percent to 60 percent of all virtual reference (VR), which includes email reference, in one year.20 A 2010 study of VR and IM service at the University of Guelph Library found 71 percent user satisfaction with IM compared to 70 percent satisfaction with VR overall.21 IM use in academic libraries has become ubiquitous, and other types of libraries also use IM to communicate with library patrons.

CASE STUDY

California State University, San Marcos (CSUSM) is a mid-size public university with approximately 9,500 students. CSUSM is a commuter campus, with the majority of students living in North County San Diego, and offers many online or distance courses at satellite campuses. The CSUSM Kellogg Library has a robust chat reference service that is used by students on and off campus. The library has about forty-five employees, including librarians, library administrators, and library assistants. The following section will discuss the Meebo chat reference pilot, the selection of Openfire to replace Meebo, the implementation and customization of Openfire, and the evaluation of Openfire for chat reference by librarians and as an internal network for all library personnel.

Meebo Chat Reference Pilot

To examine the feasibility of using IM for chat reference at CSUSM, the reference librarians initiated a pilot program using Meebo (2008–9). A Meebo widget was placed on the library's homepage, the Ask a Librarian page, and on library research guides. Within the first year of the pilot project, chat reference grew to more than 41 percent of all reference transactions.22 Based on responses to user satisfaction surveys, 85 percent of users indicated they would recommend chat reference to other students, and 69 percent said they preferred it to other forms of reference services. Chat reference is now an integral part of the library's research assistance program, and IM has become a permanent access point for students to contact reference librarians.
Although the new IM service was successful, the pilot program uncovered a number of key shortcomings with Meebo when used for chat reference; these shortcomings are documented in a case study by Meulemans et al.23 These findings matched problems reported by other libraries that used Meebo in their reference services.24 Meebo is most suited for individual users who communicate one-to-one via IM. For example, Meebo chat widgets are specific to each Meebo user, and it is not possible to share a single widget between multiple librarians. In addition, features such as message queues and message transfers, invaluable for managing a heavily used chat reference service, are not available in Meebo. Those features are essential for working with multiple, simultaneous incoming IM messages, a common occurrence in virtual reference. Other missing features included built-in transcript retention and automated usage statistics.25

Selecting Openfire

Based on the need for a more robust chat reference system, the CSUSM reference librarians and the web development librarian explored other IM options, especially open-source software. The web development librarian had previous experience using Openfire at the University of Alaska Anchorage for an internal library IM network and investigated its capabilities to replace Meebo as a chat reference tool. The desire to replace Meebo for chat reference at CSUSM also provided the opportunity to pilot an internal IM network.

Openfire, part of the suite of open-source instant messaging tools from Jive Software, was the only application that could easily fulfill both roles, and it offered a number of features that made it highly preferable when compared to other IM-based chat reference systems. Of its many features, one of the most valuable was the integration between Openfire user accounts and our campus email system. Being able to tap into the university's email system meant automated configuration and updating of all staff accounts and contact lists. This removed the burden of individual account maintenance associated with external services such as Meebo, LibraryH3lp, and QuestionPoint. Openfire supports internal IM networks at educational institutions such as the University of Pennsylvania, Central Michigan University, and the University of California, San Francisco.

Openfire could meet our IM chat reference needs because it includes the Fastpath plugin, a complete web-based chat management system available at www.igniterealtime.org/projects/openfire/plugins.jsp. This robust system incorporates important features such as message queues, message transfer, statistics, and canned messages. James Cook University Library in Australia also chose to use Openfire with the Fastpath plugin as its chat reference solution based on its need for those features.26 Other institutions using Fastpath and Openfire for chat reference or support include the University of Texas, the Oregon/Ohio multistate virtual reference consortium, Mozilla.com, and the University of Wisconsin.

When reviewing chat reference solutions, we considered the possibility of using chat modules available through Drupal (http://drupal.org), the web content management system (CMS) for our library website. The primary advantage of that option was complete integration with the library website and intranet.
Further analysis of the Drupal option revealed that the available chat modules were too basic for our needs and that reconfiguring our intranet and website to incorporate a workable chat reference system would require extensive time. In comparison to the implementation time associated with deploying the Openfire system, using Drupal-based chat modules did not provide a favorable cost-benefit ratio.

While the proprietary LibraryH3lp offered similar functionality for chat reference, its inability to integrate with our email system was clearly a deficit when compared to Openfire. In LibraryH3lp, it is necessary to create accounts for all library personnel involved in chat reference. Fastpath does not have that requirement if you integrate Openfire with your organization's Lightweight Directory Access Protocol (LDAP) directory; instead, the system will automatically create accounts for all library staff. Furthermore, the administrative options and interface for LibraryH3lp did not compare favorably with those of Fastpath. The Fastpath interface for assigning users is more intuitive, and the system generates a customizable chat initiation form for each workgroup (figures 1 and 2). Oregon's L-net and Ohio's KnowItNow24x7 offer information about software requirements and an online demonstration of Spark/Fastpath.27

Figure 1. Fastpath Chat Initiation Form for CSUSM Research Help Desk

Figure 2. Fastpath Chat Initiation Form for CSUSM Media Library

For our requirements, Openfire was clearly superior to the available systems for chat reference. Its relatively simple deployment requirements and ease of setup helped make it our first choice for building a combined IM network and chat reference system. In the following section, we will discuss the installation, customization, and assessment of our Openfire implementation.

Openfire Installation and Configuration

The Openfire application is a free download from Ignite Realtime, a community of Jive Software. The program will run on any server with a Windows, Linux, or Macintosh operating system. If configured as a self-contained application, Openfire only requires Java to be available on the server. Installation of the software is an automated process, and system configuration is done through a web-based setup guide. After the initial language selection form, the next step in the server configuration process is to enter the web server URL and the ports through which the server will communicate with the outside world (figure 3). The third step provides fields for selecting the type of database to use with Openfire and for entering any information related to your selection (figure 4).

Figure 3. Openfire Server Settings Screen

Figure 4. Openfire Database Configuration Form

Openfire uses a database to store information such as IM network settings, user account information, and transcripts. Database options include using an embedded database or connecting to an external database server. Using the embedded database is the simpler option and is helpful if you do not have access to a database server. Connecting to an external database server offers more control over the data generated by Openfire and provides additional backup options. Openfire works with a number of the more commonly used database servers, such as MySQL, PostgreSQL, and Microsoft SQL Server. In addition, Oracle and IBM's DB2 are database options with additional free plugins from those vendors. We chose to use MySQL because of our experience using it with other library web applications. If using the external database option, creating and configuring the external database before installing Openfire is highly recommended.
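Preparing the external database typically means creating an empty schema and a dedicated account before running the setup guide described above. The following is a minimal sketch of that preparation, assuming MySQL and the mysql-connector-python package; the database name, account names, and passwords are placeholders, not values from our installation.

```python
# Minimal sketch: prepare an empty MySQL database for Openfire before running
# the web-based setup wizard. Assumes mysql-connector-python and local MySQL
# administrative credentials; all names and passwords below are placeholders.
import mysql.connector

admin = mysql.connector.connect(host="localhost", user="root", password="admin-password")
cursor = admin.cursor()

# Openfire will store settings, accounts, and transcripts in this schema.
cursor.execute("CREATE DATABASE openfire CHARACTER SET utf8")

# Dedicated account that the Openfire setup wizard will use to connect.
cursor.execute("CREATE USER 'openfireuser'@'localhost' IDENTIFIED BY 'choose-a-password'")
cursor.execute("GRANT ALL PRIVILEGES ON openfire.* TO 'openfireuser'@'localhost'")
cursor.execute("FLUSH PRIVILEGES")

admin.close()
```

During setup, the database name and account created here are the values entered into the form shown in figure 4.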
After choosing a database, the Openfire configuration requires the selection of an authentication method for user accounts. One option is to use Openfire's internal authentication system. While the internal system is robust, it requires additional administrative support to manage the process of creating and maintaining user accounts. The recommended option is to connect Openfire to your organization's Lightweight Directory Access Protocol (LDAP) directory (figure 5). LDAP is a protocol that allows external systems to interact with the user information stored in an organization's directory, typically the same directory behind its email system. Using LDAP with Openfire is highly preferable because it simplifies access for your librarians and staff by automatically creating user accounts based on the information in your organization's email system. Library staff simply log in with their work email or network account information; they are not required to create a new username and password.

Figure 5. Openfire LDAP Configuration Form

The last step in the configuration process is to grant system administrator access to the appropriate users. If using the LDAP authentication method, you are able to select one or more users in your organization by entering their email ID (the portion before the @ sign). The selected users will have complete access to all aspects of the Openfire server. Once the setup and configuration process is complete, the server is ready to accept IM connections and route messages. Reviewing the settings and options within the Openfire system administration area is highly recommended. Most libraries will likely want to adjust the configurations within the sections for server settings and archives.

Connecting the IM Network

The second phase of the implementation process connected our library personnel to the IM network using IM software installed on their workstations. The Openfire IM server works with any multiprotocol IM client ("multiprotocol" refers to support for simultaneous connections to multiple IM networks) that provides options for configuring an XMPP or Jabber account. Some of the more popular IM clients that offer this functionality include Spark, Trillian, Miranda, and Pidgin. Based on our chat reference requirements, we chose to use Spark (www.igniterealtime.org/projects/spark), an IM client designed to work specifically with the Fastpath web chat service. Spark comes with a Fastpath plugin that enables users to receive and send messages to anyone communicating through the web-based Fastpath chat widgets (more information on Fastpath configuration is in the next section of this article). This plugin provides a tab for logging into a Fastpath group and for viewing the status of the group's message queues (figure 6). Spark also includes many of the features offered by other IM clients, including built-in screen capture, message transfer, and group chat.

Figure 6. The Fastpath Plugin for Spark

Library personnel were able to install Spark on their own by downloading it from the Ignite Realtime website and launching the software's installation package. The installation process is very simple, and user-specific information is only required when Spark is started for the first time. The fields required for login are the username and password of the user's organizational email account and the address of the IM server.
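Because Spark talks to Openfire over standard XMPP, the same login details can also be checked from a short script, which helps separate account or LDAP problems from client configuration problems. The following is a minimal sketch, assuming the third-party slixmpp package; the account and server names are placeholders, and this script is not part of the Openfire or Spark distributions.

```python
# Minimal sketch: verify that a staff account can authenticate against the
# Openfire server over XMPP, the same protocol Spark uses. Assumes the slixmpp
# package; the JID and password below are placeholders.
import slixmpp

class LoginTest(slixmpp.ClientXMPP):
    def __init__(self, jid, password):
        super().__init__(jid, password)
        self.add_event_handler("session_start", self.session_start)
        self.add_event_handler("failed_auth", self.failed_auth)

    async def session_start(self, event):
        # Reaching this point means the server accepted the credentials.
        print("Login succeeded; the account is ready for Spark.")
        self.send_presence()
        await self.get_roster()
        self.disconnect()

    def failed_auth(self, event):
        print("Login rejected; check the username, password, and LDAP settings.")
        self.disconnect()

if __name__ == "__main__":
    # The JID takes the form username@im-server, matching the Spark login fields.
    client = LoginTest("staffmember@im.library.example.edu", "email-password")
    client.connect()
    client.process(forever=False)
```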
As part of our implementation process, we also provided library staff with recommendations regarding the selection and configuration of optional settings that might enhance their IM experience. Recommendations included auto-starting Spark when logging in to the computer and activating incoming-message signals, such as sound effects and pop-ups.

On our Openfire server, we had also installed the Kraken Gateway plugin (http://kraken.blathersource.org) to enable connections to external IM networks. The gateway plugin works with Spark to integrate library staff accounts on chat networks such as Google Talk, Facebook, and MSN (an example of integrated networks is shown in figure 6). By integrating Meebo as well, librarians were able to continue using the Meebo widgets they had embedded into their research guides and faculty profile pages. This allowed them to use Spark to receive IM messages rather than logging on to the Meebo website.

Configuring the Fastpath Plugin for Chat Reference

A primary motivation for using Openfire was the feature set available in the Fastpath plugin. Fastpath is a complete chat messaging system that includes workgroups, queues, chat widgets, and reporting. Fastpath actually consists of two plugins that work together: Fastpath Service, for managing the chat system, and Fastpath Webchat, for web-based chat widgets. Both plugins are available as free downloads from the Openfire plugins section of the Ignite Realtime website (www.igniterealtime.org/projects/openfire/plugins.jsp). To install Fastpath, upload its packages using the form in the plugins section of the Openfire administrative interface. The plugins will automatically install and add a Fastpath tab to the administrative main menu.

The first step in getting started with the system is to create a workgroup and add members (figure 7). Within each new workgroup, one or more queues are required to process and route incoming requests, and each queue requires at least one "agent." In Fastpath, the term agent refers to those who will receive the incoming chat requests.

Figure 7. Workgroup Setup Form in Fastpath

As workgroups are created, the system automatically generates a chat initiation form, which by default includes fields for name, email, and question. Administrators can remove, modify, and add any combination of field types, including text fields, dropdown menus, multiline text areas, radio buttons, and check boxes. You may also configure the chat initiation form to require completion of some, all, or none of the fields. At CSUSM, our form (figures 1 and 2) includes fields for name and email, a dropdown menu for selecting the topic area of the user's research, and a field for the user to enter their question. The information in these fields allows us to quickly route incoming questions to the appropriate subject librarian. Fastpath includes the ability to create routing rules that use the values submitted in the form to send messages to specific queues within a workgroup. In the future, we may use the dropdown menu to automatically route questions to the subject specialist based on the student's topic.
There are two methods to make the Fastpath chat widget available to the public. The standard approach embeds a presence icon on your webpage and provides automatic status updates; clicking on the icon displays the chat initiation form. For our needs, we chose to embed the chat initiation form directly in our webpages (see appendix B for sample code). When the user submits the form, Openfire routes the message to the next available librarian. On the librarian's computer, the Spark program plays a notification sound and displays a pop-up dialog. The pop-up dialog remains open until the librarian accepts the message, passes it on, or the time limit for acceptance is reached, in which case the message returns to the queue for the next available librarian.

Evaluation of Openfire for Enhanced Chat Reference

The CSUSM reference librarians found Fastpath and Openfire to be much more robust than Meebo for chat reference. The ability to keep chat transcripts and to retain metadata such as time stamps, duration of chats, and topic of research for each conversation is very helpful for analyzing the effectiveness of chat research assistance and for statistical reporting. The automated recording of transcripts and metadata saved time when compared to Meebo, where transcripts had to be manually copied into a Microsoft Word document and tracking statistics of IM interactions were kept in a shared Excel spreadsheet. Other useful features of Fastpath were the ability to transfer patrons to other librarians and to have more than one librarian monitor incoming questions. Furthermore, access to the database holding the Fastpath data allowed us to build an intranet page to monitor real-time incoming IM messages and their responses.

However, some issues were encountered with the Fastpath plugin when initiating chat connections. We experienced intermittent, random instances of dropped IM connections and lost messages. While many of these lost connections were likely the result of user actions (accidentally closing the chat pop-up, walking away from the computer, etc.), others appear to have been due to problematic connections between the server and the user's browser. To address these issues, we now ask users to provide their email when they initiate a chat session. With user emails and our real-time chat monitoring system, we are able to follow up with reference patrons who experience IM connection issues and provide research assistance via email.
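The follow-up step can be as simple as a small script run against the list of dropped sessions collected from the monitoring page. The sketch below illustrates the idea using Python's standard smtplib; the dropped_sessions list, SMTP host, and addresses are placeholders rather than details of our actual system.

```python
# Minimal sketch of the email follow-up step for dropped chat sessions.
# dropped_sessions stands in for output from the real-time monitoring page;
# the SMTP host and addresses are placeholders, not values from the article.
import smtplib
from email.message import EmailMessage

dropped_sessions = [
    {"email": "student@example.edu", "question": "Finding peer-reviewed articles on ..."},
]

with smtplib.SMTP("smtp.library.example.edu") as server:
    for session in dropped_sessions:
        msg = EmailMessage()
        msg["From"] = "refdesk@library.example.edu"
        msg["To"] = session["email"]
        msg["Subject"] = "Following up on your chat question"
        msg.set_content(
            "It looks like our chat session was interrupted.\n\n"
            f"Your question: {session['question']}\n\n"
            "Reply to this message and a librarian will pick up where we left off."
        )
        server.send_message(msg)
```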
Evaluation of Openfire as an Internal Communication Tool

While the adoption of IM as an internal communication tool was highly encouraged, its use was not mandatory for all library personnel. Based on the varied technical backgrounds of our staff and librarians, we recognized that some might find IM difficult to integrate into their workflow or communication style, so we chose a soft launch for our network.

In summer 2011, we conducted a survey of CSUSM library personnel (44 respondents, 99 percent of total staff) to evaluate IM as an internal communication tool. (See appendix A for the survey questions.) We found that 59 percent of staff use the internal IM network, while 85 percent use some type of IM or web-based chat for work. Of those who use internal IM, 30 percent used it daily. While the survey was anonymous, anecdotal discussions indicate adoption rates are higher among library units where the work is technically oriented or instructional in nature, such as Library Systems and the Information Literacy Program/Reference. Among the respondents who use IM, 45 percent indicated they use it because it allows quick communication with others in the library, and 39 percent like its informal nature. Twenty percent of total respondents preferred IM to email and phone communications. Two respondents use the internal IM network but were dissatisfied with it and indicated it did not work well, while one found it too difficult to use.

An additional survey question was geared toward staff members who do not use the internal IM network at all ("Why do you not use the Library IM network?"). This question was designed to identify areas of possible improvement within our system to encourage greater use. Survey respondents were allowed to select more than one reason. The most common reasons given by those who do not use the library IM network were that they do not feel the need (34 percent of nonusers), they mainly communicate with staff members who are also not using the IM network (18 percent), IM does not work for their communication style (14 percent), and privacy concerns (14 percent). We believe more in-depth analysis is necessary to learn more about the perceived usefulness of IM within our organization and to further its adoption.

CONCLUSION

Through additional training and user education, we hope to promote greater use of the Openfire internal IM network among those who work in the library. While 100 percent adoption of IM as a communication tool is not a stated goal of our project, we believe that some staff have not realized the full potential of IM for collaboration and productivity due to a lack of experience with this technology. In hindsight, additional training sessions beyond the initial introductory workshop to set up the Spark IM client may have increased staff usage of IM. For example, providing more information on the library's policies regarding internal IM tracking and the configuration of our system may have alleviated concerns regarding privacy. In addition, we need to lead more discussions on the benefits of IM for collaboration, lowering disruptions, and increasing effectiveness in the workplace.

Openfire and Fastpath have brought many new features that were previously unavailable for chat reference at CSUSM. The addition of queues, message transfer, and transcripts has enhanced the effectiveness of this service and eased its management. Compared to the prior chat reference implementations that used QuestionPoint and Meebo, this new system is more user friendly and robust. Furthermore, the internal IM network and its connection to web-based chat widgets offer the opportunity to build a library that is more open to users. Library users could feasibly contact any library staff member, not just reference librarians, via IM for help. We are testing this concept with a pilot project involving the CSUSM Media Library, which is staffing its own chat workgroup and now has a chat widget available on its website. In the future, we also hope to employ a chat widget for Circulation and ILL services, another public services area that frequently works with library users.
It is important to note that the success of Openfire and IM in the library attracted the attention of other CSUSM instructional and student support areas. In spring 2011, Instructional and Information Technology Services (IITS), which provides campus-wide technology services for faculty, staff, and students, piloted an Openfire-based IM helpdesk service to assist users with technology questions and problems. As of fall 2011, the "Ask an IT Technician" service is fully implemented and available on all campus webpages. Discussions on the adoption of IM for other campus student services, such as financial aid and counseling, have also occurred. In addition to being a contact point for students, IM has the potential to improve internal communication within the organization.

REFERENCES

1. Hee-Kyung Cho, Matthias Trier, and Eunhee Kim, "The Use of Instant Messaging in Working Relationship Development: A Case Study," Journal of Computer-Mediated Communication 10, no. 4 (2005), http://onlinelibrary.wiley.com/doi/10.1111/j.1083-6101.2005.tb00280.x/full (accessed August 1, 2011).

2. Bonnie A. Nardi, Steven Whittaker, and Erin Bradner, "Interaction and Outeraction: Instant Messaging in Action," in Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work (New York: ACM Press, 2000), 79–88.

3. Ellen Isaacs et al., "The Character, Functions, and Styles of Instant Messaging in the Workplace," in Proceedings of the 2002 ACM Conference on Computer Supported Cooperative Work (New York: ACM Press, 2002), 11–20.

4. Victor M. González and Gloria Mark, "Constant, Constant, Multi-tasking Craziness: Managing Multiple Working Spheres," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (New York: ACM Press, 2004), 113–20.

5. R. Kelly Garrett and James N. Danziger, "IM = Interruption Management? Instant Messaging and Disruption in the Workplace," Journal of Computer-Mediated Communication 13, no. 1 (2007), http://jcmc.indiana.edu/vol13/issue1/garrett.html (accessed June 15, 2011).

6. Nardi, Whittaker, and Bradner, "Interaction and Outeraction," 83.

7. Albert H. Huang, Shin-Yuan Hung, and David C. Yen, "An Exploratory Investigation of Two Internet-based Communication Modes," Computer Standards & Interfaces 29, no. 2 (2006): 238–43.

8. Anabel Quan-Haase, Joseph Cothrel, and Barry Wellman, "Instant Messaging for Collaboration: A Case Study of a High-Tech Firm," Journal of Computer-Mediated Communication 10, no. 4 (2005), http://jcmc.indiana.edu/vol10/issue4/quan-haase.html (accessed June 12, 2011).

9. Carol X. J. Ou et al., "Empowering Employees through Instant Messaging," Information Technology & People 23, no. 2 (2010): 193–211.

10. Cho, Trier, and Kim, "The Use of Instant Messaging in Working Relationship Development."

11. Lynn Wu et al., "Value of Social Network—A Large-Scale Analysis on Network Structure Impact to Financial Revenue of Information Technology Consultants" (paper presented at the Winter Information Systems Conference, Salt Lake City, UT, February 5, 2009).

12. Pruthikrai Mahatanankoon, "28P. Exploring the Impact of Instant Messaging on Job Satisfaction and Creativity," CONF-IRM 2010 Proceedings (2010).
13. Ashish Gupta and Han Li, "Understanding the Impact of Instant Messaging (IM) on Subjective Task Complexity and User Satisfaction," in PACIS 2009 Proceedings, Paper 10, http://aisel.aisnet.org/pacis2009/1; and Stephanie L. Woerner, JoAnne Yates, and Wanda J. Orlikowski, "Conversational Coherence in Instant Messaging and Getting Work Done," in Proceedings of the 40th Annual Hawaii International Conference on System Sciences (2007), http://www.computer.org/portal/web/csdl/doi/10.1109/HICSS.2007.152.

14. Marshall Breeding, "Instant Messaging: It's Not Just for Kids Anymore," Computers in Libraries 23, no. 10 (2003): 38–40.

15. John Fink, "Using a Local Chat Server in Your Library," Feliciter 56, no. 5 (2010): 202–3.

16. William Breitbach, Matthew Mallard, and Robert Sage, "Using Meebo's Embedded IM for Academic Reference Services: A Case Study," Reference Services Review 37, no. 1 (2009): 83–98.

17. Cathy Carpenter and Crystal Renfro, "Twelve Years of Online Reference Services at Georgia Tech: Where We Have Been and Where We Are Going," Georgia Library Quarterly 44, no. 2 (2007), http://digitalcommons.kennesaw.edu/glq/vol44/iss2/3 (accessed August 25, 2011); and Danielle Theiss-White et al., "IM'ing Overload: Libraryh3lp to the Rescue," Library Hi Tech News 26, no. 1/2 (2009): 12–17.

18. Theiss-White et al., "IM'ing Overload," 12–17.

19. Sharon Naylor, "Why Isn't Our Chat Reference Used More?" Reference & User Services Quarterly 47, no. 4 (2008): 342–54.

20. Sam Stormont, "Becoming Embedded: Incorporating Instant Messaging and the Ongoing Evolution of a Virtual Reference Service," Public Services Quarterly 6, no. 4 (2010): 343–59.

21. Lorna Rourke and Pascal Lupien, "Learning from Chatting: How Our Virtual Reference Questions Are Giving Us Answers," Evidence Based Library & Information Practice 5, no. 2 (2010): 63–74.

22. Pearl Ly and Allison Carr, "Do u IM?: Using Evidence to Inform Decisions about Instant Messaging in Library Reference Services" (poster presented at the 5th Evidence Based Library and Information Practice Conference, Stockholm, Sweden, June 29, 2009), http://blogs.kib.ki.se/eblip5/posters/ly_carr_poster.pdf (accessed August 1, 2011).

23. Yvonne Nalani Meulemans, Allison Carr, and Pearl Ly, "From a Distance: Robust Reference Service via Instant Messaging," Journal of Library & Information Services in Distance Learning 4, no. 1 (2010): 3–17.

24. Theiss-White et al., "IM'ing Overload," 12–17.

25. Meulemans, Carr, and Ly, "From a Distance," 14–15.

26. Nicole Johnston, "Improving the Reference and Information Experience of Students in Regional Areas—Does an Instant Messaging Service Make a Difference?" (paper presented at the 4th ALIA New Librarians Symposium, Melbourne, Australia, December 5–6, 2008), http://eprints.jcu.edu.au/2076 (accessed August 17, 2011); and Alan Cockerill, "Open Source for IM Reference: OpenFire, Fastpath and Spark" (workshop presented at Fair Shake of the Open Source Bottle, Griffith University, Queensland College of Art, Brisbane, Australia, November 20, 2009), http://www.quloc.org.au/download.php?doc_id=6932&site_id=255 (accessed August 4, 2011).

27. Oregon State Multistate Collaboration, "Multi-State Collaboration: Home," http://www.oregonlibraries.net/multi-state (accessed August 16, 2011).
APPENDIX A

Library Instant Messaging (IM) Usage Survey

The information you submit is confidential. Your name and campus ID are NOT included with your response.

Which of the following do you use . . . (answered separately for work and for personal use)
● Library's IM Network (Spark)
● Meebo
● MSN
● Yahoo
● GTalk
● Facebook or other website-specific chat system
● IM app on my phone
● Trillian, Pidgin or other IM aggregator
● Skype
● I don't use IM or web-based chat
● Other
If you selected other, please describe: ____________________________________________________________________

On average, how often do you communicate via IM or web-based chat at work?
● Several times a day
● Almost daily
● Several times a week
● Several times a month
● Never

How often do you use IM or web-based chat to . . . (rated on a scale of 5—Often, 4, 3—Sometimes, 2, 1—Never)
● discuss work-related topic
● socialize with co-worker
● answer questions from library users
● talk about NON-work related topic
● request tech support
● Other
If you selected other, please describe: ____________________________________________________________________

If you use IM to communicate at work, what do you like about it?
● Allows for quick communication with others in the library
● Facilitates informal conversation
● Students like to use it to ask library related questions
● I prefer IM over phone or email
● Other:

Why do you NOT use the Library IM network?
● Don't feel the need
● The people I usually talk to aren't on it
● Does not work well
● Never get around to it . . . but would like to
● It doesn't work for my communication style
● The system is too difficult to use
● Privacy concerns
● Other:

Additional comments? ____________________________________________________________________

APPENDIX B

IFRAME Code for Embedding Fastpath Chat Widget

Experiences of Migrating to an Open-Source Integrated Library System

Vandana Singh

ABSTRACT

Interest in migrating to open-source integrated library systems is continually growing in libraries. Along with this interest, the lack of empirical research and evidence comparing the process of migration brings a lot of anxiety to interested librarians. In this research, twenty librarians who have worked in libraries that migrated to an open-source integrated library system (ILS) or are in the process of migrating were interviewed.
The interviews focused on their experiences and the lessons learned in the process of migration. The results from the interviews are used to create guidelines/best practices for each stage of the adoption process of an open-source ILS. These guidelines will be helpful for librarians who want to research and adopt an open-source ILS.

INTRODUCTION

Open-source software (OSS) has become increasingly popular in libraries, and every year more libraries migrate to an open-source integrated library system.1 While there are many discrete open-source applications used by libraries, this paper focuses on the integrated library system (ILS), which supports core operations at most libraries. The two most popular open-source ILSs in the United States are Koha and Evergreen, and they are being positioned as alternatives to proprietary ILSs.2 As open-source software becomes more widely used, it is not enough just to identify which software is the most appropriate for libraries; it is also important to identify best practices, common problems, and misconceptions associated with the adoption of these software packages.

The literature on open-source ILSs usually takes the form of a case study from an individual library or a detailed account of one or two aspects of the process of selection, migration, and adoption. In our interactions with librarians from across the country, we found that there are no consolidated resources for researching different open-source ILSs and for sharing the experiences of the people using them. Librarians who are interested in an open-source ILS cannot find one resource that gives them an overview of the necessary information related to open-source ILSs. In this research, we interviewed twenty librarians from different types and sizes of libraries and gathered their experiences to create generalized guidelines for the adoption of open-source ILSs. These guidelines are at a broader level than a single case study and cover all the different stages of the adoption lifecycle. The experiences of librarians are useful for people who are evaluating open-source ILSs as well as those who are in the process of adoption; learning from their experiences will help librarians avoid reinventing the wheel. This type of research empowers librarians with the information they need and helps us understand the current status of this popular software.

Vandana Singh (vandana@utk.edu) is Assistant Professor, School of Information Sciences, University of Tennessee, Knoxville, Tennessee.

LITERATURE REVIEW

As mentioned earlier, most of the literature on open-source ILSs is practitioner-based and provides case studies or single steps in the process of adoption. These research studies and resources are useful but do not address the broad information needs of the librarians who are researching the topic of open-source ILSs. Every library is different, so no two libraries are going to take the same path in the adoption process. The usefulness of these articles depends on whether the searcher can find one from a similar environment. Another issue is the amount of information given in these resources. Often these papers discuss only one aspect of moving to an open-source ILS, for example choosing the open-source ILS. If they do cover the whole process, there is usually not enough detail to know how they did it.
For example, Morton-Owens, Hanson, and Walls organize their paper into five sections: motivation and requirements analysis, software selection, configuration, training, and maintenance.3 However, each section includes more main points than description. Another relevant stream of literature comprises articles that compare different open-source ILSs. These range from little more than links to different open-source projects to in-depth comparisons.4 For example, Muller evaluated open-source communities for different ILSs on forty criteria and then compared the ILSs on over eight hundred functions and features.5 These types of articles are very useful for those who are trying to become acquainted with the different open-source ILSs that are available and are in the evaluation phase of the process. Again, they are not helpful in understanding the entire process of adoption. Some best-practices articles, such as Tennant's, may be a little older, but his nine tips are still valid and provide a good foundation for anyone thinking about making the switch to an open-source ILS.6

What Are the Factors for Moving to an Open-Source ILS?

One reason an open-source ILS appeals to libraries is its underlying philosophy: "Open source and open access are philosophically linked to intellectual freedom, which is ultimately the mission of libraries."7 The other two common reasons are cost and functionality. The literature covering the decision to move to an open-source ILS makes it clear that there is a wide variety of ways that libraries come to this decision. In Espiau-Bechetoille, Bernon, Bruley, and Mousin, the consortium made the decision in four parts.8 The article states that they initially determined that four open-source ILSs met their needs (Koha, Emilda, Gnuteca, and Evergreen), although it is somewhat vague as to how they determined that Koha was the best for their situation. Indeed, most of the article is about how the three libraries involved had to work together, coordinating and dividing responsibilities.

Bissels shares that money was the main reason the Complementary and Alternative Medicine Library and Information Service (CAMLIS) decided to migrate to Koha, and the article explains the process of making that decision.9 CAMLIS was being developed from nothing, which makes its situation, and hence its process, different from that of most libraries.

Michigan is an area known for its number of Evergreen libraries, much of which is due to the Michigan Library Consortium (MLC). Dykhuis explains the long, involved process that led to a number of Evergreen installations.10 MLC provides services to Michigan libraries, such as training and support. When they started looking for an ILS that all libraries could use, the main concerns were cost and functionality, which are the two key aspects mentioned in any discussion about choosing an ILS.

Kohn and McCloy state that they decided to migrate to a new ILS due to frustration with their current ILS and that they involved all six of their librarians in the decision-making process.11 Dennison and Lewis show another reason why people migrate to an open-source ILS.12 They say that the proprietary system they were using was much more complicated than they needed. In addition, because of staff turnover, no one really understood the system. This lack of expertise, combined with increasing annual costs, led to the decision to move to an open-source ILS.
An important lesson to take from this article is that they included all six of their librarians in the decision-making process. For a smaller library where everyone is an expert in their area of the library, it is important to get everyone involved in order to make sure that important functions or needed capabilities are not overlooked. Almost any library that chooses an open-source ILS will name cost as one of its primary reasons; functionality is usually what determines which ILS they choose. Riewe conducted a study in which he asked why each library chose its current ILS.13 Open-source libraries responded most often with the ability to customize, freedom from vendor lock-in, portability, and cost.

How Does Migration Happen?

There are two general ways to do a migration: all at once or in stages. Kohn and McCloy discuss a three-phase migration.14 The reason for this method was to spread the cost over several years. They did the public website and federated catalog as phase one and did the backend part during phases two and three. When multiple libraries are involved, phased migration is more like what is described in Dykhuis.15 In that case, a pilot program was first created in which a few libraries migrated over to the new system; when that was successful, more libraries migrated. In contrast to a phased migration, Walls discusses a migration completed in three months.16 This time includes installation, testing, and configuration. One interesting decision they made was to migrate at the end of the fiscal year in order to limit the amount of acquisitions data to be migrated. Dennison and Lewis completed their migration in two months. In this migration, most of the work was done by the company that was hosting their system.17 This limited the amount of expertise that the library staff needed and made the migration much smoother from their perspective.

Migration can also be an opportunity; for example, Morton-Owens, Hanson, and Walls mention that they used the migration to Koha to synchronize circulation rules between the branches.18 It was also used to weed out inactive patrons (anyone who had not used the library in two years). Data migration can be a problem, though. In the old system, the location code had been used for where the item was within the branch library, what kind of item it was, and how it circulated, but these are three separate fields in Koha. However, to some extent these issues are true of any migration between different systems.

The migration experience is not always a smooth one. One of the advantages of open source is the ability to customize and to develop functions that are specific to your library. In the case of the New York Academy of Medicine Library (NYAM) working with its consortium WALDO (Westchester Academic Library Directors Organization), it was the decision to have developments completed before migration that caused the problems.19 Their migration schedule was delayed by a month, and even after the delay not all of the eleven key features were complete. In addition, their migration took place when LibLime (a proprietary vendor), with whom they were working, announced its separation from the Koha open-source community, which caused additional confusion. There are a couple of lessons to take from this. First, if doing development, be sure that the time needed is built into the migration schedule.
Also, when choosing an ILS, think about how many developments are going to be necessary to successfully run the ILS in your environment. Lastly, try to prioritize the developments to minimize the number needed before "going live."

What Does the Literature Say about Training?

Very little is available about the training process for an open-source ILS. In current studies, training is done in one of two ways: either by buying training from a vendor or by doing it internally.20 Dennison and Lewis found that having staff work on the system together at first and then try it independently was the most successful approach.21 They had a demonstration system to practice on, which also helped. In addition to this self-training, they had onsite training done by module, which allowed staff to attend only the training that was relevant and needed for them. Of all the articles discussed in this section, only one talks about ongoing maintenance.22 The two-paragraph section includes suggested methods and does not mention anything about the amount of time or expertise needed for ongoing maintenance.

In summary, in this literature review we found that there is research about open-source ILSs but that there is a need for much more work in this area. Research articles and practitioner pieces are available and discuss different aspects of the adoption process, and the main reasons for adoption are identified. There are also a few scattered individual articles about the process of migration, training, and maintenance. However, there is a gap in the studies of open-source ILSs: there is no comprehensive study that documents the process, explains the steps, and identifies best practices and challenges for interested librarians.

DATA SOURCES

The objective for data collection was to gather data from a variety of library types and sizes in order to capture a wide range of experiences. E-mail invitations for interviews were sent to the Koha and Evergreen discussion lists and to several other library-related discussion lists. The e-mail requested volunteers for a telephone interview to share their experiences with open-source integrated library systems. Potential participants identified themselves via e-mail as being willing to be interviewed for the project and were then contacted by researchers to set up times for phone interviews. The list of interview questions was e-mailed to the participants before the interviews so that they could review the questions and have enough time to reflect on their experiences.

The interviews were conducted with librarians working in a variety of libraries, including nine libraries using Evergreen and one in the process of migrating to Evergreen. Seven libraries were using Koha, two were using other open-source ILSs, and one was using a proprietary ILS while evaluating open-source options. Public libraries were the most numerous with eleven respondents, while there were also four special libraries, three academic libraries, and one school library. Researchers also requested information about the size of the library collection. Seven libraries owned collections of less than 100,000 items, seven had collections of 100,001–999,999 items, and four libraries owned collections of over 1,000,000 items. Geographically, the respondents ranged all over the United States and included one library located in Afghanistan (although the ILS was installed in the United States). Table 1 details the description of the data.
DATA COLLECTION METHOD

Interviews were chosen as the primary means of data collection in order to gather rich information that could be analyzed using qualitative methods. Researchers sought to interview professionals from a variety of library types and sizes in order to collect a variety of different experiences regarding the selection, implementation, and ongoing maintenance of open-source ILSs. Interviewing was the chosen methodology for several reasons. First, the goal was to go past the practitioner articles to see what kinds of trends there are in the migration process, which requires getting experiences from multiple librarians. Interviews provide the in-depth "case-study description" that we were looking for.23 In addition, the most useful aspect of interviewing is the ability to follow up on an answer that the participant gives.24 This ensures that the same type of information is gathered from every interview, unlike surveys, where participants sometimes do not respond in a way that answers what the researcher really wants to know. In our case, we used telephone interviews due to the geographic dispersion of the participants; this allowed us to talk to librarians from all over the country instead of just within our area. The interview questions are listed in appendix A.

DATA ANALYSIS METHODOLOGY

Interviews were transcribed, and identifying information was then removed from each of the transcribed documents. The transcripts were then uploaded into Dedoose (www.dedoose.com), a web-based analysis program supporting qualitative and mixed-methods research. Dedoose provides online excerpt selection, coding, and analysis by multiple researchers for multiple documents. The research team used an iterative process of qualitatively analyzing the resulting documents. This method used multiple reviews of the data to initially code large excerpts, which were then analyzed twice more to extract common themes and ideas. Researchers began by reviewing each document for quantitative information, including the library type, ILS in use, number of IT staff, and size of the collection. This information was added as metadata descriptors to each document in Dedoose.

Upon review of the transcriptions and in discussions about the interview process, researchers began a content analysis of the qualitative data. Codes were created based on this initial analysis to aid in categorizing the data from the interviews. Two coders coded the entire dataset, assigning categories and themes to the excerpts of the interview transcriptions. All of the excerpts from each coder were used to create two tests, and each coder then took the test of the other's codes by choosing their own codes for each excerpt. The coders earned scores of .96 and .95 using Cohen's kappa statistic, indicating very high reliability.
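For readers unfamiliar with the statistic, the sketch below shows how an intercoder agreement score like the ones reported here can be computed, assuming the scikit-learn package; the code labels and data are made-up stand-ins, not the study's actual coding scheme.

```python
# Illustrative intercoder reliability check with Cohen's kappa, assuming
# scikit-learn. The labels below are placeholders, not the study's codes.
from sklearn.metrics import cohen_kappa_score

coder_a = ["evaluation", "migration", "training", "migration", "maintenance", "evaluation"]
coder_b = ["evaluation", "migration", "training", "migration", "maintenance", "training"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")  # about 0.78 for this toy data; the study reported .96 and .95
```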
Table 1. Description of Libraries

Library Size (number of items in collection) | Library Type | ILS Used
Under 100,000      | Academic | Koha
100,000–1,000,000  | Public   | Evergreen
Under 100,000      | Special  | Proprietary—Considering open-source
Under 100,000      | Public   | Koha
—                  | School   | Koha
100,000–1,000,000  | Public   | Millennium—In process of migrating to Evergreen
100,000–1,000,000  | Public   | Evergreen
100,000–1,000,000  | Special  | Koha
Under 100,000      | Public   | Koha
—                  | Public   | Evergreen
100,000–1,000,000  | Academic | Evergreen-Equinox
Under 100,000      | Special  | Koha
Over 1,000,000     | Academic | Kuali OLE
100,000–1,000,000  | Public   | Evergreen-Equinox
Over 1,000,000     | Public   | Evergreen
100,000–1,000,000  | Public   | Evergreen
Under 100,000      | Public   | Koha-Bywater
Over 1,000,000     | Public   | Evergreen-Equinox
Under 100,000      | Public   | Evergreen
Over 1,000,000     | Special  | Collective Access

RESULTS

Results from the interview questions were divided into eight categories identified as stages of migration, starting with evaluation of the ILS, creation of a demonstration site, data preparation, identification of customization and development needs, data migration, staff training and user testing, and going live and long-term maintenance plans. Best practices and challenges for each of the stages are presented below. This section begins with some general considerations gleaned from the responses.

General Considerations When Migrating to an Open-Source ILS

• Create awareness about open-source culture in your library—let them know what to expect.
• Develop IT skills internally even if you use a vendor.
• Assess your staff's abilities before committing. Knowing what your staff can do will help determine whether you need to work with a vendor, and to what degree, or if you can do it alone. It is also a way to determine who is going to be on your migration team.
• Have a demonstration system; pre-migration, it can be used to test and train, and after migration it can be used to help find solutions to problems. This will also help develop skills internally.
• Communication is key.
o If working with a vendor, either as a single library or as a consortium, have a designated liaison with the vendor so all questions go through one person. In a consortium, ensure that everybody knows what is going on.
• Be prepared to commit a significant amount of staff time for testing, development, and migration, especially if you are not hiring a proprietary vendor for support.

Working with Vendors

• Read contracts carefully. Do not be afraid to ask questions and request changes. Sometimes the other party has a completely different meaning for a word than you do. Make sure you are on the same page.
• Ensure that there is an explicit timeline and procedure for the release of usable source code.
• See that you are guaranteed and entitled to access the source code in case you need to switch developers, bring additional developers on board, or try to fix problems in-house.
• Provide specific examples when reporting problems. Specific examples will help the developers determine what the problem is and will help prevent any miscommunication.
• Designate a liaison between library staff and developers. The liaison will have to be someone who understands or can learn enough about what the developers are doing so that he or she can translate any problems or complaints from one group to the other.
• Set up regular meetings for those involved in the migration project.
Regular meetings keep everyone focused and on task. They also provide an opportunity for questions, concerns, and problems to be addressed quickly.

Sample quote from interviews:

One of the main things that came up is working with Equinox, it was amazing. To start with, they were very, very helpful. And I had made an assumption, and I think the rest of us had, too, that we were working with, that this was developed by librarians, and that the terminology used would be library jargon. But that was not the case. We had some stumbling points over, we would say, okay, we want this, or this is a transaction, or that's a bill, but that's not what they called it. They didn't call it a transaction, or they didn't call it a bill. And so when we wrote the contract, we wrote it so that none of the patrons' current checkout record would migrate, which is a big issue. And we didn't realize that we weren't using the right terminology in order to put that in the contract so that those current checkouts would move over with the migration and not just the record.

Stage 1—Evaluation

When making the decision whether to migrate to open source and which open-source ILS is best for your library, the main things to start with are two questions: who makes the decision and on what basis.

In practice, who makes the decision?
• If a single library, one or two people make the decision, usually the library director and whoever is serving as the tech person.
• If in a consortium, a committee makes the decision, often either the library directors or tech people.

Best practice suggestion: Regardless of the size of the library system, even though these are the people making the decisions, you should always try to include as many groups as possible in the decision to move to open source.

Which ILS?
• Make a list of requirements based on your current system and a wish list of requirements for the new system. This is one area where you can involve more than just the systems staff. Asking the different departments (cataloging, acquisitions, and circulation) what their needs are ensures that the final decision includes everyone.
• Talk to other libraries that have made the move to open source. They are a great resource for seeing how the system actually works, asking questions about the migration process, and providing information about open-source problems. If available, talk to a library that migrated from your current proprietary system. Some systems are easier to migrate from than others, so this would be an opportunity to find out about any specific problems.

Stage 2—Set Up a Demonstration Site

• This is the most important guideline in the entire paper. Create a demonstration site before making a final decision.
o If there is still confusion in your team about which ILS to use, setting up a demo site and installing Koha and Evergreen will be the best way to decide which one works for your situation.
o Doing at least one test migration will show what kind of data preparation needs to be done, usually by doing data mapping. Data mapping is where you determine where the fields in your current system go when you move into the new system. A closely related term is staging tables.
o The demo site is also a good way to do staff training when needed.
o The demo site also provides a way to determine what the best setup rules, policies, and settings are by testing them in advance.
o It provides an opportunity to learn the processes of the different modules and how they differ from your library's current practices.
o Most importantly, it serves as a test run for migration, which will make the actual migration go smoothly.

Sample quotes from interviews:

Do you think that the tests with the data and doing that really helped? Oh yes, we would have had a disaster if we hadn't done three tests and test loads. The PALS office has done conversions multiple times before so they have it done, and we have good tech people. So they knew that the three test loads would be a good thing. We did discover some of the tools that should be used, like for example one of the things that's recommended for Evergreen patron migration is to have a staging table, so you dump all your records into a database that you can then use to create the records in the Evergreen tables. And you know we found out why that was important by running into a couple, a few problems with not being able to line up the data in the multiple fields. But you know that's the sort of thing we expect. That's pretty, I classify it as pretty typical migration learning, is finding out what works one way, what doesn't the other. But you know that was a good thing because all the documents were saying, "You should use a staging table." And we had to figure out ourselves why that was such a good idea.

You should use a staging table for migration, i.e., move records into a database that is then used to create records in Evergreen. It helps because some data doesn't line up in the same fields. It's a good idea to set up tables and rules far in advance in order to test before migration. It's very important to do data mapping very carefully, because if you lose anything during migration it's difficult to get it back. Check it to make sure that all the fields will be transported correctly, and run tests while the old system is still up to make sure everything is there.

Stage 3—Data Preparation

• Clean up the data in advance. The better the data is, the more easily it will transfer. This is also an opportunity to start fresh in a new system, so if there were inconsistencies or irritations in the old system, this is a good time to fix them.
o Weeding—If you have records (either materials or patrons) that are out of date, get rid of them. The fewer the records, the easier migration will be. In addition, vendors often charge by record, so why pay for records you do not need?
• Consistency in data is key. If multiple people are working on the data, make sure they are working from the same standards.
• Do a fine amnesty when migrating to a new system. Depending on the systems (current and new), it is sometimes impossible or very difficult to transfer fine data into the new ILS, so doing a fine amnesty will make the process simpler.
• Spot-check data during testing, during migration, and after migration. Catching problems early means there will be less work trying to fix problems later.

Sample quotes from interviews:

I would say that if you're considering converting to an ILS software, that you've really got to do the data mapping very carefully with a fine-toothed comb because you don't want to lose data. It's too hard to get it back in.

The data needs to be normalized so that the numbers of fields are uniform, names are in the correct order, and data is displayed correctly.
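The comments above and below describe loading legacy records into a staging table and normalizing fields before they are mapped into the new system. The following sketch is only a minimal illustration of that general approach, not code from any of the libraries studied or from Evergreen or Koha themselves; the field names, sample records, and target schema are hypothetical, and a real migration would map many more fields and feed the results into the new ILS's own bulk-import tools.

    import sqlite3

    # Hypothetical records exported from a legacy ILS ("Last, First" names,
    # mixed-format dates); a real export would come from a vendor dump or report.
    legacy_patrons = [
        ("21000000000001", "Smith, Maria", "12/31/2013"),
        ("21000000000002", "Jones, Lee",   "2014-06-30"),
        ("21000000000003", "Garcia",       "06/30/2014"),   # missing first name
    ]

    conn = sqlite3.connect(":memory:")   # scratch database used only for staging
    cur = conn.cursor()

    # The staging table holds a working copy of the legacy data so it can be
    # inspected, cleaned, and mapped before anything is loaded into the new ILS.
    cur.execute("""
        CREATE TABLE patron_staging (
            legacy_barcode TEXT,
            legacy_name    TEXT,
            legacy_expiry  TEXT,
            first_name     TEXT,   -- normalized fields filled in below
            last_name      TEXT
        )
    """)
    cur.executemany(
        "INSERT INTO patron_staging (legacy_barcode, legacy_name, legacy_expiry) "
        "VALUES (?, ?, ?)",
        legacy_patrons,
    )

    # Normalization pass: split "Last, First" so names line up with the
    # destination schema, as the interviewees recommend.
    for rowid, name in cur.execute(
            "SELECT rowid, legacy_name FROM patron_staging").fetchall():
        last, _, first = name.partition(",")
        cur.execute(
            "UPDATE patron_staging SET last_name = ?, first_name = ? WHERE rowid = ?",
            (last.strip(), first.strip(), rowid),
        )
    conn.commit()

    # Spot check: records that did not normalize cleanly should be fixed (or
    # weeded) before the real migration run.
    bad = cur.execute(
        "SELECT legacy_barcode FROM patron_staging WHERE first_name = ''").fetchall()
    print("Records needing attention:", [b[0] for b in bad])
    conn.close()

In practice each staged field would then be mapped into the corresponding field of the new system (or its loader format), and the same kind of spot check would be rerun after each test load while the old system is still available for comparison.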
The library has had to decide whether it is worthwhile to do things like getting rid of old abbreviations, etc., to make the data more easily understood. Problems occur with old data if information such as note fields has been entered inconsistently. It's important to have procedures and to make sure everyone is following them. Often things are put in different places, which causes a lot of trouble.

They are doing a lot of cleanup of data, such as reducing the number of unique values in the case of some items that had a huge number of values in a drop-down list. Would like to spend more time on data cleanup but need to go ahead and get data migrated.

Stage 4—Development/Customization

• One benefit of using an open-source ILS is that any development done by any library comes back to the community, so often if you want something done, someone else might have already created that functionality and you can use it.
• Develop partnerships. Often if you want a specific development, someone else does too. If your staff does not have the expertise, then you could provide more of the funding and the partner could provide the tech skills, or vice versa. Partnerships mean the development will cost less than if you did it alone.
• Grant money is also available for open-source development and may be another funding option.

Sample quotes from interviews:

The library does its own minor customizations and uses Equinox for major jobs. They will lay out and prepare everything, then hire Equinox to write and implement new code.

The library tries not to do things on its own but always looks for partnerships when doing any customizations. That way libraries that have similar needs can share resources.

Stage 5—Migration Process

• Write workflows and policies/rules beforehand. Writing these while working on the demo site should provide step-by-step instructions on how to do the final migration.
• Having regular meetings during the migration process ensures that everyone stays on the same page and prevents miscommunications that will slow down the process.
• If many libraries are involved, migrating in waves will make things easier. This is generally a situation with a statewide consortium. Usually there is a pilot migration of four to eight libraries; after that, each wave gets a little bigger as the process becomes more practiced. This can also be a useful model if the libraries involved in the consortium are accepting the migration at different rates.
• For a consortium that is coming from multiple ILSs, having a vendor will make it easier. This is not to say that it could not be done without a vendor, but migrating from System A is going to be different from migrating from System B. This increases the complexity, which can make working with a vendor more cost effective.

Stage 6—Staff Training and User Testing

• Who does the training? There are two main ways: by a vendor or internally.
o If trained by a vendor, there are two options:
▪ The vendor sends someone to the library to conduct training.
▪ The library sends someone to the vendor for training, and then he or she comes back and trains the rest of the staff.
o If trained internally, there are a lot of training materials available. Several libraries have created their own materials and then made them available online. This is another time when having contacts with other libraries can help in using common resources.
• Documentation is important for training. The best way is to find what documentation is already available and then customize it for your system.
• Do training fairly close to the "Go Live" date.
• Use a day or two for training. If a consortium is spread out geographically, use webinars and wikis.
• When doing training, have specific tasks to do. This can be done a few ways.
o Do the specific tasks at the training.
o Demonstrate the tasks at training and then give "homework" where the staff does the specific tasks independently. To implement this option, staff has to have access to a demo system.
o Have staff try the tasks on their own and use the training session for questions or problems they had.

Sample quotes from interviews:

Well we had, we hired Equinox to come and do 2 days of training with us. So they're here and did hands-on training with us. And then we also, they provided some packets of exercises that people could do on their own. And we had the system up and running in the background so that they could play with it about a week before we actually went live to the public so that they could get used to it, figure out how things worked, and work with it a little bit so they could answer questions before the public came and said, hey, how do I find my record, and I can't get into this anymore. And the training was really good, but the hands-on was the best. And it's not a difficult system to work, but you just need a little experience with it before it makes sense.

Evergreen runs a test server that anybody can download the staff client for that and work in their test server and just examine all of the records and how the system works, to figure out our workflows. We looked up documentation online—Evergreen, Indiana, Pines, various places—copied the documentation they so graciously hosted online for everybody to use, went through it, found what worked for us. Those couple staff members worked with other staff. We printed out kind of our little how-to guides for other people, depending on which worked, and told them they're going to sit down, we've got terminals set up here, sit down and learn it.

The admin person, she went through some quite detailed training. She went to Atlanta and had training from Equinox on a lot of aspects of Evergreen. And then we also, she came back, and then she did training for all the libraries in the consortium, kind of an intensive day-long or half-day-long thing that she offered in several different central geographic locations so that all the libraries would have a chance to go and attend without having to drive too far. And we also did webinars, we got a couple webinars for the real outlying libraries. And we also have ongoing weekly webinars. And we have a wiki set up where we put all the information in the online manual and stuff like that.

All the training sessions were recorded, and so we had them on CD for new people coming on board.

Marketing for Patrons

• Most libraries have not done anything elaborate, generally just announcements through posters, local papers, flyers, and websites.
• If the migration is greatly changing the situation for patrons, then more marketing is needed.
• Set up a demo computer for patrons to try, or hold classes once the system is up.

Training for Patrons

• Most libraries did not find this necessary. Either the system is easy to use or it is set up to look like the old system.
• If training patrons, create online tutorials.

Stage 7—"Go Live" and After

• If possible, keep your old system running for a month or two until you are sure that all the data got migrated over properly.

Sample quote from interviews:

Check it to make sure that all the fields will be transported correctly, and run tests while the old system is still up to make sure everything is there.

Maintenance—Library Staff (This assumes a migration being done in-house with little to no vendor support.)

• Staff has to have the technical knowledge (Linux, SQL, and coding).
• Often the money saved from moving to open-source is used to pay for additional staff.
• Most time is not spent on maintenance but on customization, updates, or problem-solving.

Maintenance—Vendor

• Libraries often start with a higher level of vendor support, which lessens as the staff learns and develops expertise.

DISCUSSION AND CONCLUSION

Interviews with twenty librarians from different settings provided insight into the process of adopting an open-source ILS and were used to develop the guidelines presented in this paper. These guidelines are not intended to serve as a complete guide to the process of adoption but are meant to give interested librarians an overview of the process. These guidelines can help libraries prepare themselves for the research and adoption well before they delve into the process. Since these guidelines are all based on the real-life adoption experiences of libraries, they provide insight into the challenges as well as the opportunities in the process. These guidelines can be used to develop an adoption plan and requirements for the adoption process.

In future research, we are working to create adoption blueprints and total cost of ownership assessments (with and without vendors) for libraries of different sizes and types. Also, as part of this research we have developed an information portal that contains resources that will help librarians in each phase of the process of open-source ILS adoption. The information portal, along with these guidelines, will fill a very important gap in the resources available for open-source ILS adoption. The URL for the portal is not being provided in this paper to ensure anonymous review.
Appendix A. Interview Questions

Library Environment

1. What is your library type (school, academic, public, special, etc.)?
2. What is your library size (how many employees, population served, and number of materials)?

Evaluation (We would like as much info as possible about why the system was chosen over others, including any existing system.)

3. What open-source ILS are you using and why did you choose it?
4. When choosing an open-source ILS, where did you go for information (vendor/ILS pages, community groups, personal contacts, etc.)?
5. Who was involved in deciding which ILS to use?

Adoption (We would like to document specific problems or issues that could be used by other libraries to ease their installation.)

6. Were there any problems during migration?
7. What do you know now that you wish you had known before migration?
8. How long did migration take? Were you on schedule?
9. If getting paid support, how did the vendors (previous and current) help with migration?

Implementation (Again, specific examples of the things that worked well or didn't work. How can other libraries learn from this experience?)

10. What kind of (and how much) training did your library staff receive?
11. Did you do any kind of marketing to your patrons?
12. (If you haven't gotten to this part yet) What are your plans for implementation?
13. How much time did implementation take and were you on schedule?

Maintenance (This information will be especially important when compared to the library type and size as a reference for other libraries. We would like to get answers that are as specific as possible.)

14. How large is your systems staff? Is it sufficient to maintain the system?
15. How much time do you spend each week doing system maintenance? How does this compare to your old system?
16. What resources (or channels) do you use to solve your technical support issues? What roles do paid vendors play in maintenance of your system?
Advice for Other Libraries (These open-ended questions are an opportunity to learn more information that we might not have thought of asking about. Responses could provide a valuable resource to other libraries as they plan their implementation.)

17. What is the best thing and worst thing about having an open-source ILS?
18. Are there any lessons or advice that you would like to share with other librarians who are thinking about or migrating to an open-source ILS?

ACKNOWLEDGMENT

This research was funded by an Early Career IMLS grant.

2284 ----

Student Use of Library Computers: Are Desktop Computers Still Relevant in Today's Libraries?

Susan Thompson

ABSTRACT

Academic libraries have traditionally provided computers for students to access their collections and, more recently, facilitate all aspects of studying. Recent changes in technology, particularly the increased presence of mobile devices, call into question how libraries can best provide technology support and how it might affect the use of other library services. A two-year study conducted at the California State University San Marcos library analyzed student use of computers in the library, both the library's own desktop computers and laptops owned by students. The study found that, despite the increased ownership of mobile technology by students, they still clearly preferred to use desktop computers in the library. It also showed that students who used computers in the library were more likely to use other library services and physical collections.

INTRODUCTION

For more than thirty years, it has been standard practice in libraries to provide some type of computer facility to assist students in their research. Originally, the focus was on providing access to library resources, first the online catalog and then journal databases. For the past decade or so, this has expanded to general-use computers, often in an information-commons environment, capable of supporting all aspects of student research from original resource discovery to creation of the final paper or other research product. However, times are changing, and ready access to mobile technology has brought into question whether libraries need to or should continue to provide dedicated desktop computers. Do students still use and value access to computers in the library? What impact does student computer use have on the library and its other services? Have we reached the point where we should reevaluate how we use computers to support student research?

California State University San Marcos (CSUSM) is a public university with about nine thousand students, primarily undergraduates from the local area. CSUSM was established in 1991 and is one of the youngest campuses in the 23-campus California State University system. The library, originally located in space carved out of an administration building, moved into its own dedicated library building in 2004. One of the core principles in planning the new building was the vision of the library as a teaching and learning center. As a result, a great deal of thought went into the design of technology to support this vision.
Rather than viewing technology's role as just supporting access to library resources, we expanded its role to providing cradle-to-grave support for the entire research process. We also felt that encouraging students to work in the library would encourage use of traditional library materials and the expertise of library staff, since these resources would be readily available.1

Susan Thompson (sthompsn@csusm.edu) is Coordinator of Library Systems, California State University San Marcos.

Rethinking our assumptions about library technology's role in the student research process led us to consider the entire building as a partner in the students' learning process. Rather than centralizing all computer support in one information commons, we wanted to provide technology wherever students want to use it. We used two strategies. First, we provided centralized technology using more than two hundred desktop computers, most located in four of our learning spaces: reference, classrooms, the media library, and the computer lab. Three of these spaces are configured like information commons, providing full-service research computers grouped around the service desks near each library entrance. In addition, simplified "walk-up" computers are available on every floor. The simplified computers provide limited web services to encourage quick turnaround and have no login requirement, ensuring ready access to library collections for everyone, including community members. The other major component of our technology plan was the provision of wireless throughout the building, along with extensive power outlets to support mobile computing. More than forty quiet study rooms, along with table "islands" in the stacks, help support the use of laptops for group study. However, only two of these quiet study rooms, located in the media library, provide desktop computers designed specifically to support group work.

In 2009 and again in 2010, we conducted computer use studies to evaluate the success of the library's technology strategy and determine whether the library's desktop computers were still meeting student needs as envisioned by the building plan. The goal of the study was to obtain a better understanding of how students use the library's computers, including types of applications used, computer preferences, and computer-related study habits. The study addressed several specific research questions. First, librarians were concerned that the expanded capabilities of the desktop computers distracted students from an academic and library research focus. Were students using the library's computers appropriately? Second, the original technology plan had provided extensive support for mobile technology, but the technology landscape has changed over time. How did the increase in student ownership of mobile devices—now at more than 80 percent—affect the use of the desktop computers? Finally, did providing an application-rich computer environment encourage students to conduct more of their studying in the library, leading them more frequently to use traditional library collections and services? This article will focus on the study results pertaining to the second and third research questions. We found that, as we expected, students using library computer facilities also made extensive use of traditional library services.
However, we were surprised to discover that the growing availability of mobile devices had relatively little impact on students' continuing preference for library-provided desktop computers.

LITERATURE REVIEW

The concept of the information commons was just coming into vogue in the early 2000s, when we were designing our library building, and it strongly influenced our technology design as well as our building design. Information commons, defined by Steiner as the "functional integration of technology and service delivery," have become one of the primary methods by which libraries provide enhanced computing support for students studying in the library.2 One of the changes in libraries motivating the information-commons concept is the desire to support a broad range of learning styles, including the propensity to mix academic and social activities. Particularly influential to our design was the concept of the information commons supporting students' projects "from inception to completion" by providing appropriate technologies to facilitate research, collaboration, and consultation.3

Providing access to computers appears to contribute to the value of libraries as "place." Shill and Tonner, early in the era of information commons, noted "there are no systematic, empirical studies documenting the impact of enhanced library buildings on student usage of the physical library."4 Since then, several evaluations of the information-commons approach seem to show a positive correlation between creation of a commons and higher library usage because students are now able to complete all aspects of their assignments in the library. For example, the University of Tennessee and Indiana University have shown significant increases in gate counts after they implemented their commons.5

While many studies discuss the value of information commons, very few look at why library computers are preferred over computers in other areas on campus. Burke looked at factors influencing students' choice of computing facilities at an Australian university.6 Given a choice of central computer labs, residence hall computers, and the library's information commons, most students preferred the computers in the library over the other computer locations, with more than half using the library computers more than once a week. They rated the library most highly on its convenience and closeness to resources.

Perhaps the most important trend likely to affect libraries' support for student technology needs is the increased use of mobile technology. The 2010 nationwide EDUCAUSE Center for Applied Research (ECAR) study, from the same year as the second CSUSM study, showed that 89 percent of students had laptops.7 Other nationwide studies have corroborated this high level of laptop ownership.8 So does this increased use of laptops and mobile devices affect the use of desktop computers? The 2010 ECAR study reported that desktop ownership (about 50 percent in 2010) had declined by more than 25 percent between 2006 and 2009, a significant period in the lifetime of CSUSM's new library building. Pew's Internet & American Life Project trend data showed desktop ownership as the only gadget category in which ownership is decreasing, from 68 percent in 2006 to 55 percent at the end of 2011.9 Some libraries and campuses are beginning to respond to the increase in laptop ownership by changing their support for desktop computers.
University of Colorado Boulder, in an effort to decrease costs and increase the availability of flexible campus spaces, is making a major move away from providing desktop computers.10 While the university found that 97 percent of its students own laptops and other mobile devices, it was concerned that many students still preferred to use desktop computers when on campus. To entice students to bring their laptops to campus, the university is enhancing its support for mobile devices by converting its central computer labs into flexible-use space with plentiful power outlets, flexible furniture, printing solutions, and access to the usual campus software.

Nevertheless, it may be premature for all libraries and universities to eliminate their desktop computer support. Tom, Voss, and Scheetz found students want flexibility with a spectrum of technological options.11 Certainly, they want Wi-Fi and power outlets to support their mobile technology. However, students also want conventional campus workstations providing a variety of functions, such as quick print and email computers, long-term workstations with privacy, and workstations at larger tables with multiple monitors that support group work.

While the ubiquity of laptops is an important factor today, other forms of mobile devices may become more important in the future. A 2009 Wall Street Journal article reported that the trend for business travelers is to rely on smartphones rather than laptops.12 For the last three years, EDUCAUSE's Horizon reports have made support for non-laptop mobile technologies one of the top trends. The 2009 Horizon report mentioned that in countries like Japan, "young people equipped with mobiles often see no reason to own personal computers."13 In 2010, Horizon reported an interesting pilot project at a community college in which one group of students was issued mobile devices and another group was not.14 Members of the group with the mobile devices were found to work on the course more during their spare time. The 2011 Horizon Report discusses mobiles as capable devices in their own right that are increasingly users' first choice for Internet access.15 Therefore, rather than trying to determine which technology is most important, libraries may need to support multiple devices.

Trends described in the ECAR and Horizon studies make it clear that students own multiple devices. So how do they use them in the study environment? Head's interviews with undergraduate students at ten US campuses found that "students use a less is more approach to manage and control all of the IT devices and information systems available to them."16 For example, in the days before final exams, students were selective in their use of technology to focus on coursework yet remain connected with the people in their lives. The question then may not be which technology libraries should support but rather how to support the right technology at the right time.

METHOD

The CSUSM study used a mixed-method approach, combining surveys with real-time observation to improve the effectiveness of assessment and generate a more holistic understanding of how library users made their technology choices. The study protocol received exempt status from the university human subjects review board. It was carried out twice over a two-year period to determine whether time of the semester affected usage. In 2009, the study was administered at the end of the spring term, April 15 to May 3.
We expected that students near the end of the term would be preparing for finals and completing assignments, including major projects. The 2010 study was conducted near the beginning of the term, February 4 to February 18. We expected that early-term students would be less engaged in academic assignments, particularly major research projects. We carried out each study over a two-week period. An attempt was made to check consistency by duplicating each observation time and location across the two studies. Each location was surveyed Monday through Thursday, once in the morning and once in the afternoon, during the heavy-use times of 11 a.m. and 2 p.m. The survey locations included two large computer labs (more than eighty computers each), one located near the library reference desk and one near the academic technology helpdesk. Other locations included twenty computers in the media library, a handful of desktop computers in the curriculum area, and laptop users, mostly located on the fourth and fifth floors of the library. The fourth and fifth floor observations also included the library's forty quiet study rooms. For the 2010 study, the other large computer lab on campus (108 computers), located outside the library, was also included for comparison purposes.

We used two techniques: a quantitative survey of library computer users and a qualitative observation of software application usage and selected study habits. The survey tried to determine the purpose for which the student was using the computer that day, what their computer preference was, and what other business they might have in the library. It also asked students for their suggestions for changes in the library. The survey was usually completed within the five-minute period that we had estimated and contained no identifying personal information.

The survey administrator handed out the one-page paper survey, along with a pencil if desired, to each student using a library workstation or using a laptop during each designated observation period. Users who refused to take the survey were counted in the total number of students asked to do the survey. However, users who indicated they refused because they had already completed a survey on a previous observation date were marked as "dup" in the 2010 survey and were not counted again. The "dup" statistic proved useful as an independent confirmation of the popularity of the library computers.

The second method involved conducting "over-the-shoulder" observations of students using the library computers. While students were filling out the paper survey, the survey administrator walked behind the users and inconspicuously looked at their computer screens. All users in the area were observed whether or not they had agreed to take the survey. The one exception was users in group-study rooms. The observer did not enter the room and could only note behaviors visible from the door window, such as laptop usage or group studying. Based on brief (one minute or less) observations, administrators noted on a form the type of software application the student was using at that point in time. The observer also noted other, non-desktop technical devices in use (specifically laptops, headphones, and mobile devices such as smartphones) and study behaviors, such as group work (defined as two or more people working together). The student was not identified on the form. We felt that these observations could validate information provided by the users on the survey.
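Before turning to the results, the short sketch below illustrates the survey bookkeeping implied by this protocol: completed surveys and refusals are counted against unique asks, while repeat ("dup") contacts are excluded from the denominator. The three totals are the 2010 figures reported in the next section; the derived refusal count is implied by those figures rather than reported directly, and the snippet is only an illustration of the arithmetic, not part of the study's instruments.

    # Survey bookkeeping for the 2010 study (totals as reported in the Results).
    completed = 1123       # completed surveys
    unique_asks = 1423     # students approached, excluding repeat ("dup") contacts
    duplicates = 619       # repeat contacts, excluded from the denominator

    return_rate = completed / unique_asks
    refusals = unique_asks - completed   # implied, not reported directly

    print(f"Return rate: {return_rate:.1%}")                 # ~78.9%, reported as 79 percent
    print(f"Refusals among unique asks: {refusals}")         # 300
    print(f"Duplicates vs. completed surveys: {duplicates / completed:.0%}")  # ~55%, "about half"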
RESULTS

We completed 1,452 observations in 2009 and 2,501 observations in 2010. The gate counts for the primary month in which each study took place—70,607 for April 2009 and 59,668 for February 2010—show the library was used more heavily during the final exam period. The larger number of results the second year was due to more careful observation of laptop and study-group computer users on the fourth and fifth floors and the addition of observations in a nonlibrary computer lab, rather than an increase in students available to be observed.

The observations looked at application usage, study habits, and devices present, but this article will only discuss the observations pertaining to devices. In 2009, 17 percent of students were observed using laptops (see table 1). This number almost doubled in 2010 to 33 percent. Most laptop users were observed on the fourth and fifth floors, where furniture, convenient electrical outlets, and quiet study rooms provided the best support for this technology. Very few desktop computers were available, so students desiring to study on these floors have to bring their own laptops. Almost 20 percent of students in 2010 were observed with other mobile technology, such as cell phones or iPods, and 16 percent were wearing headphones, which indicated there was other, often not visible, mobile technology in use.

Table 1. Mobile Technology Observed
Laptop in use: 17% (2009), 33% (2010)
Headphones in use: 16% (2010)
Mobile device in use (cell phone, iPod): 18% (2010)

In 2009, 1,141 students completed the computer-use survey. However, we were unable to accurately determine the return rate that year. The nature of the study, which surveyed the same locations multiple times, revealed that many of the students were approached more than once to complete the survey. Thus the majority of the refusals to take the survey were because the subject had already completed one previously. The 2010 study accounted for this phenomenon by counting refusals and duplications separately. In 2010, 1,123 students completed the survey out of 1,423 unique asks, resulting in a 79 percent return rate. The 619 duplicates counted represented about half of the 2010 surveys completed and could be considered another indicator of frequent use of the library's computers. The 2010 results included an additional 290 surveys completed by students using the other large computer lab on campus outside the library.

Table 2. Frequency of Computer Use
Daily when on campus: 49% (2009), 42% (2010)
Several times a week: 33% (2009), 30% (2010)
Several times a month: 11% (2009), 15% (2010)
Rarely use computers in the library: 9% (2009), 10% (2010)

In both years of the study, 78 percent of students said they preferred to use computers in the library over other computer lab locations on campus. Students also indicated they were frequent users (see table 2). In 2009, 82 percent of students used the library computers frequently—49 percent daily and 33 percent several times a week. The frequency of use in the 2010 early-term study dropped about 10 percent, to 72 percent, but with the same proportion of daily vs. weekly users. Convenience and quiet were the top reasons, given by more than half of students, for preferring the library computers, followed closely by atmosphere. About a quarter of students preferred library computers because of their close access to other library services.
Table 3. Preferred Computer to Use in the Library
Sit-down PC: 84% (2009), 71% (2010)
Walk-up PC: 6% (2009), 5% (2010)
Own laptop: 23% (2009), 28% (2010)
Laptop checked out in the library: 2% (2009), 2% (2010)

The types of computer that students preferred to use in the library were desktop computers, followed by laptops owned by the students (see table 3). It is notable that the preference for desktop computers changed significantly from 2009 to 2010: 84 percent of students preferred desktop computers in 2009 vs. 72 percent in 2010—a 12 percent decrease. Not surprisingly, few students preferred the simplified walk-up computers used for quick lookups. However, we did not expect so little interest in checking out laptops, with only 2 percent preferring that option.

The 2010 study added a new question to the survey to better understand the types of technology devices owned by students (see table 4). In 2010, 84 percent of students owned a laptop (combining the netbook and laptop statistics). Almost 40 percent of students owned a desktop; therefore many students owned more than one type of computer. Of the 85 percent of students who indicated they had a cell phone, about one-third indicated they owned smartphones. The majority of students owned music players. The one technology students were not interested in was e-book readers, with less than 2 percent indicating ownership.

Table 4. Technology Devices Owned by Students (2010)
Laptop: 77%
iPod/MP3 music player: 59%
Regular cell phone: 52%
Desktop computer: 40%
Smartphone: 31%
Netbook: 7%
Other handheld devices: 1%
Kindle/book reader: 1%

To understand how the use of technology might affect use of the library in general, the survey asked students what other library services they used on the same day they were using library computers. Table 5 shows survey responses are very similar between the late-term 2009 study and the early-term 2010 study. By far the most popular use of the library, by more than three-quarters of the students, was for study. Around 25 percent of the students planned to meet with others, and 20 percent planned to use the media services. Around 15 percent of students planned to check out print books, 15 percent planned to use journals, and 10 percent planned to ask for help. The biggest difference for students early in the term was an increased interest (5 percent more) in using the library for study. The late-term students were 9 percent more likely to meet with others. By contrast, users in the nonlibrary computer lab were much less likely to make use of other library services. Only 24 percent of nonlibrary users planned to study in the library, and 8 percent planned to meet with others in the library that day. Use of all other library services was less than 5 percent by the nonlibrary computer users.

In 2010, we also asked users what changes they would like in the library, and 58 percent of respondents provided suggestions. The question was not limited to technology, but by far the biggest request for change was to provide more computers (requested by 30 percent of all respondents). Analysis of the other survey questions regarding computer ownership and preferences revealed who was requesting more traditional desktops in the library.
Surprisingly, most were laptop users; 90 percent of laptop owners wanted more computers, and 88 percent of the respondents making this request were located on the fourth and fifth floors, which were used almost exclusively by laptop users. The next most common comments were remarks indicating satisfaction with current library services: 19 percent of students said they were satisfied with current library services, and 9 percent praised the library and its services. Commonality of requests dropped quickly at that point, with the fourth most common request being for more quiet (2 percent).

Table 5. Other Library Services Used (2009, 2010, nonlibrary computer lab users)
Study: 76%, 81%, 23%
Meet with others: 35%, 26%, 7%
Use media: 20%, 22%, 4%
Check out a book: 16%, 13%, 3%
Look for journals/newspapers: 15%, 13%, 3%
Ask questions/get help: 10%, 10%, 2%
Use a reserve book: 8%, 9%, 2%
Create a video/web page: 6%, 3%, 0%
Pick up ILL/Circuit: 3%, 3%, 0%
Other: 0%, 4%, 1%

DISCUSSION

The results show that students consistently prefer to use computers in the library, with 78 percent declaring a preference for the library over other computer locations on campus in both years of the study. This preference is confirmed by the statistics reported by CSUSM's campus IT department, which tracks computer login data. This data consistently shows the library computer labs are used more than nonlibrary computer labs, with the computers near the library reference desk the most popular, followed closely by the library's second large computer lab, which is located next to the technology help desk. For instance, during the 2010 study period, the reference desk lab (80 computers) had 6,247 logins compared to 3,218 logins in the largest nonlibrary lab (108 computers)—double the amount of usage. The data also shows that use of the computers near the reference desk increased by 15 percent between 2007 and 2010.

Supporting the popularity of using computers in the library is the fact that most students are repeat customers. Table 2 shows 82 percent of the 2009 late-term respondents used the library computers at least several times a week, with almost half using our computers daily. In contrast, 72 percent of the 2010 early-term students used the library computers daily or several times a week. The 10 percent drop in frequency of visits to the library for computing applied to both laptop and desktop users and seems to be largely due to students not yet having received enough work from classes to justify more frequent use.

The kind of computer that users preferred changed somewhat over the course of the study. The preference for desktop computers dropped from 84 percent of students in 2009 to 72 percent in 2010 (see table 3). One reason for this 12 percent drop may be related to how the survey was administered. The 2010 study did a more thorough job of surveying the fourth and fifth library floors, where most laptop users are. As a result, the laptop floors represented 29 percent of the responses in 2010 vs. only 13 percent in 2009. These numbers are also reflected in the proportion of laptops observed each year—33 percent in 2010 vs. 17 percent in 2009 (see table 1). The drop in desktop computer preference is interesting because it was not matched by an equally large increase in laptop preference, which only increased by 5 percent. The other reason for the decrease in desktop preference is likely the larger change seen nationwide in student laptop ownership.
For instance, the Pew study of gadget ownership showed a 13 percent drop in desktop ownership over a five-year period, 2006–2011, while at the same time laptop ownership almost doubled, from 30 percent to 56 percent.17 However, it is interesting to note that, according to the Pew study, in 2011 the percentage of adults who owned each type of device was nearly equal—55 percent for desktops and 56 percent for laptops.

The 2010 survey tried to better understand students' preferences by identifying all the kinds of technology they had available to them. We found that 77 percent of CSUSM students owned laptops and an additional 7 percent owned the netbook form of laptop (see table 4). The combined 84 percent laptop ownership is comparable with the 2010 ECAR study's finding of 89 percent student laptop ownership nationwide.18 This high level of laptop ownership may explain why the users who preferred laptop computers almost all preferred to use their own rather than laptops checked out in the library.

Despite the high laptop ownership and decrease in desktop preference, it is significant that the majority of CSUSM students still prefer to use desktop computers in the library. Aside from the 72 percent of respondents who specifically stated a preference for desktop computers, the top suggestion for library improvement was to add more desktop computers, requested by 38 percent of respondents. Further analysis of the survey data revealed that it was the laptop owners and the fourth and fifth floor laptop users who were the primary requestors of more desktop computers.

To try to better understand this seemingly contradictory behavior, we have done some further investigation. Anecdotal conversations with users during the survey indicated that convenience and reliability are two factors affecting students' decisions to use desktop computers. The desktop computers' speed and reliable Internet connections were regarded as particularly important when uploading a final project to a professor, with some students stating they came to the library specifically to upload an assignment.

In May 2012, the CSUSM library held a focus group that provided additional insight into the question of desktops vs. laptops. All eight of the student focus-group participants owned laptops, yet all eight indicated that they preferred to use desktop computers in the library. When asked why, participants cited the reliability and speed of the desktop computers and the convenience of not having to remember to bring their laptop to school and "lug" it around. Another element of the convenience factor may be that our campus does not require students to own a laptop and bring it to class, so they may have less motivation to travel with their laptops. Supporting the idea that students perceive different benefits for each type of computer, six of the eight participants owned a desktop computer in addition to a laptop. The 2010 study also showed that students see value in owning both a desktop and a laptop computer, since the 40 percent ownership of desktop computers overlaps the 84 percent ownership of laptops (see table 4).

Table 6. Reasons Students Prefer Using Library Computer Areas (reasons charted: library services are close; library staff are close; 2009 and 2010)

For almost half of the students surveyed, one of the reasons for their preference for using computers in the library was either the ready access to library services or to staff (see table 6).
Even more significant, when specifically asked what else they planned to do in the library that day besides using the computer (see table 5), more than 80 percent of the students indicated that they intended to use the library for purposes other than computing. The top two uses of the library were studying (76 percent in 2009, 81 percent in 2010) and meeting with others (35/26 percent), indicating the importance of the library as place. The most popular library service was the media library (20/22 percent), followed by collections, with 16/13 percent planning to check out a book and 15/13 percent planning to look for journals and newspapers. It is interesting that the level of use of these library services was similar whether early or late in the term. The biggest difference was that early-term students were less likely to be working with a group but were slightly more likely to be engaged in general studying. Even the less-used services, such as asking a question (10 percent) or using a reserve book (8 percent), exhibited meaningful usage when one looks at the actual numbers. For example, 8 percent of the 1,123 2010 survey respondents represents about 90 students who used reserve materials sometime during the 8 hours of the two-week survey period.

To put the use of the library by computer users into perspective, we also asked students using the nonlibrary computer lab if they planned to use the library sometime that same day. Only 24 percent of the nonlibrary computer users planned to study in the library that day vs. 81 percent of the library computer users; only 4 percent planned to use media vs. 24 percent; and 2 percent planned to check out a book vs. 13 percent. The implication is clear that students using computers in the library are much more likely to use the library's other services.

We usually think of providing desktop computers as a service for students, and so it is. However, the study results show that providing computers also benefits the library itself. It reinforces the library's role as place by providing a complete study environment for students and encouraging all study behaviors, including communication and working with others. The popularity of the library computers provides us with a "captive audience" of repeat customers.

CONCLUSION

The CSUSM library technology that was planned in 2004 is still meeting students' needs. Although most of our students own laptops, most still prefer to use desktop computers in the library. In fact, providing a full-service computer environment to support the entire research process benefits the entire library. Students who use computers in the library appear to conduct more of their studying in the library and thus make more use of traditional library collections and services.

Going forward, several questions arise for future studies. CSUSM is a commuter school. Students often treat their work space in the library as their office for the day, which increases the importance of a reliable and comfortable computer arrangement. One question that could be asked is whether the results would be different for colleges where most students live on campus or nearby. If the university requires that all students own their own laptop and expects them to bring it to class, how does that affect the relevance of desktop computers in the library? The 2010 study was completed just a few weeks before the first iPad was introduced.
Since students have identified convenience and weight as reasons for not carrying their laptops, are tablets and ultra-light computers, like the MacBook Air, more likely to be carried on campus by students and used more frequently for their research? How important is it to have a supportive mobile infrastructure with features such as high-speed Wi-Fi, the ability to use campus printers, and access to campus applications? Are students using smartphones and other mobile devices for study purposes? In fact, are we focusing too much on laptops, and are other mobile devices starting to take over that role?

This study's results make it clear that we can't just look at data such as ECAR's, which show high laptop ownership, and assume that means students don't want or won't use library computers. As the types of mobile devices continue to grow and evolve, libraries should continue to develop ways to facilitate their research role. However, the bottom line may not be that one technology will replace another but rather that students will have a mix of devices and will choose which device is best suited to a particular purpose. Therefore libraries, rather than trying to pick which device to support, may need to develop a broad-based strategy to support them all.

REFERENCES

1. Susan M. Thompson and Gabriella Sonntag, "Chapter 4: Building for Learning: Synergy of Space, Technology and Collaboration," in Learning Commons: Evolution and Collaborative Essentials (Oxford: Chandos Publishing, 2008): 117–199.

2. Heidi M. Steiner and Robert P. Holley, "The Past, Present, and Possibilities of Commons in the Academic Library," Reference Librarian 50, no. 4 (2009): 309–32.

3. Michael J. Whitchurch and C. Jeffery Belliston, "Information Commons at Brigham Young University: Past, Present, and Future," Reference Services Review 34, no. 2 (2006): 261–78.

4. Harold Shill and Shawn Tonner, "Creating a Better Place: Physical Improvements in Academic Libraries, 1995–2002," College & Research Libraries 64 (2003): 435.

5. Barbara I. Dewey, "Social, Intellectual, and Cultural Spaces: Creating Compelling Library Environments for the Digital Age," Journal of Library Administration 48, no. 1 (2008): 85–94; Diane Dallis and Carolyn Walters, "Reference Services in the Commons Environment," Reference Services Review 34, no. 2 (2006): 248–60.

6. Liz Burke et al., "Where and Why Students Choose to Use Computer Facilities: A Collaborative Study at an Australian and United Kingdom University," Australian Academic & Research Libraries 39, no. 3 (September 2008): 181–97.

7. Shannon D. Smith and Judith Borreson Caruso, The ECAR Study of Undergraduate Students and Information Technology, 2010 (Boulder, CO: EDUCAUSE Center for Applied Research, October 2010), http://net.educause.edu/ir/library/pdf/ers1006/rs/ers1006w.pdf (accessed March 21, 2012).
8. Pew Internet & American Life Project, "Adult Gadget Ownership Over Time (2006–2012)," http://www.pewinternet.org/Static-Pages/Trend-Data-(Adults)/Device-Ownership.aspx (accessed June 14, 2012); The Horizon Report: 2009 Edition, The New Media Consortium and EDUCAUSE Learning Initiative, http://net.educause.edu/ir/library/pdf/HR2011.pdf (accessed March 21, 2012); The Horizon Report: 2010 Edition, The New Media Consortium and EDUCAUSE Learning Initiative, http://net.educause.edu/ir/library/pdf/HR2011.pdf (accessed March 21, 2012); The Horizon Report: 2011 Edition, The New Media Consortium and EDUCAUSE Learning Initiative, http://net.educause.edu/ir/library/pdf/HR2011.pdf (accessed March 21, 2012).

9. Pew Internet, "Adult Gadget Ownership."

10. Deborah Keyek-Franssen et al., Computer Labs Study (University of Colorado Boulder, Office of Information Technology, October 7, 2011), http://oit.colorado.edu/sites/default/files/LabsStudy-penultimate-10-07-11.pdf (accessed June 15, 2012).

11. J. S. C. Tom, K. Voss, and C. Scheetz, "The Space is the Message: First Assessment of a Learning Studio," Educause Quarterly 31, no. 2 (2008), http://www.educause.edu/ero/article/space-message-first-assessment-learning-studio (accessed June 25, 2012).

12. Nick Wingfield, "Time to Leave the Laptop Behind," Wall Street Journal, February 23, 2009, http://online.wsj.com/article/SB122477763884262815.html (accessed June 15, 2012).

13. The Horizon Report: 2009 Edition.

14. The Horizon Report: 2010 Edition.

15. The Horizon Report: 2011 Edition.

16. Alison J. Head and Michael B. Eisenberg, "Balancing Act: How College Students Manage Technology While in the Library During Crunch Time," Project Information Literacy Research Report, Information School, University of Washington, October 12, 2011, http://projectinfolit.org/pdfs/PIL_Fall2011_TechStudy_FullReport1.1.pdf (accessed June 14, 2012).

17. Pew Internet, "Adult Gadget Ownership."

18. Smith and Caruso, ECAR Study.

Social Contexts of New Media Literacy: Mapping Libraries

Elizabeth Thorne-Wallington

ABSTRACT

This paper examines the issue of universal library access by conducting a geospatial analysis of library location and certain socioeconomic factors in the St. Louis, Missouri, metropolitan area. Framed around the issue of universal access to Internet, computers, and technology (ICT) for digital natives, this paper demonstrates patterns of library location related to race and income. This research then raises important questions about library location, and, in turn, how this impacts access to ICT for young people in the community.
OBJECTIVES AND PURPOSE

The development and diffusion of new media and digital technologies has profoundly affected the literacy experiences of today's youth.1 Young people today develop literacy through a variety of new media and digital technologies.2 The dissemination of these resources has also allowed for youth to have literacy-rich experiences in an array of different settings. Ernest Morrell, literacy researcher, writes,

As English educators, we have a major responsibility to help future English teachers to redefine literacy instruction in a manner that is culturally and socially relevant, empowering, and meaningful to students who must navigate a diverse and rapidly changing world.3

This paper will explore how mapping and geographic information systems (GIS) can help illuminate the cultural and social factors related to how and where students access and use new media literacies and digital technology. Libraries play an important role in encouraging new media literacy development;4 yet access to libraries must be understood through social and cultural contexts. The objective of this paper is to demonstrate how mapping and GIS can be used to provide rigorous analysis of how library location in St. Louis, Missouri, is correlated with socioeconomic factors defined by the US Census including median household income and race. By using GIS, the role of libraries in providing universal access to new media resources can be displayed statistically, both challenging and confirming previously held beliefs about library access. This analysis raises new questions about how libraries are distributed across the St. Louis area and whether they truly provide universal and equal access.

Elizabeth Thorne-Wallington (ethornew@wustl.edu) is a doctoral student in the Department of Education at Washington University in St. Louis.

LITERATURE REVIEW

Advances in technologies are transforming the very meaning of literacy.5 Traditionally, literacy has been defined as the ability to understand and make meaning of a given text.6 The changing global economy requires a variety of digital literacies, which schools do not provide.7 Instead, young people acquire literacy through a multitude of in- and out-of-school experiences with new media and digital technology.8 Libraries play a vital role in supporting new media literacy by offering out-of-school access and experiences.

To understand the role that libraries play in offering access to new media literacy technologies, a few key concepts must be defined. First is the concept of the digital native. Those born around 1980, who have essentially grown up with technology, are known as digital natives.9 Digital natives are expected to have a base knowledge of technology and to be able to pick up and learn new technology quickly because of that base knowledge. Digital natives have been exposed to technology from a young age and are adept at using a variety of digital technologies. The suggestion is that young people can quickly learn to make use of the new media and technology available in a specific location. Key to any discussion of digital natives is the concept of the digital divide.
The digital divide has been a central issue of education policy since the mid-1990s.10 Early work on the digital divide was concerned primarily with equal access.11 More recently, however, the idea of a "binary digital divide" has been replaced by studies focusing on a multidimensional view of the digital divide.12 Hargittai asserts that even among digital natives, there are large variations in Internet skills and uses correlated with socioeconomic status, race, and gender.13 These variations call for a nuanced study examining social and cultural factors associated with new media literacy, including out-of-school contexts.

The concept of literacy and learning in out-of-school contexts has a strong historical context. Hull and Schultz provide a review of the theory and research on literacy in out-of-school settings.14 A variety of studies, including self-guided literacy activities, after-school programs, and reading programs, were reviewed, and the significance of out-of-school learning opportunities was supported by these studies. Importantly for the research here, research has also been done on the use of digital technology in out-of-school settings. Lankshear and Knobel examine out-of-school practices extensively with their work on new literacies.15 Lankshear and Knobel also make clear the complexity of out-of-school experiences among young people. Students participate in nontraditional literacy activities such as blogging and remix in a variety of out-of-school contexts, from home computers to community-based organizations to libraries. Most importantly, Lankshear and Knobel found that the students did connect what they learned in the classroom with these out-of-school activities.

The connection between out-of-school literacies and in-school learning has also been studied. Education policy researcher Allan Luke writes,

The redefined action of governments . . . is to provide access to combinatory forms of enabling capital that enhance students' possibilities of putting the kinds of practices, texts, and discourses acquired in schools to work in consequential ways that enable active position taking in social fields.16

Collins writes about this relationship between in- and out-of-school literacies. In her case study, Collins writes that there are a variety of "imports" and "exports" in terms of practices. That is, skill transaction works in both directions, with skills learned out of school used in school, and skills learned in school used out of school.17 Skerrett and Bomer make this connection even more explicit when looking at adolescent literacy practices.18 Their article examines how a teacher in an urban classroom drew on her students' out-of-school literacies to inform teaching and learning in a traditional literacy classroom. The authors found that the teacher in their study was able to create a curriculum that engaged students by inviting them to use literacies learned in out-of-school settings. However, the authors write that this type of literacy study was taxing and time-consuming for both the teacher and the student. Still, it is clear that connections between in- and out-of-school literacies can be made.

The role libraries play in making this connection has not been studied as extensively. Yet it is clear that young people do use libraries to access technology. Becker et al. found that nearly half of the nation's 14- to 18-year-olds had used a library computer within the past year.
Becker et al. additionally found that for poor children and families, libraries are a "technological lifeline." Among those below the poverty line, 61 percent used public library computers and the Internet for educational purposes.19 Tripp writes that libraries have long played an important role in helping people gain access to digital media tools, resources, and skills.20 She writes that libraries should capitalize on the potential of new media to engage young people and argues that librarians need to develop skills to train young people to use new media. The idea that libraries are important in meeting this need is further supported by the recent grants, totaling $1.2 million, by the John D. and Catherine T. MacArthur Foundation to build "innovative learning labs for teens" in libraries. This grant making was a response to President Obama's "Educate to Innovate" campaign, a nationwide effort to bring American students to the forefront in science and math.21

This literature review demonstrates that the body of research currently available focuses on digital natives and the digital divide, but that the research lacks the nuance needed to capture the complexity of social and cultural contexts surrounding the issue. This literature review further demonstrates both the importance of new media literacy and out-of-school learning, as well as the key role that libraries play in supporting these learning opportunities. The study provided here uses GIS analysis to demonstrate important socioeconomic and cultural factors that surround libraries and library access. First, I describe the role of GIS in understanding context. Next, I describe the methods used in this paper. Finally, I analyze the results and implications for the study.

Geographic Information Systems Analysis in Education

There is a burgeoning body of research that uses geographic information systems (GIS) to better understand socioeconomic and cultural contexts of education and literacy issues.22 There are several key works that link geography and social context. Lefebvre defines space as socially produced, and he writes that space embodies social relationships shaped by values and meanings. He describes space as a tool for thought and action or as a means of control and domination. Lefebvre writes that there is a need for spatial reappropriation in everyday urban life. The struggle for equality, then, is central to the "right of the city."23 The unequal distribution of resources in the city helps to maintain socially and economically advantaged positions, which is important to the analysis here of library access.

This unequal distribution of resources continues today. De Souza Briggs and others write that there is clear geographical segregation in American cities today.24 This is seen in housing choice, racial attitudes, and discrimination, as well as metropolitan development and policy coalitions. In the conclusion of his book, De Souza Briggs writes that housing choice is limited for low-SES minorities, and these limitations produce myriad social effects. Again, this finding is important to the contexts of where libraries are located. Jargowsky writes of similar findings.25 Like De Souza Briggs, Jargowsky focuses on the role that geography plays in terms of neighborhood and poverty.
Jargowsky even finds social characteristics of these neighborhoods: there is a higher prevalence of single-parent families, lower educational attainment, a higher level of dropouts, and more children living in poverty. Important here, though, is that all such characteristics can be displayed geographically, which means that varying housing, economic, and social conditions can be displayed with library locations. Soja goes beyond the geographic analysis offered by De Souza Briggs and Jargowsky and writes that space should be applied to contemporary social theory.26 Soja found that spatiality should be used in terms of critical human geography to advance a theory of justice on multiple levels. He writes that injustice is spatially construed and that this spatiality shapes social injustice as much as social injustice shapes a specific geography. This understanding, then, shapes how I approach the study of new media literacies as influenced by cultural and social factors.

These factors are particularly prevalent in the St. Louis, Missouri, area. Colin Gordon reiterates the arguments of Lefebvre, Jargowsky, and De Souza Briggs in arguing that St. Louis is a city in decline.27 By providing maps that project housing policies, Gordon is able to provide a clear link between historical housing policies such as racial covenants and current urban decline. Gordon is able to show that vast populations are moving out of St. Louis City and into the county, resulting in a concentration of minority populations in the northern part of the city. Gordon argues that the policies and programs offered by St. Louis City have only exacerbated the problem and led to greater blight.28

In terms of literacy, Morrell makes the most explicit connection between literacy and mapping with a study that used a community-asset mapping activity to make the argument that teachers need to make an explicit connection between literacy at school and the new literacies experienced in the community.29 The significance of this is that GIS can be used to illuminate the social and economic contexts of new media literacy opportunities as well, which in turn could help inform social dialogue about the availability of and access to informal education opportunities for new media literacy.

METHODS AND DATA

The GIS analysis performed here concerns library locations in the St. Louis metropolitan area, including St. Louis City and St. Louis County. The St. Louis metropolitan area was chosen because of past research mapping the segregation of the city, largely because the city and county are so clearly segregated racially and economically along the north–south line. This segregation is striking when displayed geographically and illuminating when mapped with library location. Maps were created using TIGER files (www.census.gov/geo/maps-data/data/tiger.html) and US Census data (http://factfinder2.census.gov/faces/nav/jsf/pages/index.xhtml), both freely available to the public via Internet download. Libraries were identified using the St. Louis City Library's "Libraries & hours" webpage (www.slpl.org/slpl/library/article240098545.asp), the St. Louis County Library "Locations & Hours" webpage (www.slcl.org/about/hours_and_locations), Google Maps (www.maps.google.com), and the yellow pages for the St. Louis metropolitan area (www.yellowpages.com).
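To make the map-creation step just described more concrete, the following is a minimal sketch, assuming the geopandas, pandas, and matplotlib libraries, of how tract boundaries from a TIGER file, a tract-level census attribute, and geocoded library coordinates can be combined into a single map. The file names and column names (stl_tracts.shp, tract_median_income.csv, libraries.csv, GEOID, median_income, lat, lon) are hypothetical placeholders, not the actual data files used in this study.

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# Census tract boundaries (TIGER shapefile) and a tract-level attribute table.
tracts = gpd.read_file("stl_tracts.shp")
income = pd.read_csv("tract_median_income.csv", dtype={"GEOID": str})
tracts = tracts.merge(income, on="GEOID", how="left")

# Library locations geocoded to latitude/longitude, loaded as X-Y point data.
libs = pd.read_csv("libraries.csv")
lib_points = gpd.GeoDataFrame(
    libs,
    geometry=gpd.points_from_xy(libs["lon"], libs["lat"]),
    crs="EPSG:4326",
).to_crs(tracts.crs)

# Choropleth of the census attribute with library locations overlaid as points.
ax = tracts.plot(column="median_income", cmap="Greys", legend=True, figsize=(8, 8))
lib_points.plot(ax=ax, color="black", markersize=25)
ax.set_axis_off()
plt.savefig("library_income_map.png", dpi=200)

The same pattern applies to any of the tract-level variables discussed in this paper (population under 18, race, median household income); only the column being shaded changes.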
The address of each library was entered into iTouchMap (http://itouchmap.com) to identify the latitude and longitude of the library. A spreadsheet containing this information was then loaded into the GIS software and displayed as X–Y data. The maps were then displayed using median household income, African American population, and Latino and Hispanic population as obtained from the US Census at the census tract level. For median household income, the data was from 1999. For all other census data, the year was 2010. For district-level data, communication arts data from the Missouri Department of Elementary and Secondary Education (MODESE) website (http://dese.mo.gov/dsm) was entered into Microsoft Excel and then displayed on the maps. The data is district level, representing all grades tested for Communication Arts across all district schools. The MODESE data was from 2008, the most recent year available at the time the analysis was performed.

The Communication Arts data was taken from the Missouri Assessment Program (MAP) test. This test is given yearly across the state to all public school students. The state then collects the data and makes it available at the state, district, and school level. The data used here is district-level data. Scores are broken into four categories: advanced, proficient, basic, and below basic. The groups for proficient and advanced were combined to indicate the district's success on the MAP test. These are the two levels generally considered acceptable or passing by the state.30

Before looking at patterns of library location and these socioeconomic and educational factors, density analysis was performed on the library locations using ESRI ArcGIS software, version 9.0, to analyze whether clustering was statistically significant. This analysis was used to demonstrate whether libraries were clustered in a statistically significant pattern, or if location was random. The Nearest Neighbor tool of ArcGIS was used to determine if a set of features, in this case the libraries, shows a statistically significant level of clustering. This was done by measuring the distance from each library to its single nearest neighbor and calculating the average distance of all the measurements. The tool then created a hypothetical set of data with the same number of features, but placed randomly within the study area. Then an average distance was calculated for these features and compared to the real data. That is, a hypothetical random set of locations was compared to the set of actual library locations. A nearest neighbor index was produced, which expresses the ratio of the observed distance divided by the distance from the hypothetical data, thus comparing the two sets.31 This score was then standardized, producing a z-score, reported below in the results section.

RESULTS AND CONCLUSIONS

Using the Nearest Neighbor tool produced a z-score of -3.08, showing that the data is clustered beyond the 0.01 significance level. This means that there is a less than 1 percent chance that library location would be clustered to this degree based on chance.
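For readers without ArcGIS, the following is a minimal sketch of the average nearest neighbor statistic that this kind of tool reports: the Clark and Evans index together with its z-score. This is not the ESRI implementation itself, and the inputs (projected library coordinates and a study-area size) are hypothetical; the result is sensitive to the area chosen, such as the combined area of St. Louis City and County.

import numpy as np

def average_nearest_neighbor(xy, area):
    """xy: (n, 2) array of projected coordinates (e.g., meters);
    area: size of the study area in the same squared units."""
    n = len(xy)
    # Distance from each library to its single nearest neighbor.
    diffs = xy[:, None, :] - xy[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)           # ignore zero self-distances
    d_obs = dists.min(axis=1).mean()          # observed mean nearest neighbor distance
    d_exp = 0.5 / np.sqrt(n / area)           # expected mean under complete spatial randomness
    se = 0.26136 / np.sqrt(n ** 2 / area)     # standard error of the observed mean
    ratio = d_obs / d_exp                     # an index under 1 suggests clustering
    z = (d_obs - d_exp) / se                  # z more negative than about -2.58 is clustered at the .01 level
    return ratio, z

A strongly negative z-score, like the -3.08 reported here, indicates that the observed libraries are closer together on average than a random arrangement of the same number of points in the same study area would be.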
Knowing, then, that library location is not random, we can now examine socioeconomic patterns of the areas where libraries are located. Figure 1 shows library location and population of individuals under the age of 18 at the census tract level for St. Louis City and County, using data from the 2010 US Census. To clarify, the city and county are divided by the bold black line crossing the middle of the map, the only such boundary in figure 1, where the county is the larger geographic area. Library location is important because previous research shows that young people use informal learning environments to access new media technologies,32 and libraries are a key informal learning environment.33 This map demonstrates, however, that libraries are not located in census tracts with the highest populations of individuals under the age of 18 in St. Louis City and County. In fact, for all the tracts with the highest number of individuals under the age of 18, there are zero libraries located in these tracts. This is especially concerning given that young people may have less access to transportation, so their access to facilities in neighboring census tracts may be quite limited.

Figure 1. Number of individuals under the age of 18 by census tract and library location in St. Louis City and St. Louis County. Source: 2010 US Census.

Figure 2 includes maps showing library locations in St. Louis City and County in terms of poverty and race at the census tract level, as well as ACT score by district, represented by the bold lines, where St. Louis City is represented by a single district, the St. Louis Public School district. Median household income is indicated by the gray shading, with white areas not having data available. First, census tracts with low median household income are clustered in the northern part of the city and county. There are four libraries in the northern half of the city, and eleven libraries in the central and southern parts of the city. There are fewer libraries in the census tracts with low median household income.

Figure 2. Median household income, ACT score, and library location, St. Louis City and County. Source: 2010 US Census and Missouri Department of Elementary and Secondary Education, 2010, www.modese.gov.

While the Nearest Neighbor analysis has already demonstrated that the libraries are significantly clustered, the maps seem to suggest the pattern of that clustering. This is especially concerning given the report by Becker that 61 percent of those living below the poverty line use libraries to access the Internet.34 First, in terms of median household income, it does appear that many libraries are located in higher-income areas of the city and county. While the libraries appear to be clustered centrally, and particularly near major freeways, there appear to be libraries in many of the higher-income census tracts. Adding to the concern of location is that of access to these library locations. For those living below the poverty line, transportation is often a prohibitive cost, so access from public transportation should also be a major concern for libraries. Additionally, in a pattern repeated in figure 4, the location of libraries does not appear to have any effect on ACT scores, but there are clearly higher ACT scores in wealthier areas of the city and county.
This is not to say that there is a statistical relationship between ACT score and library location, but rather to look at the spatial patterns of each in order to note similarities and differences in these patterns.

Figure 3 shows library location by race, including African American or Black and Hispanic or Latino. First, it is important to note that patterns of race in St. Louis have been carefully documented by Gordon.35 The St. Louis area is clearly a highly segregated region, which makes the social contexts of libraries in the St. Louis area even more important. This map demonstrates that while there are many libraries in the northern parts of St. Louis City and County, none of these libraries is located in the census tracts with the highest populations of those identifying themselves as African American or Black in either the city or county. This raises questions about the inequality of access to the libraries. On the other hand, the densest populations of those identifying themselves as Hispanic or Latino are in the southern part of the city, but not the county. There is a library located in one of those tracts. It appears the areas with higher concentrations of African Americans or Blacks have fewer libraries, while areas with the higher concentrations of Latinos or Hispanics are located in the southern parts of the city that do have libraries. It is important to note, however, that the concentrations of Latinos and Hispanics are quite low, and those areas are majority-white census tracts. As noted above, beyond location, access from public transportation is also an important issue. At the same time, the clustering and patterns shown on these maps raise key issues about access based on income and race. Libraries are not located in areas with low median household income or in areas with high concentrations of African Americans or Blacks. This raises serious questions about why libraries are located where they are, and whether the individuals located in these areas have equal access to library resources, particularly new media technologies.

Figure 3. African American or Black and Hispanic, library location, St. Louis City and County. Source: 2010 US Census.

The final map raises a slightly different issue, one of test scores and student achievement. Figure 4 shows library location by percent proficient or advanced on the Missouri Assessment Program (MAP) test by district. Beyond the location of the libraries, one factor that stands out is that the areas with the lowest percent proficient or advanced are also the areas with the lowest median household income and the highest percentage of those identifying as African American or Black. Here an interesting pattern emerges. While there are many libraries in the city and northern part of the county, the percent proficient or advanced on the communication arts portion of the exam is quite low (20–30 percent). On the other hand, in the western part of the county, there are few libraries, but the percent proficient or advanced is at its highest level. This suggests that there may not be a strong connection between achievement on the MAP exam and library location, similar to the lack of relationship seen between ACT average score and library location in figure 2. At the same time, there does appear to be a correlation between race, income, and test scores.
This correlation is noted throughout the literature on student achievement.36 Clearly, these maps raise important questions such as how and why libraries are located in a certain area and who uses libraries in a given area, as well as what other informal learning environments and community assets exist in these areas. What is made clear by the maps, though, is that GIS can be used as a tool to help understand the context of new media literacy.

Figure 4. Proficient or advanced, Communication Arts MAP by district, 2009, and library location. Source: Missouri Department of Elementary and Secondary Education, 2010, www.modese.gov.

SIGNIFICANCE

These results demonstrate that GIS can be used to illuminate the social, cultural, and economic complexity that surrounds informal learning environments, particularly libraries. This can help demonstrate not only where young people have the opportunity to use new media literacy, but also the complex contextual factors surrounding those opportunities. Paired with traditional qualitative and quantitative work, GIS can provide an additional lens for understanding new media literacy ecologies, which can help inform dialogue about this topic. For the results of this study, there does appear to be a relationship between library location and race and income. This study illuminates the complex contextual factors affecting libraries. Because of the important role that libraries can play in offering young people out-of-school learning opportunities, particularly in terms of access to new media resources, these contextual factors are important to ensuring equal access and opportunity for all.

REFERENCES

1. Ernest Morrell, "Critical Approaches to Media in Urban English Language Arts Teacher Development," Action in Teacher Education 33, no. 2 (2011): 151–71, doi: 10.1080/01626620.2011.569416.

2. Mizuko Ito et al., Hanging Out, Messing Around, and Geeking Out: Kids Living and Learning with New Media (Cambridge: MIT Press/MacArthur Foundation, 2010).

3. Morrell, "Critical Approaches to Media in Urban English Language Arts Teacher Development."

4. Lisa Tripp, "Digital Youth, Libraries, and New Media Literacy," Reference Librarian 52, no. 4 (2011): 329–41, doi: 10.1080/02763877.2011.584842.

5. Gunther Kress, Literacy in the New Media Age (London: Routledge, 2003).

6. Ibid.

7. Donna E. Alvermann and Alison H. Heron, "Literacy Identity Work: Playing to Learn with Popular Media," Journal of Adolescent & Adult Literacy 45, no. 2 (2001): 118–22.

8. Colin Lankshear and Michele Knobel, New Literacies: Everyday Practices and Classroom Learning (Maidenhead: Open University Press, 2006).

9. John Palfrey and Urs Gasser, Born Digital: Understanding the First Generation of Digital Natives (New York: Perseus, 2009).

10. Karin M. Wiburg, "Technology and the New Meaning of Educational Equity," Computers in the Schools 20, no. 1–2 (2003): 113–28, doi: 10.1300/J025v20n01_09.

11. Rob Kling, "Learning About Information Technologies and Social Change: The Contribution of Social Informatics," Information Society 16, no. 3 (2000): 212–24.

12. James R. Valadez and Richard P. Durán, "Redefining the Digital Divide: Beyond Access to Computers and the Internet," High School Journal 90, no. 3 (2007): 31–44, http://www.jstor.org/stable/40364198.

13. Eszter Hargittai, "Digital Na(t)ives? Variation in Internet Skills and Uses among Members of the 'Net Generation,'" Sociological Inquiry 80, no. 1 (2010): 92–113, doi: 10.1111/j.1475-682X.2009.00317.x.
14. Glynda Hull and Katherine Schultz, "Literacy and Learning out of School: A Review of Theory and Research," Review of Educational Research 71, no. 4 (2001): 575–611, http://www.jstor.org/stable/3516099.

15. Colin Lankshear and Michele Knobel, New Literacies.

16. Allan Luke, "Literacy and the Other: A Sociological Approach to Literacy Research and Policy in Multilingual Societies," Reading Research Quarterly 38, no. 1 (2003): 132–41, http://www.jstor.org/stable/415697.

17. Stephanie Collins, "Breadth and Depth, Imports and Exports: Transactions between the In- and Out-of-School Literacy Practices of an 'at Risk' Youth," in Cultural Practices of Literacy: Case Studies of Language, Literacy, Social Practice, and Power (Mahwah, NJ: Lawrence Erlbaum, 2007).

18. Allison Skerrett and Randy Bomer, "Borderzones in Adolescents' Literacy Practices: Connecting Out-of-School Literacies to the Reading Curriculum," Urban Education 46, no. 6 (2011): 1256–79, doi: 10.1177/0042085911398920.

19. Samantha Becker et al., Opportunity for All: How the American Public Benefits from Internet Access at U.S. Libraries (Washington, DC: Institute of Museum and Library Services).

20. Lisa Tripp, "Digital Youth, Libraries, and New Media Literacy."

21. Nora Fleming, "Museums and Libraries Awarded $1.2m to Build Learning Labs," Education Week (blog), December 7, 2012, http://blogs.edweek.org/edweek/beyond_schools/2012/12/museums_and_libraries_awarded_12_million_to_build_learning_labs_for_youth.html.

22. See William F. Tate IV and Mark Hogrebe, "From Visuals to Vision: Using GIS to Inform Civic Dialogue About African American Males," Race Ethnicity and Education 14, no. 1 (2011): 51–71, doi: 10.1080/13613324.2011.531980; Mark C. Hogrebe and William F. Tate IV, "School Composition and Context Factors that Moderate and Predict 10th-Grade Science Proficiency," Teachers College Record 112, no. 4 (2010): 1096–1136; Robert J. Sampson, Great American City: Chicago and the Enduring Neighborhood Effect (Chicago: University of Chicago Press, 2012).

23. Henri Lefebvre, The Production of Space (Oxford: Blackwell, 1991).

24. Xavier De Souza Briggs, The Geography of Opportunity: Race and Housing Choice in Metropolitan America (Washington, DC: Brookings Institute Press, 2005).

25. Paul Jargowsky, Poverty and Place: Ghettos, Barrios, and the American City (New York: Russell Sage Foundation, 1997).

26. Edward W. Soja, Postmodern Geographies: The Reassertion of Space in Critical Social Theory (New York: Verso, 1989).

27. Colin Gordon, Mapping Decline: St. Louis and the Fate of the American City (University of Pennsylvania Press, 2008).

28. Ibid.

29. Ernest Morrell, "Critical Approaches to Media in Urban English Language Arts Teacher Development."

30. Missouri Department of Elementary and Secondary Education, http://dese.mo.gov/dsm/.
31. David Allen, GIS Tutorial II: Spatial Analysis Workbook (Redlands, CA: ESRI Press, 2009).

32. Becker et al., Opportunity for All: How the American Public Benefits from Internet Access at U.S. Libraries (Washington, DC: Institute of Museum and Library Services).

33. Lisa Tripp, "Digital Youth, Libraries, and New Media Literacy."

34. Becker et al., Opportunity for All: How the American Public Benefits from Internet Access at U.S. Libraries (Washington, DC: Institute of Museum and Library Services).

35. Colin Gordon, Mapping Decline: St. Louis and the Fate of the American City.

36. See Mwalimu Shujaa, Beyond Desegregation: The Politics of Quality in African American Schooling (Thousand Oaks, CA: Corwin, 1996); William J. Wilson, The Truly Disadvantaged: The Inner City, the Underclass, and Public Policy (Chicago: University of Chicago Press, 1987); Gary Orfield and Mindy L. Kornhaber, Raising Standards or Raising Barriers: Inequality and High-Stakes Testing in Public Education (New York: Century Foundation, 2010).

Modeling a Library Website Redesign Process: Developing a User-Centered Website Through Usability Testing

Danielle A. Becker and Lauren Yannotta

ABSTRACT

This article presents a model for creating a strong, user-centered web presence by pairing usability testing and the design process. Four rounds of usability testing were conducted throughout the process of building a new academic library web site. Participants were asked to perform tasks using a talk-aloud protocol. Tasks were based on guiding principles of web usability that served as a framework for the new site. Results from this study show that testing throughout the design process is an effective way to build a website that not only reflects user needs and preferences, but can be easily changed as new resources and technologies emerge.

INTRODUCTION

In 2008 the Hunter College Libraries launched a two-year website redesign process driven by iterative usability testing. The goals of the redesign were to:

• update the design to position the library as a technology leader on campus;
• streamline the architecture and navigation;
• simplify the language used to describe resources, tools, and services; and
• develop a mechanism to quickly incorporate new and emerging tools and technologies.

Based on the perceived weaknesses of the old site, the libraries' web committee developed guiding principles that provided a framework for the development of the new site. The guiding principles endorsed solid information architecture, clear navigation systems, strong visual appeal, understandable terminology, and user-centered design.

This paper will review the literature on iterative usability testing, user-centered design, and think-aloud protocol and the implications moving forward. It will also outline the methods used for this study and discuss the results. The model used, building the design based on the guiding principles and using the testing to uphold those principles, led to the development of a strong, user-centered site that can be easily changed or adapted to accommodate new resources and technologies. We believe this model is unique and can be replicated by other academic libraries undertaking a website redesign process.

Danielle A. Becker (dbe0003@hunter.cuny.edu) is Assistant Professor/Web Librarian and Lauren Yannotta (lyannotta@hotmail.com) was Assistant Professor/Instructional Design Librarian, Hunter College Libraries, New York, New York.
BACKGROUND

The goals of the research were to (1) determine the effectiveness of the Hunter College Libraries website, (2) discover how iterative usability testing resulting in a complete redesign impacts how the students perceive the usability of a college library website, and (3) reveal student information-seeking habits.

A formal usability test was conducted both on the existing Hunter College Libraries website (appendix A) and the following drafts of the redesign (appendix B) with twenty users over an eighteen-month period. The testing occurred before the website redesign began, while the website was under construction, and after the site was launched. The participants were selected through convenience sampling and informed that participation was confidential. The intent of the usability test was to uncover the flaws in navigation and terminology of the current website and, as the redesign process progressed, to incorporate the users' feedback into the new website's design to closely match their wants and needs.

The redesign of the website began with a complete inventory of the existing webpages. An analysis was done of the website that identified key information, links, units within the department, and placement of information in the information architecture of the website. We identified six core goals that we felt were the most important for all users of the library's website:

1. User should be able to locate high-level information within three clicks.
2. Eliminate library jargon from navigational system using concise language.
3. Improve readability of site.
4. Design a visually appealing site.
5. Create a site that was easily changeable and expandable.
6. Market the libraries' services and resources through the site.

LITERATURE REVIEW

In 2010, OCLC compiled a report, "The Digital Information Seeker," that found 84 percent of users begin their information searches with search engines, while only 1 percent began on a library website. Search engines are preferred because of speed, ease of use, convenience, and availability.1 Similar studies, such as Emde et al. and Gross and Sheridan, have shown that students are not using library websites to do their research.2 Gross and Sheridan assert in their article on undergraduate search behavior that "although students are provided with library skills sessions, many of them still struggle with the complex interfaces and myriad of choices the library website provides."3 This research shows the importance of creating streamlined websites that will compete for our students' attention. In building a new website at the Hunter College Libraries, we thought the best way to do this was through user-centered design.

Web designers both inside and outside the library have recognized the importance of user-centered design. Nielsen advises that website structure should be driven by the tasks the users came to the site to perform.4 He asserts the amount of graphics on webpages should be minimized because they often affect page download times and that gratuitous graphics (including text rendered as images) should be eliminated altogether.5 He also contends it is important to ensure that page designs are accessible to all users regardless of platform or newness of technology.6
In their article "How do I find an article? Insights from a web usability study," Cockrell and Jayne cited instances when researchers concluded that library terminology contributed to patrons' difficulties when using library websites, thus highlighting the importance of understandable terminology. Hulseberg and Monson found in their investigation of student-driven taxonomy for library website design that "by developing our websites based on student-driven taxonomy for library website terminology, features, and organization, we can create sites that allow students to get down to the business of conducting research."7

Performing usability testing is one way to confirm user-centered design. In his book Don't Make Me Think!, Krug insists that usability testing can provide designers with invaluable input. That, taken together with experience, professional judgment, and common sense, makes design choices easier.8 Ipri, Yunkin, and Brown, in their article "Usability as a Method for Assessing Discovery," emphasize the important role usability testing has in capturing emotional and aesthetic responses users have to websites, along with expressions of satisfaction with the layout and logic of the site. Even the discovery of basic mistakes, such as incorrect or broken links and ineffective wording, can negatively affect discovery of library resources and services.9

In Battleson, Booth, and Weintrop's literature review for their usability testing of an academic library website case study, they summarize Dumas and Redish's discussion of the five facets of formal usability testing: (1) the goal is to improve the usability of the interface, (2) testers should represent real users, (3) testers perform real tasks, (4) user behavior and commentary are observed and recorded, and (5) data are analyzed to recognize problems and suggest solutions. They conclude that when usability testing is "applied to website interfaces, this test method not only results in a more usable site, but also allows the site design team to function more efficiently, since it replaces opinion with user-centered design."10 This allows the designers to evaluate the results and identify problems with the design being tested.11

Usability experts Nielsen and Tahir contend that the earlier and more frequently usability tests are conducted, the more impact the results will have on the final design of the website because the results can be incorporated throughout the design process. They conclude it is better to conduct frequent, smaller studies with a maximum of five users. They assert, "You will always have discovered so many blunders in the design that it will be better to go back to the drawing board and redesign the interface than to discover the same usability problems several more times with even more users."12 Based on the strength of the literature, we decided to use iterative testing for our usability study.
Krug points out that testing is an iterative process because designers need to create, test, and fix based on test results, then test again.13 According to the United States Department of Health and Human Services report "Research-Based Web Design and Usability Guidelines," conducting before and after studies when revising a website will help designers determine if changes actually made a difference in the usability of the site.14 Manzari and Trinidad-Christensen found in their evaluation of user-centered design for a library website that iterative testing means a product is tested several times during development, allowing users' needs to be incorporated into the design. In their study, their aim was that the final draft of their website would closely match the users' information needs while remaining consistent, easy to learn, and efficient.15

Battleson, Booth, and Weintrop report that there is "a consensus in the literature that usability testing be an iterative process, preferably one built into a Web site's initial design."16 They explain that "site developers should test for usability, redesign, and test again—these steps create a cycle for maintaining, evaluating and continually improving a site."17 George used iterative testing in her redesign of the Carnegie Mellon University Libraries website and concluded that it was "necessary to provide user-centered services via the web site."18 Cobus, Dent, and Ondrusek used six students to usability test the "pilot study." Then eight students participated in the first round of testing; then librarians modified the prototype and tested fourteen students in the second and final round. After the second round of testing they used the results of this test to analyze the user recordings and deliver the findings and proposed "fixes" to the prototype pages to the web editor.19 McMullen's redesign of the Roger Williams University library website was able to "complete the usability-refinement cycle" twice before finalizing the website design.20 But continued refinements were needed, leading to another round of usability tests to identify and correct problem areas.21

Bauer-Graham, Poe, and Weatherford did a comparative study of a library website's usability via a survey and then redesigned the website after evaluating the survey's results. They waited a semester, then distributed another survey to determine the functionality of the current site. The survey had the participants view the previous design and the current design in a side-by-side comparison to determine how useful the changes made to the site were.22

In the same article, "How do I find an article? Insights from a Web usability study," Cockrell and Jayne suggest having participants use a web interface to perform specified tasks while a tester observes, noting the choices made and where mistakes occur, and using a "think aloud" protocol. They found that modifying the website through an ongoing, iterative process of testing, refining, and retesting its component parts improves functionality.23

In conducting our usability testing we used a think-aloud protocol to capture the participants' actions. Van Den Haak, De Jong, and Schellens define think-aloud protocol as relying on a method that asks users to complete a set of tasks and to constantly verbalize their thoughts while working on the tasks.
The usefulness of this method of testing lies in the fact that the data collected reflect the actual use of the thing being tested and not the participants' judgments about its usability. Instead, the test follows the individual's thoughts during the execution of the tasks. Nielsen states that think-aloud protocol "may be the single most valuable usability engineering method. . . . One gets a very direct understanding of what parts of the [interface/user] dialog cause the most problems, because the thinking aloud method shows how users interpret each individual interface item."25 Turnbow's article "Usability testing for web redesign: a UCLA case study" states that using the "think-aloud protocol" provides crucial real-time feedback on potential problems in the design and organization of a website.26 Cobus, Dent, and Ondrusek used the think-aloud protocol in their usability study. They encouraged participants to talk out loud as they answered the questions, audiotaped their comments, and captured their on-screen navigation using Camtasia.27 This information was used to successfully reorganize Hunter College Library's website.

METHOD

An interactive draft of Hunter College Libraries' redesigned website was created before the usability study was conducted. In spring 2009, the authors created the protocol for the usability testing. A think-aloud protocol was agreed upon for testing both the old site and the drafts of the new site, including a series of post-test questions that would allow participants to share their demographic information and give subjective feedback on the drafts of the site. Draft questions were written, and we conducted mock usability tests on each other. After several drafts we revised our questions and performed pilot tests on an MLIS graduate student and two undergraduate student library assistants with little experience with the current website. We ascertained from these pilot tests that we needed to slightly revise the wording of several questions to make them more understandable to all users. We made the revisions and eliminated a question that was redundant. All recruitment materials and finalized questions were submitted to the Institutional Review Board (IRB) for review and went through the certification process. After receiving approval we secured a private room to conduct the study.

Participants were recruited using a variety of methods. Signs were posted throughout the library, an e-mail was sent out to several Hunter College distribution lists, and a tent sign was erected in the lobby of the library. Participants were required to be students or faculty. Participants were offered a $10.00 Barnes & Noble gift card as incentive. Applicants were accepted on a rolling basis. Twenty students participated in the web usability study (appendix C). No faculty responded to our requests for participation, so a decision was made to focus this usability test on students rather than faculty because students comprise our core user base. Another usability test will be conducted in the future that will focus on faculty to determine how their academic tasks differ from those of undergraduates when using the library website. The redesigned site is malleable, which makes revisions and future changes in the design a predicted outcome of future usability tests.

Tests were scheduled for thirty-minute intervals. We conducted four rounds of testing using five participants per round.
The two researchers switched questioner and observer roles after each round of testing. Each participant was asked to think aloud while they completed the tasks and navigated the website. Both researchers took notes during the tests to ensure detailed and accurate data was collected. Each participant was asked to review the IRB forms detailing their involvement in the study, and they were asked to consent at that time. Their consent was implied if they participated in the study after reading the form.

The usability test consisted of fifteen task-oriented questions. The questions were identical when testing the old and new draft site. The first round tested only the old site, while the following three rounds tested only the new draft site. We tested both sites because we believed that comparing the two sites would reveal if the new site improved performance. The questions (appendix D) were not changed after they were initially finalized and remained the same throughout the entire four rounds of the usability study. Participants were reminded at the onset of the test and throughout the process that the design and usability of the site(s) were being tested, not their searching abilities. The tests were scheduled for an hour each, allowing participants to take the tests without time restrictions or without being timed. As a result, the participants were encouraged to take as much time as they needed to answer the questions, but were also allowed to skip questions if they were unable to locate answers. Initially the tests were recorded using Camtasia software. This allowed us to record participants' navigation trails through their mouse movements and clicks. But, after the first round of testing, we decided that observing and taking notes was appropriate documentation, and we stopped using the software.

After the participants completed the tests we asked them user preference questions to get a sense of their user habits and their candid opinions of the new draft of the website. These questions were designed to elicit ideas for useful links to include on the website and also to gauge the visual appeal of the site.

RESULTS

Table 1. Percent of Tasks Answered Correctly

Task                                                 Old Site   New Site
Find a book using online library catalog            80%        86%
Find library hours                                   100%       100%
Get help from a librarian using QuestionPoint        40%        93%
Find a journal article                               20%        66%
Find reference materials                             0%         7%
Find journals by title                               40%        66%
Find circulation policies                            60%        53%
Find books on reserve                                 80%        73%
Find magazines by title                               0%         73%
Find the library staff contact information           60%        100%
Find contact information for the branch libraries    40%        100%

DISCUSSION

Hunter College Libraries' website was due for a redesign because the site was dated in its appearance and did not allow new content to be added quickly and easily. As a result, a decision was made to build a new site using a content management system (CMS) to make the site easily expandable and simple to update. This study tested the simple tasks to determine how to structure the information architecture and to reinforce the guiding principles of the redesigned website.

Task Successes and Failures

The high percentage of participants who successfully found books using the online library catalog and easily found library hours on the redesigned website reinforced our guiding principles of understandable terminology and clear navigation systems. Krug contends that navigation educates the user on the site's contents through its visible hierarchy.
The result is a site that guides the user through their options and instills confidence in the website and its designers.28 We found this to be true in the way our users easily found the hours and catalog links on the prototype of our library website. The users on the old site knew where to look for this information because they were accustomed to how to navigate the old site. Given that the prototype was a complete departure from the navigation and design of the old site, it was crucial that the labels and links were clear and understandable in the prototype or our design would fail. We made "Hours" the first link under the "About" heading and "CUNY+/Books" the first link under the "Find" heading, and as a result both our terminology and our structure were a success with participants.

On the old website, users rarely used the libraries' online chat client. Despite our efforts to remind students of its usefulness, the website did not place the link in a sufficiently visible location on the home page. On the old site, only 40 percent of participants located the link, as it was on the bottom left of the screen and easy to overlook. Instead, on the new site, the "Ask a Librarian" link was prominently featured on the top of the screen. These results upheld the guiding principles of solid information architecture and understandable terminology. They also supported Nielsen's assertion that "site design must be aimed at simplicity above all else, with as few distractions as possible and with a very clear information architecture and matching navigation tools."29 As a result of the launch of the redesigned site, the use of the QuestionPoint chat client has more than doubled.

Finding a journal article on a topic was always problematic for users of the old library website. The participants we tested were familiar with the site, and 80 percent erroneously clicked on "Journal Title List" when the more appropriate link would have been "Databases" if they didn't have an exact journal title in mind. Although we taught this in our information literacy courses, it was challenging getting the information across. In order to address this on the new site, "Databases" was changed to "Databases/Articles" and categorized under the heading "Find." The participants using the new site had greater success with the new terminology; 66 percent correctly chose "Databases/Articles." This question revealed an inconsistency with the guiding principles of understandable terminology and clear navigation systems on the old site. These issues were addressed by adding the word "Articles" after "Databases" on the new site to clarify what resources could be found in a database and also by placing the link under the heading "Find" to further explain the action a student would be taking by clicking on the "Databases/Articles" link.
Finding reference materials was challenging for the users of the old site, as none of the participants clicked on the intended link, "Subject Guides." In an effort to increase usage of the research guides, the library not only purchased the LibGuides tool but also changed the wording of the link to "Topic Guides." As we neared the end of our study, we observed that only one participant knew to click on the "Topic Guides" link for research assistance. The participants suggested calling it "Research Guides" instead of "Topic Guides," and we changed it. Unfortunately, the usability study had already been completed, so we were unable to further test the effectiveness of the rewording of this link. Anecdotally, the rewording appears to be more understandable to users, as the research guides are getting more usage (based on hit counts) than the previous guides. The rewording of these guides adhered to both the understandable terminology and the user-centered design principles.

These results supported Nielsen's assertion that the most important material should be presented up front, using the inverted pyramid principle. "Users should be able to tell in a glance what the page is about and what it can do for them."30 Our results also supported the HHS report, which states that terminology "plays a large role in the user's ability to find and understand information. Many terms are familiar to designers and content writers, but not to users."31 We concluded that rewording the link based on student feedback reduced the use of library-specific terminology. Although librarians are "Subject Specialists" and "Subject Liaisons" and are familiar with those labels and that terminology, our students were looking for the word "research" instead of "subject," so they were not connecting with the library's LibGuides.

As previously discussed, students of the old site thought the link "Journal Title List" would give them access to the library's database holdings. When participants were asked to find a specific journal title, the correct answer on the old site was "Journal Title List," and only 40 percent answered correctly. In another change to terminology on the new site, both links were placed under the heading "Find," and, after testing of the first prototype, "Journal Title List" was changed to "List of Journals and Magazines." In the following tests 66 percent of the participants were able to answer correctly.

The difference in success in finding circulation policies between the old site and the prototype site was slight, only 7 percent. This can be attributed to the fact that participants on the old site could click on multiple links to get to the correct page, and they were familiar enough with the site to know that. In the prototype of the site there were several paths as well, some direct and some indirect. Testing showed that the wording of this link supported the understandable terminology principle better than the old website's "Library Policies" link, yet to be true to our user-centered design principle we needed to reword it once more. Therefore, after the test was completed and the website was launched, we reworded the link to "Checkout Policies," which uses the same terminology that users are familiar with because they check out books at our checkout desk.
The remaining tasks, which involved locating books on reserve, magazines by title, library staff contact information, and branch information, were all met with higher success rates on the prototype site because the links were reworded during the redesign process to support the understandable terminology and user-centered design principles.

Participant Feedback: Qualitative

The usability testing process informed the redesign of our website in many specific ways. If the layout of the site did not test well with participants, we planned to create another prototype. In their evaluation of Colorado State University Libraries' digital collections and the Western Waters Digital Library websites, Zimmerman and Paschal describe first impressions as the determining factor in whether users return to a website; if the impression is positive, they will return and continue to explore.32

When given an opportunity to give feedback on what they thought of the design of the website, the participants commented:

• "There were no good library links at the bottom before and there wasn't the Ask A Librarian link either which I like a lot."
• "The old site was too difficult to navigate, new site has a lot of information, I like the different color schemes for the different things."
• "It is contemporary and has everything I need in front of me."
• "Cool."
• "Helpful."
• "Straightforward."
• "The organization is easier for when you want to find things."
• "Interactivity and rollovers make it easy to use."
• "Intuitive, straight-forward and I like the simplicity of the colors."
• "More professional, more aesthetically pleasing than the old site."
• "The four menu options (About, Find, Services, Help) break the information down easily."

Additional research conducted by Nathan, Yeow, and Murugesan claims that attractiveness (referring to the aesthetic appeal of a website) is the most important factor in influencing customer decision-making and affects the usability of the website.33 Not only that, but users feel better when using a more attractive product. Fortunately, the feedback from our participants revealed that the website was visually appealing and the navigation scheme was clear and easy to understand.

Other Changes Made to the Libraries' Website because of Usability Testing

Participants commented that they expected to find library contact information on the bottom of the homepage, so the bottom of the screen was modified to include this information as well as a "Contact Us" link. Participants did not realize that the "About," "Find," "Services," and "Help" headings were also links, so we modified them so they were underlined when hovered over. There were also adjustments to the gray color bars on the top of the page: participants thought they were too bright, so they were darkened to make the labels easier to read. Participants also commented that they wanted links to various public libraries in New York City under the "Quick Links" section of the homepage. We designed buttons for Brooklyn Public Library, Queens Public Library, and the New York Public Library and reordered this list to move these links closer to the top of the "Quick Links" section.

CONCLUSION

Conducting a usability study of Hunter College Libraries' existing website and the various stages of the redesigned website prototypes was instrumental in developing a user-centered design.
Approaching the website redesign in stages, with guidance from iterative user testing and influenced by the participants' comments, gave the web librarian and the web committee an opportunity to incorporate the findings of the usability study into the design of the new website. Rather than basing design decisions on assumptions about users' needs and information-seeking behaviors, we were able to incorporate what we had learned from the library literature and from the users' behavior into our evolving designs. This strategy resulted in a redesigned website that, with continued testing, user feedback, and updating, has aligned with the guiding principles we developed at the outset of the redesign project. The one unexpected outcome of this study was the discovery that no matter how well a library website is designed, users will still need to be educated in how to use the site, with an emphasis on developing strong information literacy skills.

REFERENCES

1. "The Digital Information Seeker: Report of the Findings from Selected OCLC, RIN, and JISC User Behaviour Projects," OCLC Research, ed. Lynn Silipigni-Connaway and Timothy Dickey (2010): 6, www.jisc.ac.uk/publications/reports/2010/digitalinformationseekers.aspx.
2. Judith Emde, Lea Currie, Frances A. Devlin, and Kathryn Graves, "Is 'Good Enough' OK? Undergraduate Search Behavior in Google and in a Library Database," University of Kansas Scholarworks (2008), http://hdl.handle.net/1808/3869; Julia Gross and Lutie Sheridan, "Web Scale Discovery: The User Experience," New Library World 112, no. 5/6 (2011): 236, doi: 10.1108/03074801111136275.
3. Ibid., 238.
4. Jakob Nielsen, Designing Web Usability (Indianapolis: New Riders, 1999), 198.
5. Ibid., 134.
6. Ibid., 97.
7. Barbara J. Cockrell and Elaine A. Jayne, "How Do I Find an Article? Insights from a Web Usability Study," Journal of Academic Librarianship 28, no. 3 (2002): 123, doi: 10.1016/S0099-1333(02)00279-3.
8. Steve Krug, Don't Make Me Think! A Common Sense Approach to Web Usability, 2nd ed. (Berkeley, CA: New Riders, 2006), 135.
9. Tom Ipri, Michael Yunkin, and Jeanne Brown, "Usability as a Method for Assessing Discovery," Information Technology & Libraries 28, no. 4 (2009): 181, doi: 10.6017/ital.v28i4.3229.
10. Brenda Battleson, Austin Booth, and Jane Weintrop, "Usability Testing of an Academic Library Web Site: A Case Study," Journal of Academic Librarianship 27, no. 3 (2001): 189–98, doi: 10.1016/S0099-1333(01)00180-X.
11. Ibid.
12. Jakob Nielsen and Marie Tahir, "Keep Your Users in Mind," Internet World 6, no. 24 (2000): 44.
13. Steve Krug, Don't Make Me Think! A Common Sense Approach to Web Usability, 135.
14. Research-Based Web Design and Usability Guidelines, ed. Ben Schneiderman (Washington: United States Dept. of Health and Human Services, 2006), 190.
15. Laura Manzari and Jeremiah Trinidad-Christensen, "User-Centered Design of a Web Site for Library and Information Science Students: Heuristic Evaluation and Usability Testing," Information Technology & Libraries 25, no. 3 (2006): 163, doi: 10.6017/ital.v25i3.3348.
16. Battleson, Booth, and Weintrop, "Usability Testing of an Academic Library Web Site," 190.
17. Ibid.
18. Carole A. George, "Usability Testing and Design of a Library Website: An Iterative Approach," OCLC Systems & Services 21, no. 3 (2005): 178, doi: 10.1108/10650750510612371.
19. Laura Cobus, Valeda Dent, and Anita Ondrusek, "How Twenty-Eight Users Helped Redesign an Academic Library Web Site," Reference & User Services Quarterly 44, no. 3 (2005): 234–35.
20. Susan McMullen, "Usability Testing in a Library Web Site Redesign Project," Reference Services Review 29, no. 1 (2001): 13, doi: 10.1108/00907320110366732.
21. Ibid.
22. John Bauer-Graham, Jodi Poe, and Kimberly Weatherford, "Functional by Design: A Comparative Study to Determine the Usability and Functionality of One Library's Web Site," Technical Services Quarterly 21, no. 2 (2003): 34, doi: 10.1300/J124v21n02_03.
23. Cockrell and Jayne, "How Do I Find an Article?," 123.
24. Maaike Van Den Haak, Menno De Jong, and Peter Jan Schellens, "Retrospective vs. Concurrent Think-Aloud Protocols: Testing the Usability of an Online Library Catalogue," Behavior & Information Technology 22, no. 5 (2003): 339.
25. Battleson, Booth, and Weintrop, "Usability Testing of an Academic Library Web Site," 192.
26. Dominique Turnbow et al., "Usability Testing for Web Redesign: A UCLA Case Study," OCLC Systems & Services 21, no. 3 (2005): 231, doi: 10.1108/10650750510612416.
27. Cobus, Dent, and Ondrusek, "How Twenty-Eight Users Helped Redesign an Academic Library Web Site," 234.
28. Krug, Don't Make Me Think! 59.
29. Nielsen, Designing Web Usability, 164.
30. Ibid., 111.
31. Schneiderman, Research-Based Web Design and Usability Guidelines, 160.
32. Don Zimmerman and Dawn Bastian Paschal, "An Exploratory Evaluation of Colorado State Universities Libraries' Digital Collections and the Western Waters Digital Library Web Sites," Journal of Academic Librarianship 35, no. 3 (2009): 238, doi: 10.1016/j.acalib.2009.03.011.
33. Robert J. Nathan, Paul H. P. Yeow, and Sam Murugesan, "Key Usability Factors of Service-Oriented Web Sites for Students: An Empirical Study," Online Information Review 32, no. 3 (2008): 308, doi: 10.1108/14684520810889646.

Appendix A. Hunter College Libraries' Old Website

Appendix B. Hunter College Libraries' New Website

Appendix C. Test Participant Profiles
Participant | Sex | Academic Standing | Major | Library Instruction Session? | How Often in the Library
1 | Female | Senior | History | Yes | Every day
2 | Female | Sophomore | Psychology | No | Every day
3 | Male | Junior | Nursing | No | 1/week
4 | Female | Junior | Studio Art | No | 5/week
5 | Female | Senior | Accounting | Yes | 2–3/week
6 | Male | Freshman | Undeclared | Yes | 1/week
7 | Female | Freshman | Undeclared | No | Every day
8 | Male | Senior | Music | Yes | 3–4/week
9 | Male | Freshman | Physics/English | No | Every day
10 | Female | Senior | English Lit/Media Studies | No | 1/week
11 | Female | Junior | Fine Arts/Geography | Yes | 2–3/week
12 | Male | Sophomore | Computer Science | Yes | Every day
13 | Male | Sophomore | Econ/Psychology | Yes | 6 hours/week
14 | Female | Senior | Math/Econ | Yes | 2–3/week
15 | Female | Senior | Art | Yes | Every day
16 | Male | n/a* | Pre-nursing | No | Daily
17 | Female | Senior** | Econ | Didn't remember | 3/week
18 | Male | Senior | Pre-Med | Yes | 2/week
19 | Female | Grad | Art History | Yes | 3/week
20 | Male | Grad | Education (TESOL) | No | Every day

Note: *This student was at Hunter fulfilling prerequisites and already had a Bachelor of Arts degree from another college. **This student had just graduated.

Appendix D. Test Questions/Tasks

• What is the first thing you noticed (or looked at) when you launched the Hunter Libraries Homepage?
• What's the second?
• If your instructor assigned the book To Kill a Mockingbird, what link would you click on to see if the library owns that book?
• When does the library close on Wednesday night?
• If you have a problem researching a paper topic and are at home, where would you go to get help from a librarian?
• Where would you click if you needed to find two journal articles on "Homelessness in America"?
• You have to write your first sociology paper and want to know what databases, journals, and websites would be good resources for you to begin your research. Where would you click?
• Does Hunter Library subscribe to the e-journal Journal of Communication?
• How long can you check out a book for?
• How would you find items on reserve for Professor Doyle's LiIBR100 class?
• Does Hunter Library have the latest issue of Rolling Stone magazine?
• What is the e-mail for Louise Sherby, Dean of Libraries?
• What is the phone number for the Social Work Library?
• You are looking for a guide to grammar and writing on the web; does the library's webpage have a link to such a guide?
• Your friend is a Hunter student who lives near Brooklyn College. She says that she may return books she borrowed from the Brooklyn College Library to Hunter Library. Is she right? Where would you find out?
• This website is easy to navigate (Agree, Agree Somewhat, Disagree Somewhat, Disagree)?
• This website uses too much jargon (Agree, Agree Somewhat, Disagree Somewhat, Disagree)?
• I use the Hunter Library's website (Agree, Agree Somewhat, Disagree Somewhat, Disagree)?

Eclipse Editor for MARC Records

Bojana Dimić Surla

Bojana Dimić Surla (bdimic@uns.ns.ac.yu) is an Associate Professor, University of Novi Sad, Serbia.

ABSTRACT

Editing bibliographic data is an important part of library information systems. In this paper we discuss existing approaches in developing user interfaces for editing MARC records. There are two basic approaches: screen forms that support entering bibliographic data without knowledge of the MARC structure, and direct editing of MARC records shown on the screen. This paper presents the Eclipse editor, which fully supports editing of MARC records. It is written in Java as an Eclipse plug-in, so it is platform-independent. It can be extended for use with any data store.
The paper also presents a Rich Client Platform (RCP) application made of the MARC editor plug-in, which can be used outside of Eclipse. The practical application of the results is integration of the RCP application into the BISIS library information system.

INTRODUCTION

An important module of every library information system (LIS) is the one for editing bibliographic records (i.e., cataloging). Most library information systems store their bibliographic data in the form of MARC records. Some of them support cataloging by direct editing of MARC records; others have a user interface that enables entering bibliographic data by a user who knows nothing about how MARC records are organized.

The subject of this paper is user interfaces for editing MARC records. It gives software requirements and analyzes existing approaches in this field. As the main part of the paper, we present the Eclipse editor for MARC records, developed at the University of Novi Sad as a part of the BISIS library information system. The editor uses the MARC 21 variant of the MARC format.

The remainder of this paper describes the motivation for the research, presents the software requirements for cataloging according to MARC standards, and provides background on the MARC 21 format. It also describes the development of the BISIS software system, reviews the literature concerning tools for cataloging, and analyzes existing approaches in developing user interfaces for editing MARC records. The results of the research are presented in the final section, which describes the functionality and technical characteristics of the Eclipse MARC editor. The Rich Client Platform (RCP) version of the editor, which can be used independently of Eclipse, is also presented.

MOTIVATION

The motivation for this paper was to provide an improved user interface for cataloging by the MARC standard that will lead to more efficient and comfortable work for catalogers. There are two basic approaches in developing user interfaces for MARC cataloging. The first approach uses a classic screen form made of text fields and labels with descriptions of the bibliographic data, without any indication of the MARC standard. The second approach is direct editing of a record that is shown on the screen. These two approaches are discussed in detail in "Existing Approaches in Developing User Interfaces for Editing MARC Records" below. The current editor in the BISIS system is a mixture of these two approaches: it supports direct editing, but data input is done via a text field that opens on double click.1 The idea presented in this paper is to create an editor that overcomes all the drawbacks of previous solutions. The approach taken in creating the editor was direct record-editing with real-time validation and no additional dialogs.
Software Requirements for MARC Cataloging

The user interface for MARC cataloging needs to support the following functions:

• Creating MARC records that satisfy constraints proposed by the bibliographic format
• Selecting codes for field tags, subfield names, and values of coded elements, such as character positions in the leader and control fields, indicators, and subfield content
• Validating entered data
• Access to data about the MARC format (a "user manual" for MARC cataloging)
• Exporting and importing created records
• Providing various previews of the record, such as catalog cards

BACKGROUND

MARC 21

As was previously mentioned, the Eclipse editor uses the MARC 21 variant. MARC 21 consists of five formats: bibliographic data, authority data, holdings data, classification data, and community information.2

MARC 21 records consist of three parts: the record leader, a set of control fields, and a set of data fields. The record leader content, which follows the LDR label, includes the logical length of the record (first five characters) and the code for record status (sixth character). After the record leader there are control fields. Every control field is written in a new line and consists of a three-character numeric tag and the content of the control field. The content of a control field can be a single datum or a set of fixed-length bibliographic data. Control fields are followed by data fields in the record. Every line in the record that contains a data field consists of a three-character numeric tag, the values for the first and the second indicator (or the number sign, #, if an indicator is not defined for the field), and the list of subfields that belong to the field.

Detailed analysis of MARC 21 shows that there are constraints on both the structure and the content of a MARC 21 record. Constraints on the structure define which fields and subfields can appear more than once in the record (i.e., whether the fields and subfields are repeatable or not), the allowed length of the record elements, and all the elements of the record defined by MARC 21. Constraints on the record content are defined for the content of the leader, indicators, control fields, and subfields. Moreover, some constraints connect multiple elements in the record (when the content of one element depends on the content of another element in the record). An example of a structural constraint is data field 016, which has a first indicator whereas its second indicator is undefined. Field 016 can have subfields a, z, 2, and 8, of which z and 8 are repeatable.
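To make the structure concrete, the line below shows how data field 016 might appear in a human-readable MARC 21 display. The values are invented for illustration, the dollar sign is used as a conventional subfield delimiter, and the number sign marks the undefined indicator position; the exact textual syntax used by the editor presented later in this paper is defined by its own grammar and may differ.

016 7# $a 101565594 $2 DNLM

Here 016 is the field tag, the first indicator 7 states that the source of the control number is named in subfield 2, the second indicator is undefined, subfield a carries the control number, and subfield 2 identifies the assigning agency.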
BISIS

The results presented in this paper belong to the research on the development of the BISIS library information system. This system, which has been in development since 1993, is currently in its fourth version. The editor for cataloging in the current version of BISIS was the starting point for the development of the Eclipse editor, the subject of this paper.3 Apart from an editor for cataloging, the BISIS system has a module for circulation and an editor for creating Z39.50 queries.4 The indexing and searching of bibliographic records was implemented using the Lucene text server.5 As a part of the editor for cataloging, we developed a module for generating various reports and catalog cards from MARC records.6 BISIS also supports creating an electronic catalog of UNIMARC records on the web, where the input of bibliographic data can be done without knowing UNIMARC; the entered data are mapped to UNIMARC and stored in the BISIS database.7

Recent research within the BISIS project relates to its extension for managing research results at the University of Novi Sad. For that purpose, we developed a Current Research Information System (CRIS) on the recommendation of the nonprofit organization euroCRIS.8 The paper "CERIF Compatible Data Model Based on MARC 21 Format" gives a proposal for a Common European Research Information Format (CERIF) compatible data model based on MARC 21. In this model, the part of the CERIF data model that relates to research results is mapped to MARC 21. Furthermore, on the basis of this model, research management at the University of Novi Sad was developed.9 The paper "CERIF Data Model Extension for Evaluation and Quantitative Expression of Scientific Research Results" explains the extension of CERIF for the evaluation of published scientific research. The extension is based on the semantic layer of CERIF, which enables classification of entities and their relationships by different classification schemas.10

The current version of the BISIS system is based on a variant of the UNIMARC format. The development of the next version of BISIS, which will be based on MARC 21, is in progress. The first task was migrating existing UNIMARC records.11 The second task is developing the editor for MARC 21 records, which is the subject of this paper.

Cataloging Tools

An editor for cataloging is a standard part of a cataloger's workstation and the subject of numerous studies. Lange describes the development of cataloging from handwritten catalog cards, to typewriters (first manual, then electronic), to the appearance of MARC records and PC-based cataloger's workstations.12 Leroya and Thomas debate the influence of web development on cataloging. They stress that the availability of information on the web, as well as the possibility of having several applications open at the same time in different windows, greatly influences the process of creating bibliographic records. Their paper also indicates that there are some problems that result from using large numbers of resources from the web, such as errors that arise from copy-paste methods. Consequently, there is a need for an automatic check of spelling errors and the possibility of a detailed review by the cataloger during editing.13 Khurshid deals with general principles of the cataloger's workstation, its configuration, and its influence on a cataloger's productivity. In addition to efficient access to remote and local electronic resources, Khurshid includes record transfer through a network and sophisticated record editing as important functions of a cataloger's workstation.
Furthermore, Khurshid says it is possible to improve cataloging efficiency in the Windows-based cataloger's workstation by finding bibliographic records in other institutions and cutting and pasting lengthy parts of the record (such as summary notes) into one's own catalog.14

Existing Approaches in Developing User Interfaces for Editing MARC Records

The basic source for this analysis of existing user interfaces for editing MARC records was the official site for MARC standards of the Library of Congress, in addition to scientific journals and conferences. The analysis of existing systems shows that there are two basic approaches to the implementation of editing MARC records:15

• Entering bibliographic data in classic screen forms made of text fields and labels, which does not require knowledge of the MARC format (Concourse,16 Koha,17 J-MARC18)
• Direct editing of a MARC record shown on the screen (MARCEdit,19 IsisMARC,20 Catalis,21 Polaris,22 MARCMaker and MARCBreaker,23 ExLibris Voyager24)

Both of these approaches have advantages and disadvantages. The drawback of the first approach is that it provides a limited set of bibliographic data to edit, and extending that set implies changes to the application or, in the best case, changes in configuration. Another problem is that there are usually a lot of text fields, text areas, combo boxes, and labels on the screen that need to be organized into several tabs or additional windows. This usually makes it difficult for users to see errors or to connect different parts of the record when checking their work. Moreover, all the solutions found in the first group perform little validation of the data entered by the user.25 One important advantage of the first approach is that the application can be used by a user who is not familiar with the standard, so the need for access to MARC data can be avoided (one of the functions listed above).

As for the second approach, editing a MARC record directly on the screen overcomes the problem of extending the set of bibliographic data to enter. It also enables users to scan entered data and check the whole record, which appears on the screen. Users can also copy and paste parts of records from other resources into the editor. However, the majority of those applications are actually editors for editing MARC files that are later uploaded into some database or transformed into some other format (MarcEdit, MARCMaker and MARCBreaker, Polaris), and they usually support little or no data validation.26 They allow users to write anything (i.e., the record structure is not controlled by the program) and validate only at the end of the process, when uploading or transforming the record. Some of those editors, such as Catalis and IsisMARC, present the MARC record as a table. They support control of the structure, but the record presented in this way is usually too big to fit on the screen, so it is separated into several tabs.

An important function in editing MARC records is selecting codes for coded elements, which can be character positions in the leader or a control field, indicator values, or subfield values. There are also field tags and subfield codes that sometimes need to be selected for addition to a record. All the analyzed editors provide additional dialogs for picking these codes, which requires the user to constantly open and close dialogs and can be annoying.
One important fact about editors in the second group is that they can be used only by a user who is familiar with MARC, so access to the large set of MARC element descriptions can make the job easier. Some of the mentioned systems provide descriptions of the fields and subfields (e.g., IsisMARC), but most of them do not.

FINDINGS

The editor for MARC records was developed as a plug-in for Eclipse; therefore it is similar to Eclipse's Java code editors. As the editor is written in Java, it is platform-independent. The main part of the editor was created using the oAW Xtext framework for developing textual domain-specific languages.27 It was created through model-driven software development, by specifying the model of the MARC record in the form of an Xtext grammar and generating the editor. All the main characteristics of the editor were generated on the basis of the specification of constraints and extensions of the Xtext grammar; therefore all changes to the editor can be realized by changing the specification. Moreover, the editor can be easily adjusted for any database by using the concept of extensions and extension points in Eclipse plug-in technology. We make this application independent of Eclipse by using Rich Client Platform (RCP) technology. The editor is implemented for the MARC 21 bibliographic and holdings formats.

User Interface

Figure 1 shows the editor opened within Eclipse. The main area is marked with "1"; it shows the MARC 21 file that is being edited. That file contains one MARC 21 bibliographic record. The field tags and subfield codes are highlighted in the editor, which contributes to presentation clarity. The area marked with "2" lists the errors in the record, that is, invalid elements entered in the record. The area marked with "3" shows data about MARC 21 in a tree form. This part of the screen has two other possible views: a MARC 21 holdings format tree and a navigator, which is the standard Eclipse view for browsing the resources of the opened project. The actions for creating a record are available in the cataloging menu and on the cataloging toolbar, which is marked with "4." These are actions for previewing the catalog card, creating a new bibliographic record, loading a record from a database (importing the record), uploading a record to a database (exporting the record), and creating a holdings record for the bibliographic record.

Figure 1. Eclipse Editor for MARC Records

In the Eclipse editor for MARC, selecting codes is enabled without opening additional dialogs or windows (figure 2). This uses the standard Eclipse mechanism for code completion: typing Ctrl + Space opens a dropdown list with all possible values for the cursor's current position.

Figure 2. Selecting Codes

Record validation is done in real time, and every violation is shown while editing (figure 3). Figure 3 depicts two errors in the record: one is a wrong value in the second character position of control field 008, and the other is that two 100 fields were entered, although that field cannot be repeated in a record.

Figure 3. Validation Errors
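The article does not include source code for these checks, so the following minimal Java sketch uses invented class and method names. It only illustrates the kind of structural rule reported in figure 3, a non-repeatable field such as 100 occurring twice; in the generated editor such rules come from the Xtext grammar and its constraint specification rather than from hand-written code like this.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example, not BISIS code: reports every non-repeatable tag that
// occurs more than once in the list of field tags of the record being edited.
public class RepeatabilityCheck {

    // Only two tags are listed here to keep the sketch short; MARC 21 defines many more.
    private static final Set<String> NON_REPEATABLE_TAGS = Set.of("100", "245");

    public List<String> check(List<String> fieldTags) {
        Set<String> seen = new HashSet<>();
        List<String> errors = new ArrayList<>();
        for (String tag : fieldTags) {
            // seen.add returns false when the tag was already encountered
            if (NON_REPEATABLE_TAGS.contains(tag) && !seen.add(tag)) {
                errors.add("Field " + tag + " cannot be repeated in the record");
            }
        }
        return errors;
    }
}

A check of this kind would run on every change to the record so that the error list in area 2 of figure 1 stays current.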
RCP Application of the Cataloging Editor

As shown above, the editor is available as an Eclipse plug-in, which raises the question of what a cataloger will do with all the other functions of the Eclipse Integrated Development Environment (IDE). As seen in figures 1 and 3, there are a lot of additional toolbars and menus that are not related to cataloging. The answer lies in RCP technology. RCP technology generates independent software applications on the basis of a set of Eclipse plug-ins.28 The main window of an RCP application with additional actions is shown in figure 4. Besides the Cataloguing menu that is shown, the window also contains the File menu, which includes Save and Save As actions, as well as the Edit menu, which includes Undo and Redo actions. All of these actions are also available via the toolbar.

Figure 4. RCP Application

CONCLUSION

The goal of this paper was to review current user interfaces for editing MARC records. We presented two basic approaches in this field and analyzed the advantages and disadvantages of each. We then presented the Eclipse MARC editor, which is part of the BISIS library software system. The idea behind the editor is to input structured MARC data in a form similar to that of programming language editors. The author did not find this approach in the accessible literature. The RCP application of the presented editor will find its practical application in future versions of the BISIS system. It represents an upgrade of the existing editor and a starting point for forming the version of the BISIS system that will be based on MARC 21. The acquired results can also be used for the input of other data into the BISIS system, including data from the CRIS system used at the University of Novi Sad.

This paper shows that Eclipse plug-in technology can be used for creating end-user applications. The development of applications with the plug-in technology enables the use of a big library of existing components from the Eclipse user interface, whereby writing source code is avoided. Additionally, the plug-in technology enables the development of extendible applications by using the concept of the extension point. In this way, we can create software components that can be used by a great number of different information systems. By using the concept of the extension point, the editor can be extended with the functions that are specific to a data store. An extension point was created for export and import of MARC records, which means the MARC editor plug-in can be used with any database management system by extending this extension point in Eclipse plug-in technology.
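The article does not publish the contract of this extension point, so the Java interface below is only a hypothetical sketch, with invented names, of what a data-store extension for record import and export might look like.

// Hypothetical sketch, not the actual BISIS extension-point contract: a data store
// that the editor can load records from and save records to.
public interface MarcRecordStore {

    // Loads the textual form of a MARC 21 record identified by recordId.
    String importRecord(String recordId);

    // Persists the record text produced by the editor and returns its identifier.
    String exportRecord(String recordText);
}

A concrete plug-in for a particular database management system would implement an interface of this kind and register it against the extension point, leaving the editor plug-in itself unchanged.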
Future work in the development of the Eclipse MARC editor is to implement support for additional MARC formats: for authority and classification data and for community information. These formats propose the same record structure but have different constraints on the content and different sets of fields and subfields, as well as different codes for character positions and subfields. Therefore the appearance of the editor will remain the same. The only difference will be the specification of the constraints and of the codes for code completion. Another interesting topic for discussion is the implementation of other modules of library information systems in Eclipse plug-in technology.

REFERENCES

1. Bojana Dimić and Dušan Surla, "XML Editor for UNIMARC and MARC21 Cataloging," Electronic Library 27 (2009): 509–28; Bojana Dimić, Branko Milosavljević, and Dušan Surla, "XML Schema for UNIMARC and MARC 21 Formats," Electronic Library 28 (2010): 245–62.
2. Library of Congress, "MARC Standards," http://www.loc.gov/marc (accessed February 19, 2011).
3. Dimić and Surla, "XML Editor"; Dimić, Milosavljević, and Surla, "XML Schema."
4. Danijela Tešendić, Branko Milosavljević, and Dušan Surla, "A Library Circulation System for City and Special Libraries," Electronic Library 27 (2009): 162–68; Branko Milosavljević and Danijela Tešendić, "Software Architecture of Distributed Client/Server Library Circulation," Electronic Library 28 (2010): 286–99; Danijela Boberić and Dušan Surla, "XML Editor for Search and Retrieval of Bibliographic Records in the Z39.50 Standard," Electronic Library 27 (2009): 474–95.
5. Branko Milosavljević, Danijela Boberić, and Dušan Surla, "Retrieval of Bibliographic Records Using Apache Lucene," Electronic Library 28 (2010): 525–36.
6. Jelana Rađenović, Branko Milosavljević, and Dušan Surla, "Modelling and Implementation of Catalogue Cards Using FreeMarker," Program: Electronic Library and Information Systems 43 (2009): 63–76.
7. Katarina Belić and Dušan Surla, "Model of User Friendly System for Library Cataloging," ComSIS 5 (2008): 61–85; Katarina Belić and Dušan Surla, "User-Friendly Web Application for Bibliographic Material Processing," Electronic Library 26 (2008): 400–410; euroCRIS homepage, www.eurocris.org (accessed February 21, 2011).
8. Dragan Ivanović, Dušan Surla, and Zora Konjović, "CERIF Compatible Data Model Based on MARC 21 Format," Electronic Library 29 (2011), http://www.emeraldinsight.com/journals.htm?articleid=1906945.
9. euroCRIS, "Common European Research Information Format," http://www.eurocris.org/Index.php?page=CERIFreleasesandt=1 (accessed February 21, 2011); Dragan Ivanović et al., "A CERIF-Compatible Research Management System Based on the MARC 21 Format," Program: Electronic Library and Information Systems 44 (2010): 229–51.
10. Gordana Milosavljević et al., "Automated Construction of the User Interface for a CERIF-Compliant Research Management System," Electronic Library 29 (2011), http://www.emeraldinsight.com/journals.htm?articleid=1954429; Dragan Ivanović, Dušan Surla, and Miloš Racković, "A CERIF Data Model Extension for Evaluation and Quantitative Expression of Scientific Research Results," Scientometrics 86 (2010): 155–72.
11. Gordana Rudić and Dušan Surla, "Conversion of Bibliographic Records to MARC 21 Format," Electronic Library 27 (2009): 950–67.
12. Holley R. Lange, "Catalogers and Workstations: A Retrospective and Future View," Cataloging & Classification Quarterly 16 (1993): 39–52.
13. Sarah Yoder Leroya and Suzanne Leffard Thomas, "Impact of Web Access on Cataloging," Cataloging & Classification Quarterly 38 (2004): 7–16.
14. Zahirrudin Khurshid, "The Cataloger's Workstation in the Electronic Library Environment," Electronic Library 19 (2001): 78–83.
15. Library of Congress, "MARC Standards," http://www.loc.gov/marc (accessed February 19, 2011).
16. Book Systems, "Concourse Software Product," http://www.booksys.com/v2/products/concourse (accessed February 19, 2011).
17. Koha Library Software Community homepage, http://koha-community.org (accessed February 19, 2011).
18. Wendy Osborn et al., "A Cross-Platform Solution for Bibliographic Record Manipulation in Digital Libraries" (paper presented at the Sixth IASTED International Conference on Communications, Internet and Information Technology, July 2–4, 2007, Banff, Alberta, Canada).
19. Terry Reese, "MarcEdit—Your Complete Free MARC Editing Utility," http://people.oregonstate.edu/~reeset.marcedit/html/index.php (accessed February 19, 2011).
20. United Nations Educational, Scientific and Cultural Organization, "IsisMARC," http://portal.unesco.org/ci/en/ev.php-URL_ID=11041&URL_DO=DO_TOPIC&URL_SECTION=201.html (accessed February 19, 2011).
21. Fernando J. Gómez, "Catalis," http://inmabb.criba.edu.ar/catalis (accessed February 19, 2011).
22. Polaris Library Systems homepage, http://www.gisinfosystems.com (accessed February 19, 2011).
23. Library of Congress, "MARCMaker and MARCBreaker User's Manual," http://www.loc.gov/marc/makrbrkr.html (accessed February 19, 2011).
24. ExLibris, "ExLibris Voyager," http://www.exlibrisgroup.com/category/Voyager (accessed February 19, 2011).
25. Book Systems, "Concourse Software Product."
26. Bonnie Parks, "An Interview with Terry Reese," Serials Review 31 (2005): 303–8.
27. Eclipse.org, "XText," http://www.eclipse.org/Xtext (accessed February 19, 2011).
28. The Eclipse Foundation, "Rich Client Platform," http://wiki.eclipse.org/index.php/Rich_Client_Platform (accessed February 19, 2011).

Editorial Board Thoughts: Appreciation for History

Cynthia Porter

Cynthia Porter (cporter@atsu.edu) is Distance Support Librarian at A.T. Still University of Health Sciences, Mesa, Arizona.

The future looks exciting for ITAL, with our new open-access and online-only journal. As I look forward, I have been thinking about librarians and the changes I have witnessed in library technology. I would like to thank Judith Carter for her work on ITAL for over 13 years. She encouraged me to volunteer for the editorial board. I will miss her. I believe that lessons from the past can help us.

ITAL's first issue appeared in 1982—the same year that I graduated from high school. I typed all my school papers with a typewriter except for my last couple of papers in college. My father bought an early Macintosh computer (called Lisa). He had a daisy wheel printer—if we wanted to change fonts, we changed out the daisy wheel. I am thankful for the editing capabilities and font choices I have now when I create documents.

As an undergraduate student, I worked on dedicated OCLC terminals in the Interlibrary Loan (ILL) department at my college library. I was hired because I had the two hours open when ILL usually used mail. I thought our ILL service was a big help for our students.
I could not imagine then that electronic copies of articles could be delivered to ILL customers within one day. Today's ILL staff doesn't have to worry about paper cuts now, either. I graduated from library school in 1989. When I first started working as a cataloger, we were able to access OCLC on PCs (an improvement over the dumb terminals) in the libraries. Our subject heading lists were in the big red books from the Library of Congress. I tried to use the red books as an example for today's students and they had no idea what I was talking about. Even though "subject headings" are a foreign concept to many students today, I will always value them and fight for their continuation.

I worked on several retrospective conversion projects when I worked for a library contractor until 1991. The libraries still had card catalogs and we converted these physical catalogs to online catalogs. Nicholson Baker's article "Discards,"1 published in 1994, fondly remembered card catalogs. This article was discussed fervently in library school, but it seems quaint now. I grew up with card catalogs and I liked being able to browse through the subject listings. Browsing online does not provide the same satisfaction, but I would never give up the ability to keyword search an electronic document. I liked browsing the classification schemes, too. I liked easily seeing where a chosen number appeared within the scheme. It's harder to do the same thing online.

In 1991 I worked at an academic library where we were still converting catalog cards. We all had computers on our desks by then and we were comfortable with regular use of e-mail. The Internet was still young and Gophers were the new technology. Even though Gophers were text-based, I thought it was amazing how easy it was to access information from a university on the other side of the country. The Internet was the biggest technology development for me. I currently work with distance students who rely on their Internet connections to use our online library. I could not imagine even having distance students if we weren't connected with computers as we are now.

A 2009 issue of ITAL was dedicated to discovery tools. In Judith Carter's introduction to the issue she cites the browsing theory of Shan-Ju Lin Chang. Browsing is an old practice in libraries and I am very happy to see that discovery tools use this classic library practice. Bringing like items together has been a helpful organization method for ages. When I studied S.R. Ranganathan and his Colon Classification scheme, I realized that faceted classification would work very well on the web. I found his ideas to be fascinating, but difficult to implement on book labels for classification numbers. Some discovery tools even identify "facets" in searching and limiting. Ranganathan's work is a beautiful example of an old idea blossoming years after its conception. Classification, facets, and browsing are old ideas that are still helping us organize information in our libraries. We can't see the heavily used subjects by how dirty the cards are, but getting exact statistics on search terms is more useful anyway.

I would also like to thank Marc Truitt for his time and contributions to ITAL. Marc recently finished serving for four years as ITAL editor. He helped me remember library technology.
I wanted to know about his collaboration with Judith Carter. He said that he "thought no one this side of Pluto could do as well as she" as Managing Editor. We are lucky to have had brave librarians like Ranganathan, Carter, and Truitt. Although I enjoy remembering the past, I am very happy to utilize modern technology in my library. I don't want to live in the past, but I definitely don't want to forget it either. Thank you, library technology pioneers.

REFERENCES

1. Nicholson Baker, "Discards," The New Yorker, April 4, 1994, vol. 70, no. 7, p. 64–85.

Editor's Comments

Bob Gerrity

G'day, mates, and welcome to our third open-access issue. ITAL takes on an additional international dimension with this issue, as your faithful editor has taken up residence Down Under, in sunny Queensland, Australia.

The recent ALA Annual Meeting in Anaheim marked some changes to the ITAL Editorial Board that I'd like to highlight. Cynthia Porter and Judith Carter are ending their tenure with ITAL after many years of service. Cynthia is featured in this month's Editorial Board Thoughts column, offering her perspective on library technology past and present. Judith Carter ends a long run with ITAL as Managing Editor, and I thank her for her years of dedicated service. Ed Tallent, Director of Levin Library at Curry College, is the incoming Managing Editor. We also welcome two new members of the Editorial Board: Brad Eden, the Dean of Library Services and Professor of Library Science at Valparaiso University, and Jerome Yavarkovsky, former University Librarian at Boston College and the 2004 recipient of ALA's Hugh C. Atkinson Award. Jerome currently co-chairs the Library Technology Working Group at The MediaGrid Immersive Education Initiative.

We cover a broad range of topics in this issue. Ian Chan, Pearl Ly, and Yvonne Meulemans describe the implementation of the open-source instant messaging (IM) network Openfire at California State University San Marcos in support of the integration of chat reference and internal library communications. Richard Gartner explores the use of the Metadata Encoding and Transmission Standard (METS) as an alternative to the Fedora Content Model (FCM) for an "intermediary" digital-library schema. Emily Morton-Owens and Karen Hanson present an innovative approach to creating a management dashboard of key library statistics. Kate Pittsley and Sara Memmott describe navigational improvements made to LibGuides at Eastern Michigan University. Bojana Surla reports on the development of a platform-independent, Java-based MARC editor. Yongming Wang and Trevor Dawes delve into the need for next-generation integrated library systems and early initiatives in that space. Melanie Schlosser and Brian Stamper begin to explore the effects of reposting library digital collections on Flickr.

In addition to the compelling new content in this issue of ITAL, we have compelling old content from the print archive of ITAL and its predecessor, the Journal of Library Automation (JOLA), that will soon be available online, thanks in large part to the work of Andy Boze and colleagues at the University of Notre Dame. Scans of all of the back issues have now been deposited onto the server that currently hosts ITAL, and will be processed and published online over the coming months.

Bob Gerrity (r.gerrity@uq.edu.au) is University Librarian, University of Queensland, St. Lucia, Queensland, Australia.
Mapping for the Masses: GIS Lite and Online Mapping Tools in Academic Libraries

Kathleen W. Weessies and Daniel S. Dotson

Kathleen W. Weessies (weessie2@msu.edu), a LITA member, is Geosciences Librarian and Head of the Map Library, Michigan State University, Lansing, Michigan. Daniel S. Dotson (dotson.77@osu.edu) is Mathematical Sciences Librarian and Science Education Specialist, Associate Professor, Ohio State University Libraries, Columbus, Ohio.

ABSTRACT

Customized maps depicting complex social data are much more prevalent today than in the past. Interactive mapping tools make it easy to create and publish custom maps not only in formal published outlets but also in more casual outlets such as social media. This article defines GIS Lite, describes three commercial products currently licensed by institutions, and discusses issues that arise from their varied functionality and license restrictions.

INTRODUCTION

News outlets from newspapers to television to the Internet are filled with maps that make it possible for readers to visualize complex social data. Presidential election results, employment rates, and the plethora of data arising from the Census of Population are just a small sampling of the social data mapped and consumed daily. The sharp rise in published maps in recent years has increased consumer awareness of the effectiveness of presenting data in map format and has raised expectations for finding, making, and using customized maps. Not just in news media but in academia also, researchers and students have a high interest in being able to make and use maps in their work.

Just a few years ago even the simplest maps had to be custom made by specialists. Researchers and publishers had to seek out highly trained experts to make maps on their behalf. As a result, custom maps were generally only to be found in formal publications. The situation has changed partly because geographic information system (GIS) software for geographic analysis and map making is more readily available than in years past. It does, however, remain specialized and requires considerable training for users to be proficient at even a basic level.1 This gap between supply and demand has been partly filled, especially in the last five years, by the growth of Internet-based "GIS Lite" tools. While some basic tools are freely available on the Internet, several tools are subscription-based and are licensed by libraries, schools, and businesses for use. College and university libraries especially are quickly becoming a major resource for data visualization and mapping tools. The aim of this article is to describe several data-rich GIS Lite tools available in the library market and how these products have met or failed to meet the needs of several real-life college class situations. This is followed by a discussion of issues arising from user needs and restrictions posed by licensing and copyright.

WHAT IS GIS LITE?

Students and faculty across the academic spectrum often discover that their topic has a geographic element to it and a map would enhance their work (paper, presentation, project, poster, article, book, thesis or dissertation, etc.). If their research involves data analysis, geospatial tools will draw attention to spatial patterns in the data that might not otherwise be apparent.
Every scholar with such needs must make a cost/benefit decision concerning GIS: is his or her need greater than the cost in time and effort (and sometimes money) necessary to learn or hire the skills to produce map products? A fully functioning GIS, being a specialized system of software designed to work with geospatially referenced datasets, is designed to address all the problems above. The data may be analyzed and output into customized maps exactly to the researcher's need. The traditional low-end solution available to non-experts, on the other hand, is colorizing a blank outline map, either with hand-held tools (markers, colored pencils, etc.) or on a computer using a graphic editing program. The profusion of web mapping options dangles tantalizingly with possibility, and occasionally (and increasingly) is able to provide an output that illustrates a useful point of users' research in a professional enough manner to fill a need.

In recent years the web has blossomed with map applications collectively called the "GeoWeb" or "geospatial web." GeoWeb or geospatial web refers to the "emerging distributed global GIS, which is a widespread distributed collaboration of knowledge and discovery."2 Some GeoWeb applications are well-known street map resources such as Google Maps and MapQuest. Others are designed to deliver data from an organization, such as the National Hazards Support System (http://nhss.cr.usgs.gov), the National Pipeline Mapping System (http://www.npms.phmsa.dot.gov/PublicViewer), and the Broadband Map (http://www.broadbandmap.gov). A few tools focus on map creation and output, such as ArcGIS Online (http://www.arcgis.com/home/webmap/viewer.html) and Scribble Maps (http://www.scribblemaps.com). The newest subgenre of the GeoWeb consists of participatory mapping sites such as OpenStreetMap (http://www.openstreetmap.org), Did You Feel It? (http://earthquake.usgs.gov/earthquakes/dyfi), and Ushahidi (http://community.ushahidi.com/deployments).

The GeoWeb literature is small but growing.3 Elwood reviewed published research on the geographic web.4 The GeoWeb literature tends to focus on the creation of mappable data and the delivery of GeoWeb services.5 In these, the map consumer appears only as a contributor of data. Very little has been written about users' needs from the GeoWeb. The term GIS Lite has arisen among map and GIS librarians to describe a subset of GeoWeb applications. GIS Lite is useful to library patrons who lack specialized GIS training but wish to conduct some GIS and map-making activities on a lower learning curve. For the purpose of this article, GIS Lite will refer to applications, usually web-based, that allow users to manipulate geospatial data and create map outputs without programming skills or training in full GIS software. While many GeoWeb applications allow only low-level output options, GIS Lite will provide an output intended to be used in activities or rolled into a GIS for further geospatial processing. In libraries, GIS Lite is closely allied with data and statistics resources.
Data and statistics librarianship have already been discussed as disciplines in the literature such as by Hogenboom6 and Gray.7 New technologies and access to deeper data resources such as the ones presented here have raised the bar for librarians’ responsibilities for curating, serving, and aiding patrons in its use. Rather than be passive shepherds of information resources, librarians are now active participants and even information partners. Librarians with map and GIS skills similarly can directly enhance the quality of student scholarship across academic disciplines.8 The GIS Lite resources, however, need not remain specialized tools of map and GIS librarians. Librarians working in disciplines across the academic spectrum may incorporate them into their arsenal of tools to meet patron needs. DATA VISUALIZATION TOOLS A growing number of academic libraries have licensed access to online data providers. The following data tools contain enough GIS Lite functionality to aid patrons in visualizing and manipulating data (primarily social data) and creating customized map outputs. Three of the more powerful commercial products described here are Social Explorer, SimplyMap, and ProQuest Statistical Datasets. Social Explorer Licensed by Oxford University Press, Social Explorer provides selected data from the US Decennial Census 1790 to 2010, plus American Community Survey 2006 through 2010.9 The interface enables either retrieval of tabular data or visualization of data in an interactive map. As the user selects options through pull-down menus, the map automatically refreshes to reflect the chosen year and population statistics. The level of geography depicted defaults to county level data. If a user zooms in to an area smaller than a county, then data refreshes to smaller geographies such as census tracts if they are available at that level for that year. Output is in the form of graphic files suitable for sharing in a computer presentation (see figure 1). One advantage of Social Explorer is that it utilizes historic boundaries as they existed for states, territories, counties, and census tracts for each given year. Social Explorer utilizes data and boundary files generated by the National Historical GIS (NHGIS) based at the University of Minnesota in collaboration with other partners. The creation of these historical boundaries was a significant undertaking and accomplishment.10 Custom tables of data and the historic geographic boundaries may also be retrieved and downloaded for use from an affiliated engine through the NHGIS website (http://www.nhgis.org). A disadvantage of this product is that the tool, while robust, does not completely replicate all the data available in the original paper census volumes. Also, historical boundaries have not been created for city or township-level data. The final map layout is not customizable either in the location of title and legend or in the data intervals. http://www.nhgis.org/ MAPPING FOR THE MASSES: GIS LITE & ONLINE MAPPING TOOLS IN ACADEMIC LIBRARIES | WEESSIES AND DOTSON 26 Figure 1: Map Depicting Population Having Four or More Years of College, 1960 (Source: Social Explorer, 2012; image used with permission) SimplyMap SimplyMap (http://geographicresearch.com/simplymap) is a product of Geographic Research. This powerful interface brings together public and licensed proprietary data to offer a broad array of 75,000 data variables in the United States. 
US Census Data are available 1980–2010 normalized to the user’s choice of either year 2000 or year 2010 geographies. Numerous other licensed datasets primarily focus on demographics and consumer behavior, which makes it popular as a marketing research tool. Each user establishes a personal login which allows created maps and tables to persist from session to session. Upon creating a map view, the user may adjust the smaller geographic unit at which the theme data is displayed and also may adjust the data intervals as desired. The user creates a layout, adjusting the location of the map legend and title before exporting as a graphic or PDF (see figure 2). Data are also exportable as GIS-friendly shapefiles. http://geographicresearch.com/simplymap INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2013 27 The great advantage of this product is the ability to customize the data intervals. This makes it possible to filter the data and display specific thresholds meaningful to the user. For instance if a user needs to illustrate places where an activity or characteristic is shared by “over half” of the population, then one may change the map to display two data categories: one for places where up to 50 percent of the population shares the characteristic and a second category for places where more than 50 percent of the population shares the characteristic. Another potential advantage is that all local data have been allocated pro rata so that all variables, regardless of their original granularity, may be expressed by county boundaries, by zip code boundaries, or by census tract. A disadvantage of the product is the lack of historical boundaries to match historical data. Figure 2. Map Depicting Census Tracts That Have More Than 50% Black Population (Yellow Line Indicates Cincinnati City Boundary) (Source: SimplyMap, 2012; image used with permission) MAPPING FOR THE MASSES: GIS LITE & ONLINE MAPPING TOOLS IN ACADEMIC LIBRARIES | WEESSIES AND DOTSON 28 ProQuest Statistical Datasets Statistical Datasets was developed by Conquest Systems Inc. and is licensed by ProQuest. This product also mingles a broad array of several thousand public and licensed proprietary datasets, including some international data, in one interface. The user may retrieve data and view it in tabular or chart form. If the data have a geographic element, then the user may switch the view to a map interface. The resulting map may be exported as an image. The data may also be exported to a GIS-friendly shapefile format. This product offers more robust data manipulation than the other products, in that the user may perform calculations between any of the data tables and create a chart or map of the created data element (see figure 3). Statistical Datasets, however, has more simplistic map layout capabilities than the other products. Figure 3. Map of Sorghum Production, by Country, in 2010 (Source: ProQuest Statistical Datasets, 2012; image used with permission) CASE STUDIES The following three case studies are of college classroom situations in which students utilized maps or map making as part of the assigned course work. The above mapping options are assessed for how well they met the assignment needs. INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2013 29 Case Study 1 An upper level statistics course at The Ohio State University requires students to create maps using SAS (http://www.sas.com). 
While many may not associate the veteran statistical software package with creating maps, this course uses it along with SAS/GRAPH to combine statistical data with a map. The project requires data articulated at the county level in Ohio, which the students then combine into multi-county regions. The end result is a map with regions labeled and rendered in 3D according to the data values. An example of the type of map that could be produced from such data using SAS can be seen in figure 4.

Figure 4. Map of Observed Rabbit Density in Ohio using SAS, SAS/GRAPH, and Mail Carrier Survey Data, 1998 (image used with permission)

While the data are provided in this course, students could potentially seek help from the library in a traditional way to find numerical data expressed at a county level. The librarian would guide patrons through appropriate avenues to locate data, such as the three products listed above. All three options contain numerous data variables for Ohio at the county level. Because the students are further processing the data elsewhere (in this case SAS), the output options of the three products are less important. Ultimately the availability of data on a desired subject would be the primary determinant for choosing one of the three GIS Lite options discussed here. Social Explorer will export the data in tabular form which can then be ingested into SAS. SimplyMap and ProQuest Statistical Datasets would both be a bit easier, though, because both packages allow the user to export the data as shapefiles which are directly imported into SAS/GRAPH as both boundary files and joined tabular data.

Case Study 2

A first-year writing class at Michigan State University has a theme of the American ethnic and racial experience. Assignments all relate to a student's chosen ethnic group and geographic location from approximately 1880 to 1930. Assignments build upon each other to culminate in a final semester paper. Students with ancestors living in the United States at that time are encouraged to examine their own family's ethnicity and how they fit in their geographic context. Otherwise, students may choose any ethnic group and place of interest. Maps are a required element in the assignments. Maps that display historical census data help students place the subject ethnic group into the larger county, state, and national context over the time frame. The students can see, for instance, if their subject household was part of an ethnic cluster or an outlier to ethnic clusters. The parameters for finding data and maps are generous and open to each student's interpretation. The wish is for students to find social statistics and maps that are insightful to their topic and will help them tell their story. Of the three statistical resources considered above, currently the only useful one is Social Explorer because it covers the time period studied by the class. The students may map several social indicators at the county level across several decades and compare their local area to the region and the nation. Also, they may save their maps and include them in their papers (properly credited).

Case Study 3

"The Ghetto" is an elective Geography class restricted to upperclassmen at Michigan State University. In the semester project, students analyze the spatial organization and demographic variables of "ghetto" neighborhoods in a chosen city.
A ghetto is defined as a neighborhood that has a 50 percent or higher concentration of a definable ethnic group. Since black and white are the only two races consistently reported at the Census Tract level for all the years covered by the class (1960 through 2010), the students necessarily use that data for their projects. Data needs for the class are focused and deep. The students specifically need to visualize US census data from 1960 through 2010 at the census tract level within the city limits for several social indicators. Indicators include median income, median housing value, median rent, educational attainment, income, and rate of unemployment. The instructor has traditionally required use of the paper census volumes, and students created hand-made maps that highlight tracts in the subject city that conformed to the ghetto definition and those that did not for each of the census years covered. Computer-retrieved data and computer-generated maps would be acceptable, but at the time of this writing no GIS Lite product is able to make all the maps that meet the specific requirements of this class. Social Explorer covers all of the date range and provides data down to the tract level. However, it does not provide an outline of the city limits and does not provide all the data variables required in the assignment. SimplyMap will only work for 2000 through 2010 because tract boundaries are only available for those two years even though the data go back to 1980. SimplyMap does provide two excellent features though: it is the only product that allows an overlay of the (modern) city boundary on top of the census tract map, and it is the only product that allows manipulation of the data intervals. Students may choose to break the data at the needed 50 percent mark, while the other products utilize fixed data intervals not useful to this class. ProQuest Statistical Datasets can compute the data into two categories to create the necessary data intervals; however, Census data are only available beginning with Census 2000.

MAP PRODUCTS FOR USER NEEDS

These three real-life class scenarios illustrate how the rich and seemingly duplicative resources of the library can range from perfectly suitable to perfectly useless depending on each project's exact needs. The appropriateness of any given tool can only be assessed fairly if the librarian is familiar with all the "ins and outs" of every product. The GeoWeb and GIS Lite tools mentioned throughout this article are summarized in table 1. The suitability of GIS Lite tools will be further affected by the following issues.

Historical Boundaries

The range and granularity of data tools are subject to factors sometimes at odds with what a researcher would wish to have. At this time, for instance, many historical resources provide data only as detailed as the county level. County level data are available largely due to the efforts of the NHGIS mentioned above and the Newberry Library's Atlas of County Boundaries Project (http://publications.newberry.org/ahcbp). Far fewer resources provide historical data at smaller geographies such as city, township, or census tract levels. This is because the smaller the geographies get, the exponentially more of them there are to create and for map interfaces to process. From the well-known resource County and City Data Book,11 it is easy enough to retrieve US city data. The historical boundaries of every city in the United States, however, have not been created.
This is because city boundaries are much more dynamic than county boundaries and there is no centralized authoritative source for their changes over time. Two of the three case studies presented here utilized historic data. This isn't necessarily a representative proportion of user needs; librarians should assess data resources in light of their own patrons' needs.

Normalization

Time-series data of any kind raises two equally valid data needs concerning changing geographic boundaries. Census tracts, for instance, provide geographic detail roughly at the neighborhood level, designed by the Bureau of Census to encompass approximately 2,500 to 8,000 people.12 Because people move around and the density of population changes from decade to decade, the configuration and numbering of tracts change over time. Some scholars will wish to see the data values in the tracts as they were drawn at the time of issue. In this situation, a neighborhood of interest might belong to different tracts over the years or even be split between two or more tracts. Other scholars focused on a particular neighborhood may wish to see many decades of census data re-cast into stable tracts in order to be directly comparable. Data providers will take one approach or the other on this issue, and librarians will do well to be aware of their choice.

License Restrictions

A third issue affecting use of these products is the ability to use derived map images, not only in formal outlets such as professional presentations, articles, books, and dissertations, but also informal outlets such as blogs and tweets. For the most part GIS Lite vendors are willing—even pleased—to see their products promoted in the literature and in social media. The vendors uniformly wish any such use to be properly credited. The license that every institution signs when acquiring these products will specify allowed and disallowed activities. The license, fixated on disallowing abuse or resale or other commercialization of the data, might have a chilling effect on users wishing to use the images in their work. If a user is in any doubt as to the suitability of an intended use of a map, he or she should be encouraged to contact the vendor to seek permission for its use. As data resources grow and become more readily usable, the possibility for scholarly inquiry grows. Librarians with familiarity with GIS Lite tools may partner with their patrons and guide them to the best resources.

Table 1: A Selection of GeoWeb and GIS Lite Tools and Their Output Options

Tool Name | URL | Free or Fee | Electronic Output Options*

GeoWeb Tools
Atlas of Historical County Boundaries | http://publications.newberry.org/ahcbp/ | Free | Spatial data as Shapefile, KMZ; Image as PDF
Did You Feel It? | http://earthquake.usgs.gov/earthquakes/dyfi/ | Free | Tabular data as TXT, XML; Image as JPG, PDF, PS
Google Maps | https://maps.google.com/ | Free | None
MapQuest | http://www.mapquest.com | Free | None
National Broadband Map | http://www.broadbandmap.gov/ | Free | Image as PNG
National Hazards Support Systems (USGS) | http://nhss.cr.usgs.gov/ | Free | Image as PDF, PNG
National Pipeline Mapping System | https://www.npms.phmsa.dot.gov/PublicViewer/ | Free | Image as JSF
OpenStreetMap | http://www.openstreetmap.org/ | Free | Tabular data as XML; Image as PNG, JPG, SVG, PDF
Ushahidi Community - Deployments | http://community.ushahidi.com/deployments/ | Free | Image as JPG

GIS Lite Tools
ArcGIS Online | http://www.arcgis.com | Limited free options; access is part of institutional site license | Spatial data as ArcGIS 10; Image as PNG (in ArcExplorer)
ProQuest Statistical Datasets | http://cisupa.proquest.com/ws_display.asp?filter=Statistical%20Datasets%20Overview | Fee | Tabular data as Excel, PDF, Delimited text, SAS, XML; Spatial data as Shapefile; Image may be copied to clipboard
SAS/GRAPH | http://www.sas.com/technologies/bi/query_reporting/graph/index.html | Fee | Image as PDF, PNG, PS, EMF, PCL
Scribble Maps | http://www.scribblemaps.com/ | Free | Spatial data as KML, GPX; Image as JPG
SimplyMap | http://geographicresearch.com/simplymap | Fee | Tabular data as Excel, CSV, DBF; Spatial data as Shapefile; Image as PDF, GIF

* Does not include taking a screen shot of the monitor or making a durable URL to the page

REFERENCES

1. National Research Council, Division on Earth and Life Studies, Board on Earth Sciences and Resources, Geographical Sciences Committee, Learning to Think Spatially (Washington, D.C.: National Academies Press, 2006): 9.
2. Pinde Fu and Jiulin Sun, Web GIS: Principles and Applications (Redlands, CA: ESRI Press, 2011): 15.
3. For good overviews of the GeoWeb, see Muki Haklay, Alex Singleton, and Chris Parker, "Web Mapping 2.0: The Neogeography of the GeoWeb," Geography Compass 2, no. 6 (2008): 2011-2039, http://dx.doi.org/10.1111/j.1749-8198.2008.00167.x; Jeremy W. Crampton, "Cartography: Maps 2.0," Progress in Human Geography 33, no. 1 (2009): 91-100, http://dx.doi.org/10.1177/0309132508094074.
4. Sarah Elwood, "Geographic Information Science: Visualization, Visual Methods, and the GeoWeb," Progress in Human Geography 35, no. 3 (2010): 401-408, http://dx.doi.org/10.1177/0309132510374250.
5. Songnian Li, Suzana Dragićević, and Bert Veenendaal, eds., Advances in Web-based GIS, Mapping Services and Applications (Boca Raton, FL: CRC Press, 2011).
6. Karen Hogenboom, Carissa Phillips, and Merinda Hensley, "Show Me the Data! Partnering with Instructors to Teach Data Literacy," in Declaration of Interdependence: The Proceedings of the ACRL 2011 Conference, March 30-April 2, 2011, Philadelphia, PA, ed. Dawn M. Mueller
(Chicago: Association of College and Research Libraries, 2011), 410-417, http://www.ala.org/acrl/files/conferences/confsandpreconfs/national/2011/papers/show_me_the_data.pdf.
7. Ann S. Gray, "Data and Statistical Literacy for Librarians," IASSIST Quarterly 28, no. 2/3 (2004): 24-29, http://www.iassistdata.org/content/data-and-statistical-literacy-librarians.
8. Kathy Weimer, Paige Andrew, and Tracey Hughes, Map, GIS and Cataloging / Metadata Librarian Core Competencies (Chicago: American Library Association Map and Geography Round Table, 2008), http://www.ala.org/magirt/files/publicationsab/MAGERTCoreComp2008.pdf.
9. Social Explorer, http://www.socialexplorer.com/pub/home/home.aspx.
10. Catherine Fitch and Steven Ruggles, "Building the National Historical Geographic Information System," Historical Methods 36, no. 1 (2003): 41-50, http://dx.doi.org/10.1080/01615440309601214.
11. U.S. Bureau of Census, County and City Data Book, http://www.census.gov/prod/www/abs/ccdb.html.
12. Census Tracts and Block Numbering Areas, http://www.census.gov/geo/www/cen_tract.html.

ACKNOWLEDGMENTS

The authors wish to thank Dr. Michael Fligner, Dr. Clarence Hooker, and Dr. Joe Darden for permission to use their courses as case studies.

Animated Subject Maps for Book Collections

Tim Donahue

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2013

Tim Donahue (tdonahue@montana.edu) is Assistant Professor/Instruction Librarian, Montana State University, Bozeman, MT.

ABSTRACT

Of our two primary textual formats, articles by far have received the most fiscal and technological support in recent decades. Meanwhile, our more traditional format, the book, seems in some ways to already be treated as a languishing symbol of the past. The development of OPACs and the abandonment of card catalogs in the 1980s and 1990s is the seminal evolution in print monograph access, but little else has changed. To help users locate books by call number and browse the collection by subject, animated subject maps were created. While the initial aim is a practical one, helping users to locate books and subjects, the subject maps also reveal the knowledge organization of the physical library, which they display in a way that can be meaningful to faculty, students, and other community members. We can do more with current technologies to assist and enrich the experience of users searching and browsing for books. The subject map is presented as an example of how we can do more in this regard.

LC CLASSIFICATION, BOOKS, AND LIBRARY STACKS

During the last few decades of technological evolution in libraries, we have helped facilitate a seismic shift from print-based to digital research. Our library websites are jammed with electronic resources, digital collection components, database links, virtual reference assistance, online tutorials, and mobile apps. Collection budgets too have shifted from a print to an electronic focus.
Many libraries are now spending less than 20 percent of their material budgets on print monographs. And yet, our stacks are still filled with books that often take up more than fifty percent of our library spaces. Knowledge organization schemas have also evolved in libraries. We have subject lists, reflecting current disciplines and majors in higher education, to help users decide which databases to select. Internal database navigation continues to evolve in terms of limits, fields, and subject searching. Web searching is based on the contemporary keyword approach where "everything is miscellaneous" and need not be organized, but nationwide, billions of books still sit on shelves according to Dewey or Library of Congress classification systems that were initially developed over a century ago. Some say these organizing systems are woefully antiquated and do not reflect our contemporary post-modern realities, though they still amply serve their purpose to assign call number locations for our books. We hear little of plans to update these classification schemes. Why invest more time, energy, and resources on revamped organization schemes for libraries? The HathiTrust now contains the scanned text of more than ten million books. Google claims there are almost 130 million published titles in the world and intends to digitize all of them.1 What will happen to our physical book collections? How long will they reside on our library shelves? How long will they be located using the Dewey and LC systems? Is the library a shrinking organism? Profession-wide, there seems to be no concrete vision in regard to the future of our book collections. There is, of course, general acknowledgement that acquisition of e-books will increase as print acquisitions decrease and that, overall, print collections will accordingly shrink to reflect the growing digital nature of knowledge consumption. But for now and into the foreseeable future these billions of monographs remain on our shelves in the same locations our call number systems assigned to them decades ago. And while online library users are now able to utilize an array of electronic access delivery systems and web technologies for their article research and consumption, book seekers still need a call number. Books and articles have been our two primary textual formats for centuries. Articles have moved into the digital realm more fleetly than their lengthier counterparts. Their briefer length, the cyclical serial publication process, and the evolution of database containment and access have enabled, in a relatively short time, a migration from print to primarily digital access. Books, however, are accessed in much the same way they were a hundred years ago. The development of OPACs in the 1980s and 1990s and abandonment of card catalogs is the seminal evolution in print monograph access, but little else has changed.2 Once a call number is attained, the rest of the process remains physical, usually requiring pencil, paper, feet, sometimes a librarian, and a trip through the library until the object itself is found and pulled from the shelf.
So while the process of article acquisition may employ a plethora of finding aids, keyword searching, database features, full text availability, and various delivery methods through our richly developed websites, beyond the OPAC and possibly a static online map, book seekers are on their own or need a librarian in what may seem a meaningless labyrinth of stacks and shelves. While the primary and most practical purpose of our classification schemes is to provide an assigned call number for book finding, these organizational outlines create an order to the layout of our stacks that maps a universe of knowledge within our library walls. This structure of knowledge reveals a meaning to our collections that includes the colocation of books by topic and proximity of related subjects. These features enhance the browsing process and often lead to the act of serendipitous discovery. To locate a book by call number, a user may consult library floor plans, which are typically limited to broad ranges or LC main classes, then rely on stack-end cards to home in on the exact stack location. To browse books by subject without using the catalog, a user typically must rely on a combination of floor plans and LC Outline posters if they exist at all. Often, informed browsing by subject cannot take place without a visit to the reference desk for mediation by a librarian. Even then, many librarians are barely familiar with their book collection’s organizational structure and are reticent to recommend broad subject browsing. INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2013 9 PURPOSE AND DESCRIPTION OF THE SUBJECT MAP To help users locate books by call number and browse the collection by subject, animated subject maps were created at Skidmore College and Montana State University. Displaying overhead views of library floors, users mouse over stacks to reveal the LC sub-classes located within. Alternatively, they may browse and select LC subject headings to see which stacks contain them. The LC Outline contains 21 main subject classes and 224 sub-classes, corresponding to the first two elements of a book call number. On stack mouse-over, three items are displayed: the call number by range, the main subject heading, and all sub-classes contained within the stack. When using the browse by subject option, users select and click an LC main class and the stacks where this class is located are highlighted. While the initial aim is a practical one, helping users to locate books and subjects, the subject map also reveals the knowledge organization of the physical library, which it displays in a way that can be meaningful to faculty, students, and other community members. The map also provides local electronic access to the LC Classification Outline. At both institutions the maps are linked from prominent web locations and electronic points of need that are relevant and proximate to other book searching functions and tools. Figure 1. Skidmore College subject map showing stack mouse-over display. ANIMATED SUBJECT MAPS FOR BOOK COLLECTIONS | DONAHUE 10 Figure 2. Montana State University subject map showing stack mouse-over display. DESIGN RATIONALE AND METHODOLOGY The inspiration for the subject map started with a question: What if users could see on a map where individual subjects were located within the library? Most library maps examined were limited to LC main classes or broad ranges denoting wide swaths of call numbers. Including hundreds of LC subclasses would convolute and clutter a floor map beyond usability. 
But what if an online map contained each individual stack and only upon user-activation was the information revealed, saving space and avoiding clutter? Such a map should be as devoid of congestion as possible and focus the user’s attention on library stack locations and LC Classification. Working from existing maps and architectural blueprints of the library building, a basic perimeter was rendered using Adobe Illustrator and InDesign software. These perimeters were then imported into Adobe Flash and a new .FLA file created. Library stacks were then measured, counted, and added as a separate layer within each floor perimeter. Basic location elements such as stairways, elevators, and doors were added for locational reference points. Each stack was then programmed as a button with basic rollover functionality. Flash ActionScript was coded so that the correct call number, main class, and sub-class information appear within the interface upon rollover activation. This functionality accounts for the stack searching ability of the subject map. INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2013 11 Additionally, the LC outline was made searchable within the map so that users can mouse over subjects and upon clicking, see what stacks contain those main classes. This functionality accounts for the subject searching ability of the map. Left-hand navigation was built in so users can toggle between these two main search functions. Maintaining visual minimalism and simplicity was a priority and inclinations to render the map more comprehensively were resisted in order to maximize attention to subject and stack information. Black, white, and gray colors were chosen to enhance the contrast of the map and aid the user’s eye for quick and clear use. Other relevant links and instructional context were added to the left-hand navigation including links to the Catalog, Official LC Outline, and library homepage. Finally, after uploading to the local server and creating a simple URL, links to the subject map were established in prominent and meaningful points of need within the library website. USER ACCEPTANCE Once the subject map was completed and links to it were made public, a brief demonstration was provided for reference team members who began showing it to users at the reference desk. Initial reaction was enthusiastic. Students thought it was “cool” and enjoyed “playing with it.” One reported, “I didn’t know the library actually made sense like that. It’s neat to see the logic about where things are.” Another student said, “Now I can see where all the books on Buddhism are!” Faculty, too, were pleased. Though faculty members typically know a little about LC Classification, they are not accustomed to seeing it visualized and grafted onto their institutional library’s stacks. Making transparent the intellectual organization of the library for other faculty can bolster their confidence in our order and structure. Professors are often pleased to see their discipline’s place within our stacks and where related subjects are located. The most positive praise for the subject map, however, comes from the sense of convenience it lends. Many comments express appreciation for the ability to directly locate an individual book stack. Because primary directional and finding elements like stairs and elevators are included in the maps, users are able to see the exact path that leads to the book they are seeking. 
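The rollover behavior described above under "Design Rationale and Methodology" can be sketched in a few lines of timeline ActionScript 3. This is a minimal illustration only: the instance names (stackD2_14 for a stack button; callNumberText, mainClassText, and subClassText for the display fields) and the sample LC values are hypothetical placeholders, not the code actually used in the Skidmore or Montana State .FLA files.

```actionscript
import flash.events.MouseEvent;

// Hypothetical data for one stack; each stack button in the .FLA
// would carry its own call number range and LC class information.
var stackInfo:Object = {
    range:      "QA 1 - QA 939",
    mainClass:  "Q - Science",
    subClasses: "QA - Mathematics (includes computer science)"
};

// Show the stack's call number range, main class, and sub-classes on rollover.
stackD2_14.addEventListener(MouseEvent.MOUSE_OVER, showStackInfo);
stackD2_14.addEventListener(MouseEvent.MOUSE_OUT, clearStackInfo);

function showStackInfo(e:MouseEvent):void {
    callNumberText.text = stackInfo.range;
    mainClassText.text  = stackInfo.mainClass;
    subClassText.text   = stackInfo.subClasses;
}

function clearStackInfo(e:MouseEvent):void {
    callNumberText.text = "";
    mainClassText.text  = "";
    subClassText.text   = "";
}
```

A handler pair like this would have to be repeated, with its own data, for every stack button, which is consistent with the article's later observation that each stack must be hand-coded separately.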
For those not interested in browsing, in a hurry, or challenged in terms of mobility, the subject map is a time and energy saver. Some users however have reported frustration with the sensitivity required for the mouse-over functions. Others desire a more detailed level of searching beyond the sub-class level. One user pointed out that the subject map was of no help to the blind. MULTIPLE USES AND INTERNAL APPLICATIONS The primary use and most obvious application of the subject map is as a reference tool. As a front line finding aid, librarians and other public service staff at reference, circulation, or other help desks can easily and conveniently deploy the map to point users in the right direction and orient them to the book collection. In library instruction sessions, the subject map is not only a practical local resource worth pointing out, but also serves as an example of applied knowledge organization. When accompanying a demonstration of the library catalog, the map is not only a valuable finding aid, but adds a layer of meaning as well. Students who understand the map are ANIMATED SUBJECT MAPS FOR BOOK COLLECTIONS | DONAHUE 12 not only more able to browse and locate books, but learn that a call number represents a detailed subject meaning as well as locational device. Used in conjunction with a tour, the map reinforces the layout of library shelves and helps to bridge the divide between electronic resources and physical retrieval. The subject map facilitates a concrete and visual introduction to the LC Classification Outline, a knowledge of which can be applied to most college and research libraries in the United States. The subject map can also be of assistance with Collection Development. Perusal of the map can reveal relative strengths and weaknesses within the collection. Subject liaisons and bibliographers may use the map to home in on and visualize their assigned areas. Circulation staff and stacks maintenance workers find the map useful for book retrieval, shifting projects, and in the training and acclimation of new workers to the library. The subject map has proven to be a useful reference for library redesign and space planning considerations. At information fairs and promotional events where devices or projection screens are available, the map has served as a talking point and promotional piece of digital outreach. The map has been demonstrated by information science professors to LIS graduate students as an example of applied knowledge organization in libraries. Recently, a newly hired incoming library dean commented that the map helped him “get to know the book collection” and familiarized him with the library. Figure 3. Skidmore College subject map showing subject search display. INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2013 13 ISSUES AND CHALLENGES In some libraries, books don’t move for decades. The same subjects may reside on the same shelves during an entire library’s lifetime. In this case, a subject map can be designed once and never edited. But, of course, most library buildings go through changes and evolutions. In many libraries, collection shifting seems to be ongoing. Book collections wax and wane. Certain subjects expand with their times, while others shrink in irrelevancy. Weeding does not affect all subjects and stacks equally and adjustments to shelves and end cards are necessary. In addition to the transitions of weeding and shifting, sometimes whole floors are reconfigured. 
In the library commons era of the last few decades, substantial redesigns have been commonplace as book collections make way for computer stations and study spaces. In all these cases, adjustments and updates will be necessary to keep a subject map accurate. This is easily done by going back into the master .FLA file and editing as needed. In many cases only a stack or two need be adjusted, but in instances of major collection shifting some planning ahead may be necessary and more time allotted for redesign. Shifting can be a complex spatial exercise and it is difficult to predict where subjects will realign exactly. Subject map editing may have to wait until physical shifting is completed. It should be noted that each stack must be hand-coded separately. In libraries with hundreds of stacks this can seem a tedious and time-consuming design method. Both subject maps rely on Adobe Flash animation technology. Flash is proprietary software, so the benefits of open source software cannot be utilized with subject maps at this time. Further, Adobe Flash Player software must be installed on a computer for the subject map to render. This has almost never been a problem, however, as the Flash Player is ubiquitous and automatically installed on most public and private machines upon initial boot up. Another concern, however, relating to Flash technology is human assets. Not every library has a Flash designer or even someone who can implement the most fundamental Flash capabilities. Flash is not hard to learn and the subject maps utilize only its most basic functionalities, but still, for some it remains niche software and many libraries will not have the resources to invest. Reaction, though, to the live subject maps and the rollover interactivity they provide, has been so positive that more fully integrated Flash maps have been proposed. Why not have all physical elements of the library incorporated into one Flash-enabled map? This is possible but may come at some expense to the functionality of the subject-rendering aspect of the maps. By limiting the application to stacks and LC classes, a user may remain more focused. Avoiding clutter, overcrowding, and a preponderance of choice is a design strategy that has gained much credibility in recent years.3 The subject map enjoys the usability success of clean design, limited purpose, and simple rendering. While demonstrating the potential of user-activated animation for other proposed library applications, the subject map might be best maintained as a limited specialty map. A final concern regarding the long-term success of subject maps should be mentioned. How long will books remain in libraries? How long will they be organized by subject? When the physical arrangement and organization of information objects no longer exist in libraries, maps of any kind will seemingly lose all efficacy. But will libraries themselves exist in this future? Whither books? Whither libraries?

FUTURE DEVELOPMENTS

The most prominent and practical attribute of the subject map is its ability to show a user the exact stack where the book they are seeking is located. But in its current state as a stand-alone application, a user must obtain a call number from a catalog search, then open the subject map by going to its independent URL. Investigation is underway to determine what is necessary in order to integrate the subject map with the online catalog.
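One possible shape such an integration could take is sketched below as a hypothetical illustration, not the method under investigation: the catalog item record links to the map with a stack identifier in the query string, and the map reads that parameter with ActionScript when it loads. The parameter name (stack), the instance-naming convention, and the highlightStack() routine are all assumptions made for the sketch.

```actionscript
import flash.display.DisplayObject;

// Hypothetical deep link from a catalog record, e.g.:
//   http://www.lib.montana.edu/subjectmap/?stack=D2_14
// Query-string values appended to the SWF's URL arrive in loaderInfo.parameters.
var params:Object = this.loaderInfo.parameters;
var requestedStack:String = params["stack"];

if (requestedStack != null) {
    // Look the stack button up by a (hypothetical) instance-naming convention.
    var target:DisplayObject = getChildByName("stack" + requestedStack);
    if (target != null) {
        // highlightStack() is assumed to exist in the map; it would recolor
        // or otherwise emphasize the stack the catalog record pointed to.
        highlightStack(target);
    }
}
```
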
In this scenario, a catalog item record might also display an embedded subject map that automatically highlights the floor and stack where the call number is located. This seemingly requires .SWF files and Flash ActionScript to be embedded in catalog coding. One potential solution is to attribute an individual URL to each stack rendering so that a get URL function can be applied and embedded in each catalog item record. This synthesis of subject map and catalog poses a complex challenge but promises meaningful and time-saving results for the item retrieval process. QR code technology in conjunction with subject map use is also being deployed. By fixing QR codes on stack end cards that link to relevant sections of the LC Outline, a researcher may use a mobile device to browse digitally and physically within the stacks at the same time. In this way a user may conduct digital subject browsing and physical item browsing simultaneously. The URLs linked to by QR coding contain detailed LC sub-levels not contained within the subject map, which is limited to the level of sub-class. The active discovery of new knowledge facilitated by exploiting pre- existing LC organization inside library stacks in real time can be quite impressive when experienced firsthand. Another development exploiting LC knowledge organization is in beta mode at this time. An LC search database has been created allowing users to enter words and find matching LC subject terminology. Potentially, this database could be merged with the subject map, allowing users to correlate subject word search with physical locations independent of call numbers. Despite its intent as a limited specialty map, possibilities are also being explored to incorporate the subject map into a more fully integrated library map. One way forward in this regard is to create map layers that could be toggled on and off by users. In this way, the subject map could exist as its own layer, maintaining its clarity and integrity when isolated but integrated when viewed with other layers. Flash technology excels at allowing such layer creation. OTHER STACK MAPS AND RELATED TECHNOLOGIES Searching the web for “subject map” and relative terminology such as stack, shelf, book, and LC maps, does turn up various efforts and approaches to organizing and exploiting classification scheme data, but no animated, user-activated maps are found. Similar searches across library and information science literature turn up some explorative research on the possibilities of mapping INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2013 15 LC data, but again no animated stack maps are found.4 There is a product licensed by Bowker Inc. called StackMap that can be linked to catalog search results. When a user clicks on the map link next to a call number result, a map is displayed with the destination stack highlighted, but the information provided is locational only. StackMap is not animated or user-activated. No subject information is given and the map offers no browsing features. Since the release of HTML5, we are beginning to see more animation on the web that is not Flash- driven. Steve Jobs and Apple’s determined refusal to run Flash on their mobile devices has motivated many to seek other animation options. New HTML5 animation tools such as Adobe Edge, Hippo Animator, and Hype offer promising starts at dislodging the Flash grip on web animation, but they have far to go and do not yet offer either the ease of design nor the range of creative possibilities of Flash. 
Building an animated subject map with HTML5 alone does not seem possible at this time. UNIVERSAL APPLICABILITY OF THE SUBJECT MAP So far, subject maps have been created for two very different libraries. The commonality shared between the Montana State University and Skidmore College libraries is their possession of hundreds of thousands of books in stacks shelved by the LC classification system. This is a trait shared by nearly all college and research libraries. Subject maps can be easily structured on the Dewey Decimal System as well so that public libraries could benefit from their functionality, making the subject map appropriate and creatable for more than 12,000 libraries.5 Of our two primary textual formats, articles by far have received the most fiscal and technological support in recent decades. Article searching and retrieval continues to evolve through the rich implementation of assets such as locally constructed resource management tools, independent journal title searches, complexly designed database search interfaces, and dedicated electronic resource librarians. Meanwhile, our more traditional format, the book, seems in some ways to already be treated as a languishing symbol of the past. Because its future is uncertain, does that justify our neglect in the present? As a profession we seem a bit complacent about the state of our book collections. Why dedicate our technical resources to a format that is on the way out? But has the book disappeared yet? As we make room for more student lounges, coffee bars, computer stations, writing labs, and information commons, we should carefully ask what makes a library special. Good books and the focused, sustained treatment of knowledge they contain are part of the correct answer, symbolically and as yet, practically speaking. While our books still occupy our library shelves, shouldn’t they also fully benefit from the ongoing technological explosion through which we continue to evolve? OPACs haven’t evolved much in recent years. In fact they seem quite stymied to many librarians and users. We can do more with current technologies to assist and enrich the experience of users searching and browsing for books. The subject map is hopefully an example of how we can do more in this regard. While we have grown accustomed to increasingly look forward in order to position our libraries for the future, we should also remember to sometimes look back. Our classification systems and ANIMATED SUBJECT MAPS FOR BOOK COLLECTIONS | DONAHUE 16 book collections are assets built from the past that represent many decades of great labor, investment, and achievement. More than 12,000 public and academic libraries together make up one of our greatest national treasures and bulwarks of living democracy. Libraries are among the dearest valued assets in any of our states. Many of the most beautiful buildings in our nation are libraries. Based on library insurance values and estimated replacement costs, library buildings and the collections they hold amount cumulatively to hundreds of billions of dollars of worth.6 This astounding worth is figured mainly from the buildings themselves and the books they contain. A few have commented that there is some aesthetic quality to the subject maps. If this is true, the appeal comes from the synthesis of architectural form and the universe of knowledge revealed within, from the beauty of libraries both real and ideal, from physical and mental constructions unified. 
Animated subject maps can help bring the physical and intellectual beauty of libraries into the digital realm, but the main appeal is a practical one: to point the user directly to the book or subject they are seeking. So in conclusion, perhaps we should measure the subject map's potential in the light of Ranganathan's Five Laws of Library Science:7

1. Books are for use.
2. Every reader his [or her] book.
3. Every book its reader.
4. Save the time of the reader.
5. The library is a growing organism.

The subject maps can be found at the following URLs:
Skidmore College Subject Map: http://lib.skidmore.edu/includes/files/SubjectMaps/subjectmap.swf
Montana State University Subject Map: www.lib.montana.edu/subjectmap

REFERENCES

1. Google, "Google Books Library Project—An Enhanced Card Catalog of the World's Books," http://books.google.com/googlebooks/library.html, accessed November 8, 2012.
2. Antonella Iacono, "OPAC, Users, Web. Future Developments for Online Library Catalogues," Bollettino AIB 50, no. 1–2 (2010): 69–88, http://bollettino.aib.it/article/view/5296.
3. Geoffrey Little, "Where Are You Going, Where Have You Been? The Evolution of the Academic Library Web Site," The Journal of Academic Librarianship 38, no. 2 (2012): 123–25, doi:10.1016/j.acalib.2012.02.005.
4. Kwan Yi and Lois Mai Chan, "Linking Folksonomy to Library of Congress Subject Headings: An Exploratory Study," Journal of Documentation 65, no. 6 (2009): 872–900, doi:10.1108/00220410910998906.
5. American Library Association, "Number of Libraries in the United States, ALA Library Fact Sheet 1," www.ala.org/tools/libfactsheets/alalibraryfactsheet01.
6. Edward Marman, "A Method for Establishing a Depreciated Monetary Value for Print Collections," Library Administration and Management 9, no. 2 (1995): 94–98.
7. S. R. Ranganathan, The Five Laws of Library Science (New Delhi: Ess Ess, 2006), http://hdl.handle.net/2027/mdp.39015073883822.

Editorial Board Thoughts: Technology and Mission: Reflections of a First-Year College Library Director

Ed Tallent

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2012

Ed Tallent (edtallent@curry.edu) is Director, Levin Library, Curry College, Milton, Massachusetts.

As I reflect on my first year as director of a small college library, several themes are clear to me, but perhaps none resonates as vibrantly as the challenges in managing technology, technology planning, and the never-ending need for technology integration, both within the library and the college. It is all-encompassing, involving every library activity and initiative. While my issues will naturally have a contextual flavor unique to my place of employment, I imagine they reflect issues that all librarians face (or have already faced). What is perhaps less unique is how these issues of library technology intersect with some very high-priority college initiatives and challenges. And, given myriad reports on students' ongoing ambivalent attitudes toward libraries (after everything we have done for them!), it still behooves us to keep working at this integration of the library into the learning and teaching process and to hitch our wagon to larger strategic missions.
So, what issues have I faced?

The campus portal vs. library web site: this issue is neither new nor unique, but it is still a tangled web of conflicting priorities and attitudes, campus politics and technology vision, the extent and location of technology support, and the flexibility of the campus portal or content management system (CMS) and the people who direct it. It is not a question of any misunderstandings, as the need to market the library via the campus web site is obvious and the goal of personalized service is laudatory. Yet, marrying the external marketing needs with the internal support needs is a difficult balance to achieve. The web offers a more dramatic entrée to the library than a portal/intranet, and portal technology is not perfect, as Jakob Nielsen highlights in a recent post (http://www.useit.com/alertbox/intranet-usability.html). The goal obviously is further complicated by the fact that the support needed to maintain a quality web presence, one that is graphically interesting, vibrant, and intuitive, is significant when one considers that library web sites are rarely used as a place to begin research by students and faculty. The portal, on the other hand, promises a personalized approach and easier maintenance, but lacks the level of operability that would be desirable. The web presence can support both user needs and offer visitors a sense of the quality services and collections the library provides. So, at this writing, what we have is a litany of questions not yet resolved.

Mobile, tablets, and virtual services: the questions also abound in these areas. Should we build our own mobile services, or contract out the development? Do we (can we) focus on creating a leadership role for the library in the area of emerging technology, or wait for a coordinated institutional vision and plan to emerge? In the area of tablets, we are about to commence circulating iPads, and anyone who has gone through the labyrinthine process just to load apps will know that the process gives one pause as to the value of such an initiative, and that is before they circulate and need to be managed. Still, it is a technology initiative that demands review of library work flows, security, student training, and collection access. Virtual services were at a fairly nascent state upon my arrival and have grown slowly, as they are being developed in a culture that stressed individual, hands-on, and personalized services. Virtual services can be all that, but that needs to be demonstrated not only to the user but to the people delivering the service. The added value here is that the work engages us in valuable reflections on the way in which we work or should work.

Value of the library: I began my new position at a time when the college was deeply engrossed in the issue of student recruitment, retention, and success. For my employer these are significant institutional identity issues, and the library is expected to document its contributions to student outcomes and success. Not nearly enough has been done, though a working relationship with a new Director of Institutional Research is developing, and critical issues such as information literacy, integrated student support, learning spaces, learning analytics, and the need for a data warehouse will be incorporated into the college's strategic plan.
The opportunity is there for the library to link with major college initiatives, for example, and make information literacy more than a library issue. Citation management: now, here is a traditional library activity, the bane of many a reference service interaction and the undergraduate’s last-minute nightmare. A combination of technical, service and fiscal challenge revolve around the campus climate on the use of technology to respond to this quandary. What to do with faculty who believe strongly that the best way to learn this skill is by hand, not with any system that aims for interoperability and a desire to save the time of the user? For others, which tool should be used? Should we not just go with a free one? While discipline differences will always exist, the current environment does present opportunities for the library to take a leadership role in defining what the possibilities are and ideally connecting the approach to appropriate and measurable learning outcomes and to the larger issue of academic integrity. INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2012 5 E-books, PDA, article tokens: one of the unforeseen benefits of my moving to a small college library is that there is not the attachment to a print collection that exists in many/most research libraries. There is remarkable openness to experimenting with and committing to various methods of digital delivery of content. Thus, we have been able to test myriad possibilities, from patron driven book purchasing, tokens for journal articles, and streaming popular films from a link in the library management system. This blurring of content, delivery, and functionality presents numerous opportunities for librarians to have conversations with departments of the future of collections. Connecting with alumni: this is always an important strategic issue for colleges and universities and it seems as though there are promising emerging options for libraries to deliver database content to alumni, as vendors are beginning to offer more reasonable alumni-oriented packages. My library will be working with the appropriate campus offices next year to develop a plan for funding targeted library content for alumni as part of the college’s broader strategic activities to engage alumni. Web design skills: while I understand the value that products like LibGuides can bring to the community, allowing content experts (librarians) to quickly and easily create template-driven web-based subject guides, I remain troubled by the lack of design skills librarians possess, and by the lack of recognition that good design can be just as important as good content. This is not a criticism, as we are not graphic designers. We have a sense of user needs, knowledge about content, and a desire to deliver, but I believe that products like this lead librarians to believe that good design for learning is easy. I do not claim to be an expert, but I know this is not the case. This approach does not translate into user friendly guides that hold to consistent standards. I think we need to recognize that we can benefit from non-librarian expertise in the area of web design. One opportunity that I want to investigate along these lines is to create student internships that would bring design skills and the student perspective to the work. A win-win, as this also supports the college’s desire for more internships and experiential learning for students. 
There is neither time nor space to address an even broader library technology issue on the near horizon, which will be another campus engagement moment: the future ILS for the library. Yet maybe that should have been addressed first, since, from what I have read and heard, the new ILSs will solve all of the above problems!
3002 ----
Karen J. Starr, President's Message: 21st Century Skills, 21st Century Infrastructure
Karen J. Starr (karen.j.starr@gmail.com) is LITA President 2010-11 and Assistant Administrator for Library and Development Services, Nevada State Library and Archives, Carson City.
Twenty years ago, librarians became involved in the implementation of the Internet for the use of the public across the country. Those initiatives were soon followed by the Bill and Melinda Gates Foundation projects supporting public libraries, which included funding hardware grants to implement public computer labs and connectivity grants to support high-speed Internet connections.
In 2008, the Institute of Museum and Library Services (IMLS) convened a task force to define twenty-first-century skills for museums and libraries, which became an ongoing national initiative (http://www.imls.gov/about/21stCSkills.shtm). The one-year anniversary of the release of the National Broadband Plan was March 16, 2011. As described on Broadband.gov, the plan is intended "to create a high-performance America—a more productive, creative, efficient America in which affordable broadband is available everywhere and everyone has the means and skills to use valuable broadband applications."1 In 1994, the Idaho State Library's Development Division cosponsored eight focus groups in which 179 people participated. The participants were asked several questions, including the types of information they would like to see on the Internet. The results reflected the public's interest at that time in the following:
■■ "expert advice on a variety of topics including medicine, law, car repair, computer technology, animal husbandry, and gardening
■■ economic development, investment, bank rates, consumer product safety, and insurance
■■ community-based information such as events, volunteers, local classified advertisements, special interest groups, housing information, public meetings, transportation schedules, and local employment opportunities
■■ computer training, foreign language programs, homework service, teacher recertification, school activities, school scheduling, and adult education
■■ electronic mail and the ability to transfer files locally as well as worldwide
■■ access to public records, voting records of legislators, absentee voting, the ability to renew a driver's license, the rules and regulations from governmental agencies, and taxes
■■ information about hunting and fishing, environmental quality, the local weather, road advisories, sports, recreation, law enforcement and public safety, and social services available in the community
■■ access to electronic encyclopedias, local libraries' catalogs, full-text articles online, and document delivery."2
At the time we were asking the question, will an information infrastructure be built? The answer? Most assuredly. Indeed, librarians stepped up to the table and ensured that the public had access to information-related services at their local library. The information the public asked for in 1994, as listed above, is widely available today. There are numerous examples in which librarians and libraries have served as leaders in the ongoing sustainability of local, regional, and national information networks. It was pointed out at the time, and remains true today, that in an era of ever-shrinking resources, libraries cannot and should not compete with telecommunications, entertainment, and computer companies. They need to "join them as equals in the information arena."3 LITA has a viable role in the development of the twenty-first-century skills that will firmly put the information infrastructure into place. A LITA member is appointed as a liaison to the Office for Information Technology Policy (OITP) and serves on the LITA Technology and Access Committee, which addresses similar issues. The LITA Transliteracy Interest Group explores, develops, and promotes the role of libraries in all aspects of literacy. Working with the OITP provides LITA membership with the opportunity to participate in current issues, such as digital literacy. The information infrastructure has come a long way in the last twenty-some years. There is still much to be done. Robert Bocher, Technology Consultant with the Wisconsin State Library and OITP Fellow, will present "Building the Future: Addressing Library Broadband Connectivity Issues in the 21st Century" at the LITA President's Program from 4 p.m. to 5:30 p.m. on Sunday, June 26, at the ALA Annual Conference in New Orleans. I look forward to seeing you at the program and to hearing about the successes and the work that remains to be done to address the broadband needs we all face in the country.
References
1. Federal Communications Commission, The National Broadband Plan: Chapter 2: Goals for a High Performance America, http://www.broadband.gov/plan/2-goals-for-a-high-performance-america/ (accessed Apr. 2, 2011).
2. Karen Starr, "The American Public, the Public Library, and the Internet: An Ever-Evolving Partnership," in The Cybrarian's Manual, ed. Pat Ensor (Chicago: ALA, 1997): 23-24.
3. Ibid., 31.
3003 ----
Marc Truitt, Editorial: Singularity—Are We There, Yet?
In my last column, I wrote about two books—Nicholas Carr's The Shallows and William Powers' Hamlet's Blackberry—relating to learning in the always-on, always-connected environment of "screens."1 Since then, two additional works have come to my attention. While I won't be able to do them justice in the space I have here, they deserve careful consideration and open discussion by those of us in the library community. If Carr's and Powers' books are about how we learn in an always-connected world of screens, Sherry Turkle's Alone Together and Elias Aboujaoude's Virtually You are about who we are in the process of becoming in that world.2 Turkle is a psychologist at MIT who studies human–computer interactions. Among her previous works are The Second Self (1984) and Life on the Screen (1995). Aboujaoude is a psychiatrist at the Stanford University School of Medicine, where he serves as director of the Obsessive Compulsive Disorder Clinic and the Impulse Control Disorders Clinic. Based on extensive coverage of specialist and popular literature, as well as numerous anonymized accounts of patients and subjects encountered by the authors, both works are characterized by thorough research and thoughtful analysis.
While their approaches to the topic of "what we are becoming" as a result of screens may differ—Aboujaoude's, for example, focuses on "templates" and the terminology of traditional psychiatry, while Turkle's examines the relationship between loneliness and solitude (they are different), and how these in turn relate to the world of screens—their observations of the everyday manifestations of what might be called the pathology of screens bear many common threads. I'm acutely aware of the potential for injustice (at best) and misrepresentation or misunderstanding (rather worse) that I risk in seeking to distill two very complex studies into such a small space. And, frankly, I'm still trying to wrap my head around both the books and the larger issues they raise. With that caveat, I still think we should be reading about and widely discussing the phenomena reported, which many of us observe on a daily basis. In the sections that follow, I'd like to touch on a very few themes that emerge from these books.
■■ "Why Do People No Longer Suffice?"3
A pair of anecdotes that Turkle recounts to explain her reasons for writing the current book seems worth sharing at the outset. In the first, she describes taking her then-fourteen-year-old daughter, Rebecca, to the Charles Darwin exhibition at New York's American Museum of Natural History in 2005. Among the many artifacts on display was a pair of live giant Galapagos tortoises: "One tortoise was hidden from view; the other rested in its cage, utterly still. Rebecca inspected the visible tortoise thoughtfully for a while and then said matter-of-factly, 'They could have used a robot.'" When Turkle queried other bystanders, many of the children agreed, with one saying, "For what the turtles do, you didn't have to have live ones." In this case, "alive enough" was sufficient for the purpose at hand.4 Sometime later, Turkle read and publicly expressed her reservations about British computer scientist David Levy's book, Love and Sex with Robots, in which Levy predicted that by the middle of this century,
Love with robots will be as normal as love with other humans, while the number of sexual acts and lovemaking positions commonly practiced between humans will be extended, as robots will teach more than is in all of the world's published sex manuals combined.5
Contacted by a reporter from Scientific American about her comments regarding Levy's book, Turkle was stunned when the reporter, equating the possibility of relationships between humans and robots with gay and lesbian relationships, accused her of likewise opposing these human-to-human relationships. If we now have reached a point where gay and lesbian relationships can strike us as comparable to human-to-machine relationships, something very important has changed; for Turkle, it suggested that we are on the threshold of what she terms the "robotic moment":
This does not mean that companionate robots are common among us; it refers to our state of emotional—and I would say philosophical—readiness. I find people willing to seriously consider robots not only as pets but as potential friends, confidants and romantic partners. We don't seem to care what these artificial intelligences "know" or "understand" of the human moments we might "share" with them. At the robotic moment, the performance of connection seems connection enough. We are poised to attach to the inanimate without prejudice.6
marc truitt (marc.truitt@ualberta.ca) is associate university Librarian, Bibliographic and information technology Services, university of alberta Libraries, edmonton, alberta, canada, and editor of ITAL. 56 inFormAtion tecHnoloGY AnD liBrAries | June 2011 While these examples are admittedly extreme, both authors agree that something very basic has changed in the way we conduct ourselves. Turkle characterizes it as mobile technology having made each of us “pausable,” i.e., that a face-to-face interaction being interrupted by an incoming call, text message, or e-mail is no longer extraordinary; rather, in the “new etiquette,” it is “close to the norm.”10 And the rudeness, as well we know, isn’t limited to mobile communications. Referring to “flame wars,” which regularly erupt in online communities, Aboujaoude observes: The Internet makes it easier to suspend ethical codes governing conduct and behavior. Gentleness, com- mon courtesy, and the little niceties that announce us as well-mannered, civilized, and sociable members of the species are quickly stripped away to reveal a com- pletely naked, often unpleasant human being.11 Even our routine e-mail messages—lacking as they often do salutations and closing sign-offs—are character- ized by a form of curtness heretofore unacceptable in paper communications. Remarkably, to those old enough to recall the traditional norms, the brusqueness is not only unintended, it is as well unconscious; “[we] just don’t think warmth and manners are necessary or even advis- able in cyberspace.”12 ■■ Castles in the Air: Avatars, Profiles, and Remaking Ourselves as We Wish We Were Finally, a place to love your body, love your friends, and love your life. —Second Life, “What Is Second Life?”13 One of the interesting and worrisome themes in both Turkle’s and Aboujaoude’s studies is that of the reinven- tion and transformation of the self, in the form of online personas and avatars. This is the stock-in-trade of online communities and gaming sites such as Facebook and Second Life. These sites cater to our nearly universal desire to be someone other than who we are: Online, you’re slim, rich, and buffed up, and you feel you have more opportunities than in the real world. . . . we can reinvent ourselves as comely avatars. We can write the Facebook profile that pleases us. We can edit our messages until they project the self we want to be.14 The problem is that for many there is an increas- ing fuzziness at the interface between real and virtual ■■ Changing Mores, or the Triumph of Rudeness I can’t think of any successful online community where the nice, quiet, reasonable voices defeat the loud, angry ones. . . . The computer somehow nullifies the social contract. —Heather Champ, Yahoo!’s Flickr Community Manager7 Sadly, we’ve all experienced it. We get stuck on a bus, train, or in an elevator with someone engaged in a loud conversation on her or his mobile phone. All too often, the person is loudly carrying on about matters we wish we weren’t there to hear. Perhaps it’s a fight with a partner. Or a discussion of some delicate health matter. Whatever it is, we really don’t want to know, but because of the limitations imposed by physical spaces, we can’t avoid being a party to at least half of the conversation. What’s wrong with these individuals? Do they really have no consideration or sense of propriety? 
It turns out that in matters of tact and good taste, the ground has shifted, and where once we understood and abided by commonly accepted rules of conduct and respect for others, we do so no longer. Indeed, the every- day obnoxious intrusions by those using public spaces for their private conversations are among the least of offend- ers. Consider the following situations shared by Turkle: Sal, 62 years old, holds a small dinner party at his home as part of his “reentry into society” after several years of having cared for his recently deceased wife: I invited a woman, about fifty, who works in Washington. In the middle of a conversation about the Middle East, she takes out her BlackBerry. She wasn’t speaking on it. I wondered if she was check- ing her e-mail. I thought she was being rude, so I asked her what she was doing. She said that she was blogging the conversation. She was blogging the con- versation.8 Turkle later tells of attending a memorial service for a friend. Several [attendees] around me used the [printed] pro- gram’s stiff, protective wings to hide their cell phones as they sent text messages during the service. One of the texting mourners, a woman in her late sixties, came over to chat with me after the service. Matter-of-factly, she offered, “I couldn’t stand to sit that long without getting on my phone.” The point of the service was to take a moment. This woman had been schooled by a technology she’d had for less than a decade to find this close to impossible.9 eDitoriAl: sinGulAritY—Are We tHere, Yet? | truitt 57 enough” became yet more blurred. Turkle’s anecdotes of children explaining the “aliveness” of these robots are both touching and disturbing. Speaking of a Tamagotchi, one child wrote a poem: “My baby died in his sleep. I will forever weep. Then his batteries went dead. Now he lives in my head.”19 The concept of “alive enough” is not unique to the very young, either. By 2009, sociable robots had moved beyond children’s toys with the introduction of Paro, a baby seal-like “creature” aimed at providing companion- ship to the elderly and touted as “the most therapeutic robot in the world. . . . The children were onto something: the elderly are taken with the robots. Most are accepting and there are times when some seem to prefer a robot with simple demands to a person with more complicated ones.”20 Where does it end? Turkle goes on to describe Nursebot, a device aimed at hospitals and long-term care facilities, which colleagues characterized as “a robot even Sherry can love.” But when Turkle injured herself in a fall a few months later, [I was] wheeled from one test to another on a hospi- tal stretcher. My companions in this journey were a changing collection of male orderlies. They knew how much it hurt when they had to lift me off the gurney and onto the radiology table. They were solicitous and funny. . . . The orderly who took me to the discharge station . . . gave me a high five. The Nursebot might have been capable of the logistics, but I was glad that I was there with people. . . . Between human beings, simple things reach you. When it comes to care, there may be no pedestrian jobs.21 But need we librarians care about something as far- fetched as Nursebot? Absolutely. Now that IBM has proven that it can design a machine—okay, an array of machines, but something much more compact is surely coming soon—that can win at Jeopardy!, is the robotic reference librarian really that much of a hurdle? 
Take a bit of Watson technology, stick it in Nursebot, give it sensible shoes, and hey, I can easily imagine Bibliobot, factory-standard in several guises, including perhaps Donna Reed (as Mary, who becomes the town librarian in the alter-life of Capra’s It’s a Wonderful Life) or Shirley Jones (as Marian, the Librarian, in The Music Man). I like Donna Reed as much as anyone, but do I really want reference assistance from her android doppelgänger? But then, for years after the introduction of the ATM, I confess that I continued taking lunch hours off just so that I could deal with a “real person” at the bank, so perhaps it’s just me. The future is in the helping/service professions, indeed! And when we’re all replaced by robots (sociable and otherwise), what will we do to fill the time? personas: “Not surprisingly, people report feeling let down when they move from the virtual to the real world. It is not uncommon to see people fidget with their smart- phones, looking for virtual places where they might once again be more.”15 Turkle speaks of the development of what she terms a “vexed relationship” between the real and the virtual: In games where we expect to play an avatar, we end up being ourselves in the most revealing ways; on social-networking sites such as Facebook, we think we will be presenting ourselves, but our profile ends up as somebody else—often the fantasy of who we want to be. Distinctions blur.16 And indeed, some completely lose sight of what is real and what is not. Aboujaoude relates the story of Alex, whose involvement in an online community became so consuming that he not only created for himself an online persona—“’I then meticulously painted in his hair, streak by streak, and picked “azure blue” for his eye color and “snow white” for his teeth.’”—but also left his “real” girlfriend after similarly remaking the avatar of his online girlfriend, Nadia—“from her waist size to the number of freckles on her cheeks.” Speaking of his former “real” girl- friend, Alex said, “real had become overrated.”17 ■■ “Don’t We Have People for These Jobs?”18 Ageist disclaimer: When I grew up, robots—those that weren’t in science fiction stories or films—were things that were touted as making auto assembly lines more efficient, or putting auto workers out of jobs, depending on your perspective. While not technically a robot, the other machine that characterized “that time” was the Automated Teller Machine (ATM), which freed us from having to do our banking during traditional weekday hours, and not coincidentally resulted, again, in the loss of many entry-level jobs in financial institutions. As I recall, we were all reassured that the future lay in “helping/ service” professions, where the danger of replacement by machines was thought to be minimal. Now, fast forward 30 years. The first half of Turkle’s book is the history of “socia- ble robots” and our interactions with them. Moving from the reactions of MIT students to Joseph Weizenbaum’s ELIZA in the mid-1970s, she recounts her studies of children’s interactions, first with electronic toys—e.g., Tamagotchi—and later, with increasingly sophisticated and “alive” robots, such as Furby, AIBO, and My Real Baby. With each generation, these devices made yet more “demands” on their owners—for care, “feeding”, etc. 
And with each generation, the line between “alive” and “alive 58 inFormAtion tecHnoloGY AnD liBrAries | June 2011 to admit that we’ve seen many examples of how con- nectedness between people we’d otherwise consider “normal” has and is changing our manners and mores.24 Many libraries and other public spaces, reacting to patron complaints about the lack of consideration shown by some users, have had to declare certain areas “cell phone free.” In the interest of getting your attention, I’ve admit- tedly selected some fairly extreme examples from the two books at hand. However, I think the point is that, now that the glitter of always-on, always-connected, has begun to fade a bit, there is a continuum of dysfunctional behaviors that we are beginning to notice, and it’s time to talk about how we as librarians fit into all of this. Are there things we in libraries are doing that encourage some of these less desirable and even unhealthy behaviors? Which takes us to a second concern raised by some of my gentle draft-readers: We’ve heard this tale before. Television, and radio before it, were technologies that, when they were new, were criticized as corrupting and leading us to all sorts of negative, self-destructive, and socially undesirable behaviors. How are screens and the technology of always-connected any different? A part of me—the one that winces every time some- one glibly refers to the “transformational” changes taking place around us—agrees. I was trained as a historian, to take a long view about change. And we’re talking about technologies that—in the case of the web— have been in common use for just over fifteen years. That said, my interest here is in seeing our profession begin a conversation about how connective technolo- gies have influenced behavioral changes in people, and especially about how we in libraries may be unwittingly abetting those behavioral changes. Television and radio were fundamentally different technologies in that they were one-way broadcast tools. And to the best of my recollection, neither has ever been widely adopted by or in libraries. Yes, we’ve circulated videos and sound recordings, and even provided limited facilities for the playback of such media. But neither has ever really had an impact on the traditional core business of librar- ies, which is the encouragement and facilitation of the largely solitary, contemplative act of reading. Connective technologies, in the form of intelligent machines and network-based communities, can be said to be anti- thetical to this core activity. We need to think about that, and to consider carefully the behaviors we may be encouraging. Notwithstanding those critics of change in our profes- sion who feel we move far too glacially, I would maintain that we have often been, if not at the forefront of the tech- nology pack, then certainly among its most enthusiastic ■■ Where From Here? I titled this column “Singularity.” For those not familiar with the literature of science fiction, Turkle provides a useful explanation: This notion has migrated from science fiction to engi- neering. The singularity is the moment—it is mythic; you have to believe in it—when machine intelligence crosses a tipping point. Past this point, say those who believe, artificial intelligence will go beyond anything we can currently conceive. . . . At the singularity, everything will become technically possible, including robots that love. Indeed, at the singularity, we may merge with the robotic and achieve immortality. 
The singularity is technological rapture.22 I think it’s pretty clear that we’re still a fair distance from anything that one might reasonably term a singular- ity. But the concept is surely present, albeit in a somewhat less hubristic degree, when we speak in uncritical awe of “game-changing” or “transformational” technologies. Turkle puts it this way: The triumphalist narrative of the Web is the reassuring story that people want to hear and that technologists want to tell. But the heroic story is not the whole story. In virtual worlds and computer games, people are flattened into personae. On social networks, people are reduced to their profiles. On our mobile devices, we often talk to each other on the move and with little disposable time—so little, in fact, that we communicate in a new language of abbreviation in which letters stand for words and emoticons for feelings. . . . We are increasingly connected to each other but oddly more alone: in intimacy, new solitudes.23 Some of my endlessly patient friends—the ones who provide both you and me with some measure of buffer- ing from the worst of my rants in prepublication drafts of these columns—have asked questions about how all this relates to libraries, for example: How much it is legitimate to generalize to the broader population research findings from cases of obsessive compulsive disorder? The individuals studied are, of course, obsessive and compulsive, in relation to the Internet and new technologies. Do their behaviors not represent an extreme end of the population? A fair question. And yes, the examples I’ve provided in this column are admittedly somewhat extreme. But Turkle and Aboujaoud both point to many examples that are far more common. I think all of us would have eDitoriAl: sinGulAritY—Are We tHere, Yet? | truitt 59 References and Notes 1. Marc Truitt, “Editorial: The Air is Full of People,” Information Technology and Libraries 30 (Mar. 2011): 3–5. http:// www.ala.org/ala/mgrps/divs/lita/ital/302011/3001mar/ editorial_pdf.cfm (accessed Apr. 25, 2011). 2. Sherry Turkle, Alone Together: Why We Expect More from Technology and Less from Each Other (New York: Basic Books, 2011); Elias Aboujaoude, Virtually You : The Dangerous Powers of the E-Personality (New York : Norton, 2011). 3. Turkle, 19. 4. Ibid., 3–4. 5. Quoted in Ibid., 5. 6. Ibid., 9–10. Emphasis added. 7. Quoted in Aboujaoude, 99. 8. Turkle, 162. Emphasis in original. 9. Ibid, 295. 10. Turkle, 161. 11. Aboujaoude, 96 12. Ibid., 98. 13. Quoted in Turkle, 1. 14. Ibid., 12. 15. Ibid. 16. Ibid., 153. 17. Aboujaoude, 77–78. 18. Turkle, 290. 19. Ibid., 34. 20. Ibid., 103–4. 21. Ibid., 120–21. 22. Ibid., 25. 23. Ibid., 18–19. 24. For a recent and typical example, see David Carr, “Keep Your Thumbs Still When I’m Talking to You,” New York Times, Apr. 15, 2011, http://www.nytimes.com/2011/04/17/ fashion/17TEXT.html (accessed May 2, 2011). 25. Aboujaoude, 283. adopters. In our quest to remain “relevant” to our uni- versity or school administrations, governing boards, and (in theory, at least) our patrons, we have embraced with remarkably little reservation just about every technology trend that’s come along in the past few decades. At the same time, we’ve been remarkably uncritical and unre- flective about our role in, and the larger implications of, what we might be doing by adopting these technologies. 
Aboujaoude, in a surprising, but I think largely correct, summary comment observes:
Extremely little is available, however, for the individual interested in learning more about how virtual technology has reshaped our inner universe and may be remapping our brains. As centers of learning, public libraries, schools, and universities may be disproportionately responsible for this deficiency. They outdo one another in digitalizing their holdings and speeding up their Internet connections, and rightfully see those upgrades as essential to compete for students, scholars, and patrons. In exchange, however, and with few exceptions, they teach little about the unintended, less obvious, and more personal consequences of the World Wide Web. The irony is, at least in some libraries' case, that their very survival seems threatened by a shift that they do not seem fully engaged in trying to understand, much less educate their audiences about.25
I could hardly agree more. So, how do we answer Aboujaoude's critique?
3004 ----
Because this is a family program and because we are all polite people, I can't really use the term I want to here. Let's just say that I am an operating system [insert term here for someone who is highly promiscuous]. I simply love to install and play around with various operating systems, primarily free operating systems (OSes), primarily Linux distributions. And the more exotic, the better, even though I always dutifully return home at the end of the evening to my beautiful and beloved Ubuntu. In the past year or two I can recall installing (and in some cases actually using) the following: Gentoo, Mint, Fedora, Debian, moonOS, Knoppix, Damn Small Linux, EasyPeasy, Ubuntu Netbook Remix, Xubuntu, openSuse, NetBSD, Sabayon, SimplyMEPIS, CentOS, GeeXboX, and ReactOS. (Aside from stock Ubuntu and all things Canonical, the one I keep a constant eye on is moonOS [http://www.moonos.org/], a stunningly beautiful and eminently usable Ubuntu-based remix by a young artist and programmer in Cambodia, Chanrithy Thim.) In the old days I would have rustled up an old, sloughed-off PC to use as an experimental "server" upon which I would unleash each of these OSes, one at a time. But those were the old days, and these are the new days. My boss kindly bought me a big honkin' Windows-based workstation about a year and a half ago, a box with plenty of processing power and memory (can you even buy a new workstation these days that's not incredibly powerful, and incredibly inexpensive?), so my need for hardware above and beyond what I use in my daily life is mitigated. Specifically, it's mitigated through use of virtual machines. I have long used VirtualBox (http://www.virtualbox.org/) to create virtual machines (VMs), lopped-off hunks of RAM and disk space to be used for the installation of a completely different OS. With VirtualBox, you first describe the specifications of the VM you'd like to create: how much of the host's RAM to provide, how large a virtual hard disk, boot order, access to host CD drives, USB devices, etc. You click a button to create it, then you install an OS onto it, the "guest" OS, in the usual way. (Well, not exactly the usual way; it's actually easier to install an OS here because you can boot directly from a CD image, or ISO file, negating the need to mess with anything so distasteful, old-fashioned, and outré as an actual, physical CD-ROM.)
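The point-and-click workflow just described can also be scripted. Purely as a minimal sketch (the column itself uses the VirtualBox GUI), the following Python snippet drives VirtualBox's standard VBoxManage command-line tool to define a NAT-networked VM and boot it from an ISO image; the VM name, ISO path, and memory and disk sizes are illustrative assumptions rather than values from the article, and a reasonably recent VirtualBox release is assumed.

```python
import subprocess

VM = "scratch-linux"      # hypothetical VM name
ISO = "/isos/ubuntu.iso"  # hypothetical install image

def vbox(*args):
    """Run a VBoxManage subcommand and fail loudly if it errors."""
    subprocess.run(["VBoxManage", *args], check=True)

# Register an empty 64-bit Linux VM.
vbox("createvm", "--name", VM, "--ostype", "Ubuntu_64", "--register")

# Carve RAM and CPUs off the host, NAT the virtual NIC behind the host's
# connection, and boot from the virtual DVD first.
vbox("modifyvm", VM, "--memory", "2048", "--cpus", "2", "--nic1", "nat",
     "--boot1", "dvd", "--boot2", "disk")

# Create a 20 GB virtual disk (older VirtualBox releases call this createhd)
# and attach it, along with the ISO, to a SATA controller.
vbox("createmedium", "disk", "--filename", f"{VM}.vdi", "--size", "20480")
vbox("storagectl", VM, "--name", "SATA", "--add", "sata")
vbox("storageattach", VM, "--storagectl", "SATA", "--port", "0", "--device", "0",
     "--type", "hdd", "--medium", f"{VM}.vdi")
vbox("storageattach", VM, "--storagectl", "SATA", "--port", "1", "--device", "0",
     "--type", "dvddrive", "--medium", ISO)

# Start the VM; as far as the guest installer is concerned, it is bare metal.
vbox("startvm", VM, "--type", "gui")
```

Tearing the experiment down afterward is a one-liner as well (VBoxManage unregistervm scratch-linux --delete), which is essentially the "blow it away and begin anew" workflow described below.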
In my experience, you can create a new VM in mere seconds; then it’s all a mat- ter of how difficult the OS is to install, and the Linux distributions are becoming easier and easier to install as the months plow on. At any rate, as far as your new OS is concerned, it is being installed on bare metal. Virtual? Real? For most intents and purposes the guest OS knows no difference. In the titillatingly dangerous and virus-ridden cyber- world in which we live, I’ll not mention the prophylactic uses of VMs because, again, this is a family program and we’re all polite people. Suffice it to say, the typical network connection of a VM is NATed behind the NIC of the host machine, so at least as far as active network– based attacks are concerned, your guest VM is at least as secure as its host, even more so because it sits in its own private network space. Avoiding software-based viruses and trojans inside your VM? Let’s just say that the wisdom passed down the cybergenerations still holds: When it rains, you wear a raincoat—if you see what I’m saying. Aside from enabling, even promoting my shameless OS promiscuity, how are VMs useful in an actual work setting? For one, as a longtime Windows guy, if I need to install and test something that is *NIX-only, I don’t need a separate box with which to do so. (And vice versa too for all you Unix-weaned ladies and gentlemen who find the need to test something on a Rocker from Redmond.) If there is a software dependency on a particular OS, a particular version of a particular OS, or even if the configuration of what I’m trying to test is so peculiar I just don’t want to attempt to mix it in with an existing, stable VM, I can easily and painlessly whip up a new instance of the required OS and let it fly. And deleting all this when I’m done is easily accomplished within the VirtualBox GUI. Using a virtual machine facilitates the easy explora- tion of new operating systems and new applications, and moving toward using virtual machines is similar to when I first started using a digital camera. You are free to click click click with no further expense accrued. You don’t like what you’ve done? Blow it away and begin anew. All this VM business has spread, at my home institu- tion, from workstation to data center. I now run both a development and test server on VMs physically sitting on a massive production server in our data center—the kind of machine that when switched on causes a brown-out in the tri-state area. This is a very efficient way to do things though because when I needed access to my own server, our system administrator merely whipped up a VM for me to use. To me, real or virtual, it was all the same; to the system administrator, it greatly simplified operations. And I may joke about the loud clank of the host server’s power switch and subsequent dimming of the lights, but doing things this way has been shown to be more energy efficient than running a server farm in which each server Editorial Board Thoughts: Just Like Being There, or How I Learned To Stop Coveting Bare Metal and Learned to Love My VM mark cyzyk (mcyzyk@jhu.edu) is the Scholarly communication architect in the Sheridan Libraries, Johns hopkins university, Baltimore, Maryland. Mark Cyzyk eDitoriAl BoArD tHouGHts | cYzYK 61 Virtual machines: Zero-cost playgrounds for the pro- miscuous, and energy efficient, staff saving tools for system operations. What’s not to like? Throw dual monitors into the mix (one for the host OS; one for the guest), and it’s just like being there. 
sucks in enough juice to quench the thirst of its redundant power supplies. (They’re redundant, they repeat them- selves; they’re redundant, they repeat themselves—so you don’t want too many of them around slurping up the wattage, slurping up the wattage . . . ) 3005 ---- 62 inFormAtion tecHnoloGY AnD liBrAries | June 2011 Jason Vaughan and Kristen Costello Management and Support of Shared Integrated Library Systems the second major hardware migration occurred, and an initial memorandum of understanding (MOU) was drafted by the UNLV Libraries. This MOU is still used by the libraries. The MOU was discussed with all partners and ultimately signed by the director of each library. Since the MOU was signed nearly a decade ago, the system has continued to grow by all measures—size of the database, number of users, number of software modules compris- ing the complete system, and the financial and staff commitment toward support and maintenance. Despite the emergence of a large number of other network-based technologies critical to library operations and services, the ILS remains a critical system that supports many library operations. The research described in this paper developed in part because there is a dearth of published survey-based research of shared ILS management and financial support. This article interweaves local existing practices with research findings. For brevity’s sake, the system shared by the UNLV University Libraries and four additional partners will be referred to as UNLV’s system. To provide a relative sense of the footprint of each partner on the system, various measures can be used (see figure 1). ■■ Survey Method In April 2010, the authors administered a 20-question survey to the Innovative User’s Group (IUG) via the group’s listserv. The survey focused on libraries that are part of a consortial or otherwise shared Innovative ILS. The Innovative User’s Group is the primary user’s group associated with the Innovative ILS and suite of products. The IUG hosts a busy listserv, coordinates the annual North American conference devoted solely to the Innovative system, and provides Innovative cus- tomer-driven enhancement requests. To prevent multiple individuals from the same consortium responding to the survey, instructions indicated that only one individual from the main institution hosting the system should offi- cially respond. Given the anonymity of the survey and the desire to provide confidentiality, there is the possibility that some survey responses refer to the same system. The survey consisted primarily of multiple choice, “select all that apply,” and free-text response questions. The survey was divided into four broad topical areas: (1) background information; (2) funding; (3) support; and (4) training, professional development, and planning. The survey was open for a period of three weeks. Because respondents could choose to skip questions, the number of responses received per question varied. On average, 43 individual responses were received for each question. Innovative currently has more than 1,200 Millennium ILS installa- tions.2 Not all of those installations support multiple, administratively separate library entities. It is unknown The University of Nevada, Las Vegas (UNLV) University Libraries has hosted and managed a shared integrated library system (ILS) since 1989. The system and the number of partner libraries sharing the system has grown significantly over the past two decades. 
Spurred by the level of involvement and support contributed by the host institution, the authors administered a compre- hensive survey to current Innovative Interfaces libraries. Research findings are combined with a description of UNLV’s local practices to provide substantial insights into shared funding, support, and management activities associated with shared systems. S ince 1989, the University of Nevada, Las Vegas University Libraries has hosted and managed a shared integrated library system (ILS). Currently, partners include the University of Nevada, Las Vegas University Libraries (consisting of one main and three branch libraries, and hereafter referred to as UNLV Libraries); the administratively separate UNLV Law Library; the College of Southern Nevada (a commu- nity college system consisting of three branch libraries); Nevada State College; and the Desert Research Institute. The original ILS installation included just the UNLV Libraries and the Clark County Community College (now known as the College of Southern Nevada). The Desert Research Institute joined in the early 1990s, the UNLV Law Library joined with the establishment of the William J. Boyd School of Law in 1998, and, finally, Nevada State College joined upon its creation in 2002. Over time, the technological underpinnings of the ILS have changed tremendously and have migrated firmly into a web- based environment unknown in 1989. The system was migrated to Innovative Interfaces’ current java-based platform, Millennium, beginning in 1999. Since the origi- nal installation, there have been three major full hardware migrations, in 1997, 2002, and 2009. Over time, regular Innovative software updates, as well as additional pur- chased software modules, have greatly extended both the staff and end user functionality of the ILS. In early 2001, UNLV and its partners conducted a mar- ketplace assessment of ILS vendors catering to academic customers.1 The assessment reaffirmed the consortia’s commitment to Innovative Interfaces. Shortly thereafter, Jason Vaughan (jason.vaughan@unlv.edu) is director, Library technologies, university of nevada Las Vegas. Kristen costello (kristen.costello@unlv.edu) is Systems Librarian, university of nevada Las Vegas. mAnAGement AnD support oF sHAreD inteGrAteD liBrArY sYstems | VAuGHAn AnD costello 63 partners originally purchased the system together; 20 (38.5 percent) indicated they purchased the system with some of their current existing partners, while 9 (17.3 per- cent) indicated they as the main institution originally and solely purchased the system. Several of the entities shar- ing the UNLV Libraries’ system did not even exist when the ILS was originally purchased; only two of the current partners shared the original purchase cost of the system. Another background question sought to understand how partners potentially individualize the system despite being on a shared platform. Innovative, and likely other similar ILS vendors, offers several products to help libraries better manage and control their holdings and acquisitions. Of potential benefit to staff operations and workflow, Innovative offers the option to have multiple acquisitions and/or serials control units, which provide separate fund files and ranges of order records for differ- ent institutions sharing the ILS system. Of 51 responses received, 44 respondents (86.3 percent) indicated they had multiple acquisitions and serials units and 7 (13.7 percent) do not. 
Innovative offers two web-based discov- ery interfaces for patrons: the traditional online public access catalog, known as WebPAC, and their version of a next-generation discovery layer, known as Encore. Of potential benefit to staff as well as patrons, Innovative offers “scoping” modules that help patrons using one of the web-based discovery interfaces, as well as staff using the Millennium staff modules. The scoping module allows holdings segmentation by location or material type. Scopes allow libraries to define their collections and offer their patrons the option to search just the collection of their applicable library. Forty-six (88.5 percent) of the 52 respondents indi- cated they use scoping and 6 (11.5 percent) do not. UNLV how many shared Innovative library systems exist. While a true response rate cannot be determined, such a mea- sure is not critical for this research. The survey questions with summarized results are provided in appendix A. ■■ Survey Background UNLV’s system, with only five unique library entities, is a “small” system when compared with survey responses. Survey respondents indicated a range from 2 to 80 unique members sharing their system. Of the 48 responses received for this background question, 26 (54 percent) indicated 10 or fewer partners on the system. Seven (14.6 percent) indicated 40 or more partners. The average number of partners sharing an ILS implementation was 18 and the median was 8.5. There can be varying levels of partnership within a shared ILS system. UNLV’s instance is a rather informal partnership. Some survey respondents indicated the existence of a far more structured or dedicated support group not directly associated with any particular library. One respondent noted they have a central office comprised of an executive director and two additional staff, respon- sible for ILS administration; this central office reports to a board of directors, comprised of library directors for each member library. Another indicated they have a central office responsible not only for the ILS, but for other things such as wide and local area networks and workstation support. One respondent indicated that they are actually a consortium of consortia, with 9 hosts each comprised of anywhere from 4 to 11 libraries. Twenty-three respondents out of 52 (44.2 percent) indicated that they and all of their current existing Full-Time Library Staff Bibliographic Records Item Records Order Records Patron Records Staff Login Licenses UNLV Libraries 105 (70.9%) 1,494,890 (78.2%) 1,906,225 (81.1%) 74,223 (58.4%) 40,788 (59.6%) 85 (69.1%) UNLV Law Library 13 (8.8%) 246,678 (12.9%) 243,788 (10.4%) 29,921 (23.5%) 2,034 (3%) 13 (10.6%) College of Southern Nevada 27 (18.2%) 146,118 (7.6%) 175,862 (7.5%) 22,142 (17.4%) 23,876 (34.9%) 20 (16.3%) Nevada State College 1 (.7%) 17,787 (.9%) 17,979 (.8%) 841 (.7%) 1,718 (2.5%) 3 (2.4%) Desert Research Institute 2 (1.4%) 5,396 (.3%) 5,361 (.2%) 0 (0%) 24 (<.1%) 2 (1.6%) Figure 1. Various Measures of ILS Footprints for UNLV’s Shared ILS (percentage of overall system) Note: “Staff login licenses” refers to the number of simultaneous staff users each institution can have on the system at any given time. 64 inFormAtion tecHnoloGY AnD liBrAries | June 2011 share of funding toward annual maintenance based on their number of staff licenses, as shown in figure 1. ■■ Funding Support from Partners MOUs appear to include funding and budgeting informa- tion more than any other discrete topic. 
Direct support costs can include the maintenance support costs paid to one or more vendors, costs for additional vendor authored software modules purchased in addition to the base software, and, perhaps, licensing costs associated with a database or operating system used by the ILS (e.g., an Oracle license for Oracle based ILS systems). There are many parameters by which costs could be determined for partners, and, given the dearth of published research on the topic, a chief focus of this research sought more infor- mation on what factors were used by other consortia. The authors brainstormed 10 elements that could potentially figure into the overall cost sharing method. Thirty-eight respondents provided information on factors playing a role in their cost sharing arrangements, illustrated in fig- ure 2. Respondents could mark more than one answer for this question, as more than one factor could be involved. The top two factors relate directly to vendor costs— whether annual support costs or acquisition of new vendor software. Hardware placed third in overall fre- quency; for Innovative and likely for other ILS systems, ILS hardware can be purchased from the vendor or an approved platform can be sourced from a reseller directly. Support costs from third parties and the number of staff login ports were each identified as a factor by more than a third of all respondents. ■■ Software Purchases Depending on the software, additional modules extend- ing the system capabilities can benefit a single partner, or, in UNLV’s experience, all partners on the system. Traditionally, the UNLV Libraries have had the largest operating budget of the group, and a majority of new soft- ware requests have come internally from UNLV Libraries staff. Over the past 20 years, the UNLV Libraries have fully funded the initial purchase costs of a majority of the software extending the system, regardless of whether it benefits just the UNLV Libraries or all system part- ners. There are numerous exceptions where the partner libraries have contributed funding, including significant start-up costs associated with the UNLV Law Library join- ing the system in 1998 and the addition of Nevada State College in 2002. In both instances, those bodies funded required and recommended software directly applicable has multiple serials and acquisitions units as well as mul- tiple scopes configured to help segment the records for each entities’ particular collection. Innovative offers various levels of maintenance support. UNLV’s level of support includes the vendor supplying services such as application troubleshooting resolution, software updates, and some degree of oper- ating system and hardware configuration and advice. UNLV also contracts with the hardware vendor for hardware maintenance and underlying operating system support. The UNLV Libraries have had the opportunity to hire fully qualified and capable technical staff to provide a high level of support for the ILS. UNLV’s level of vendor support has evolved from an original full turnkey instal- lation with Innovative providing all support to a present level of more modest support. Nearly half of all survey respondents, 25 of 52 (48.1 percent) indicated they had a turnkey arrangement with Innovative; the remaining 27 respondents had a lesser level of support. Maintenance and support obviously carry a cost with one or more third party providers. 
The majority of the respondents, 40 of 51 (78.4 percent), indicated there is a cost-sharing structure in place where maintenance support costs related to the ILS are spread across partner libraries. Six respondents (11.8 percent) indicated the main institution fully funds the maintenance support costs. The UNLV Libraries drafted the first and current MOU in 2002 for all five entities sharing the ILS system. Thirty-five of 51 survey respondents (68.6 percent) indi- cated they, too, have a MOU in place. UNLV’s MOU is a basic document, two pages in length, split into the follow- ing sections: background; acquisition of new or additional hardware; acquisition of new or additional software; annual maintenance associated with the primary vendor and third party suppliers and, importantly, the associated cost allocation method for how annual support costs are split between the partners; how new products are pur- chased from the vendor; and management and support responsibilities of the hosting institution. Many of the survey respondents provided details on items contained in their own MOUs, which can be clustered into several broad categories. These include budgeting, payments, funding formulas; general governance and voting mat- ters; support (e.g., contractual service responsibilities, responsibilities of member libraries); equipment (e.g., title and use of equipment, who maintains equipment); and miscellaneous. This latter category includes items such as expectations for record quality; network requirements/ restrictions; fine collection; and holds management. The majority of UNLV’s MOU addresses shared costs for annual maintenance. UNLV’s cost-sharing structure is simple. The system has a particular number of associated staff (simultaneous login) licenses, which have gradually increased as the libraries have grown. Logins are sepa- rated by institution, and each member is assessed their mAnAGement AnD support oF sHAreD inteGrAteD liBrArY sYstems | VAuGHAn AnD costello 65 annual maintenance bill and all partners help maintain new software acquisitions by contributing toward the annual maintenance. Regarding new software acquisitions, cost-sharing practices varied between 44 respondents providing infor- mation in the survey. Eight (18.2 percent) indicated there is consultation with other partners and there is some arrangement to share costs between the majority or all partners sharing the system. Two respondents (4.5 percent) indicated the institution expressing the initial interest in the product fully funds the purchase. Nineteen respondents (43.2 percent) indicated that they have had instances of both these scenarios (shared funding and sole funding). Two respondents (4.5 percent) indicated they could not recall ever adding any additional soft- ware. Thirteen respondents (29.5 percent) offered details to their operation such as additional serials and account- ing units (for the Law Library), check-in and order records, and staff licenses. In addition, when the system was migrated from the aging text-based system (Innopac) to the current Millennium java-based GUI system in 1999, the current partners contributed toward the upgrade cost based on number of staff licenses. Partner institu- tions have continued to fund items of sole benefit to their operation, such as adding staff licenses or required network port interfaces associated with patron self-check stations installed at their facilities. 
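The license-based assessment used at UNLV amounts to a straightforward proportional split. As an illustration only, the Python sketch below allocates an annual maintenance invoice across the five partners in proportion to the staff login license counts reported in figure 1; the invoice amount is hypothetical, and the rounding rule is an assumption rather than anything specified in the MOU.

```python
# Staff (simultaneous) login licenses per partner, from figure 1.
LICENSES = {
    "UNLV Libraries": 85,
    "UNLV Law Library": 13,
    "College of Southern Nevada": 20,
    "Nevada State College": 3,
    "Desert Research Institute": 2,
}

def allocate(invoice_total, licenses=LICENSES):
    """Split an annual maintenance invoice in proportion to license counts.

    Rounding to cents can leave a small remainder in the general case; how
    that is handled would be up to the hosting institution.
    """
    total_licenses = sum(licenses.values())
    return {
        partner: round(invoice_total * count / total_licenses, 2)
        for partner, count in licenses.items()
    }

if __name__ == "__main__":
    # Hypothetical invoice figure, for illustration only.
    for partner, share in allocate(100_000).items():
        print(f"{partner}: ${share:,.2f}")
```

The same proportional arithmetic could just as easily be driven by bibliographic or item record counts, patron records, or institutional FTE, which is exactly the variation in allocation factors that the survey respondents report.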
During the 2000s, the UNLV Libraries have fully funded a majority of software of potential benefit to all partners, such as the electronic resource management module, the Encore next gen- eration discovery platform, and various OPAC/Encore enhancements. Software additions typically increase the Figure 2. Cost-Sharing Formula Factors T h e a m o u n t o f th e o ve ra ll ye a rl y In n o va ti ve In te rf a c e s m a in te n a n c e /s u p p o rt i n vo ic e T h e a m o u n t o f a n y a d d it io n a l 3 rd p a rt y m a in te n a n c e / su p p o rt a g re e m e n ts a ss o c ia te d w it h t h e I n n o va ti ve sy st e m ( su c h a s c o n tr a c ts w it h t h e h a rd w a re m a n u fa c tu re r— H P, S u n M ic ro sy st e m s [O ra c le ], e tc .) T h e p u rc h a se c o st (s ) fo r n e w ly a c q u ir e d I n n o va ti ve m o d u le s/ p ro d u c ts T h e p u rc h a se c o st (s ) fo r n e w ly a c q u ir e d h a rd w a re a ss o c ia te d w it h t h e I n n o va ti ve s ys te m ( su c h a s a se rv e r, a d d it io n a l d is k s p a c e , b a c k u p e q u ip m e n t, e tc .) T h e n u m b e r o f in c id e n t re p o rt s (o r ti m e s p e n t) , b y p e rs o n n e l a t th e m a in i n st it u ti o n r e la te d t o r e se a rc h , tr o u b le sh o o ti n g , e tc . su p p o rt i ss u e s re p o rt e d b y p a rt n e r in st it u ti o n s T h e “ si ze ” o f th e p a rt n e r in st it u ti o n ’s p o rt io n o f th e In n o va ti ve s ys te m , a s m e a su re d b y in st it u ti o n F T E T h e “ si ze ” o f th e p a rt n e r in st it u ti o n ’s p o rt io n o f th e In n o va ti ve s ys te m , a s m e a su re d b y n u m b e r o f b ib o r it e m r e c o rd s th e p a rt n e r’ s in st it u ti o n h a s in t h e In n o va ti ve d a ta b a se T h e “ si ze ” o f th e p a rt n e r in st it u ti o n ’s p o rt io n o f th e In n o va ti ve s ys te m , a s m e a su re d b y n u m b e r o f st a ff lo g in p o rt s d e d ic a te d t o t h e p a rt n e r lib ra ry T h e “ si ze ” o f th e p a rt n e r in st it u ti o n ’s p o rt io n o f th e In n o va ti ve s ys te m , a s m e a su re d b y n u m b e r o f u se r se a rc h e s c o n d u c te d f ro m I P r a n g e s a ss o c ia te d w it h th e p a rt n e r in st it u ti o n T h e “ si ze ” o f th e p a rt n e r in st it u ti o n ’s p o rt io n o f th e In n o va ti ve s ys te m , a s m e a su re d b y th e n u m b e r o f p a tr o n r e c o rd s w h o se h o m e l ib ra ry i s a ss o c ia te d w it h t h e p a rt n e r in st it u ti o n 66 inFormAtion tecHnoloGY AnD liBrAries | June 2011 applied, the number of staff users has increased sig- nificantly, and the system was migrated to an underlying Oracle database in 2004. Since the original system was purchased in 1989 and fully installed in 1990, the central, locally hosted server has been replaced three times, in 1997, 2002, and 2009. Partners contributed toward the costs of the server upgrades in 1997 and 2002, while the UNLV Libraries fully funded the 2009 upgrade. Software and hardware components comprising the backup system have been significantly enhanced with a modern system capable of the speed, capacity, and features needed to per- form appropriately in the short backup window available each night. UNLV funded the initial backup software and hardware, and the partner institutions contribute toward the annual maintenance associated with the backup equipment and software. 
One survey question focused on major central infra- structure supporting the ILS (defined as items exceeding $1,000 and with several examples listed). The question did not focus on hardware that could be provided by ILS vendors benefiting a single partner, such as self-check sta- tions or inventory devices. Fourteen (31.8 percent) of the 44 respondents indicated that if major new hardware was needed, there was consultation with other partners, and, if purchased, a cost-sharing agreement was arranged. Two respondents (4.5 percent) indicated the institution expressing the initial interest fully funds the purchase and seven respondents indicated they’ve had instances in the past of both these scenarios. Three respondents (6.8 percent) indicated their shared system hardware had never been replaced or upgraded to their knowledge. Nineteen respondents provided information on alternate scenarios or otherwise more details as to local practice. Several indicated a separate fund is maintained solely for large ILS system-related improvements or ILS related purchases. Revenue for these funds can be built up over time through maintenance and use payments by partner libraries or by a small additional fee earmarked for future hardware replacement needs collected each year. One respondent indicated they have been able to get grant funds to cover major purchases. With few exceptions, the majority of free text responses indicated that costs for major purchases were shared by partners or otherwise funded by the central consortium or cooperative agency. As with regular annual maintenance and new soft- ware purchases, various elements can determine what portion of hardware replacement costs are borne by partner libraries. This includes number of staff licenses (21.9 percent of responses), institutional FTE count (15.6 percent), number of bibliographic or item records (15.6 percent), and number of patron records (9.4 percent). Twenty respondents provided additional information. Several indicated that the costs are split evenly across all partners. Several indicated that population served was a factor. Others reiterated that costs for central hardware on other scenarios. Several indicated that if a product is directly applicable to only one library, such as self-check interfaces and additional acquisition units, then the library in need fully funds the purchase, which mirrors the local practice at UNLV. Several respondents indicated that if a product benefits all libraries, then costs are shared equally. One respondent indicated that the partner librar- ies discuss the potential item, and collectively they may choose not to purchase, even if one or more partners are very interested. In such cases, those partners have the option to purchase the product and must agree to make it available to all partners. Several respondents indicated that, as the largest entity using the shared system, they generally always purchased new software for their opera- tion as needed, with the associated benefit that the other partners of the system were allowed to use the software as well. Three respondents reiterated that a central office funds add-on modules, in one case from funding set aside each year for system improvements. A fourth respondent indicated that a “joiners fee” fund, built up from new members joining the system, allows for the purchase of new software. Clearly there are many scenarios of how new software is funded. 
Generally, regardless of funding source, sole or shared, if a product can benefit all partners, it's allowed to do so. Thirty-six survey respondents provided details on what factors determine how much each partner contributes toward new software purchases. Seven respondents (19.4 percent) indicated the number of staff licenses plays a role (as in the UNLV model). Three respondents (8.3 percent) indicated that institution FTE played a role, while three other respondents indicated that the number of partner bibliographic/item records played a role. The majority of respondents, 25 (69.4 percent), provided alternate scenarios or otherwise more information. Nine of these 25 respondents indicated costs were split evenly across all partners. Several indicated that the formula used for determining maintenance costs was also applied to new software purchases. Four respondents indicated that the library service population was a factor. Two indicated that circulation counts were a factor. One indicated that it's negotiated on a per-purchase basis, based on varying factors.

■■ Hardware Purchases

Hardware needs related to the underlying infrastructure, such as server(s), disk space, and backup equipment, increase as the ILS grows. UNLV's ILS installation has grown tremendously. New software modules have been purchased, application architecture changes occurred with the release of the Millennium suite in the late 1990s, regular annual updates to the system software have been applied, the number of staff users has increased significantly, and the system was migrated to an underlying Oracle database in 2004. Since the original system was purchased in 1989 and fully installed in 1990, the central, locally hosted server has been replaced three times, in 1997, 2002, and 2009. Partners contributed toward the costs of the server upgrades in 1997 and 2002, while the UNLV Libraries fully funded the 2009 upgrade. Software and hardware components comprising the backup system have been significantly enhanced with a modern system capable of the speed, capacity, and features needed to perform appropriately in the short backup window available each night. UNLV funded the initial backup software and hardware, and the partner institutions contribute toward the annual maintenance associated with the backup equipment and software.

Each Module Coordinator served as the contact person charged with maintaining familiarity with the functions and features of a particular module, testing enhancements within new releases, keeping other staff informed of changes, and alerting the system vendor of any problems with the module. Annually, Module Coordinators were to consider new software and prioritize and recommend ILS software the library should consider purchasing. Module Coordinators were tasked to maintain a system-wide view of the ILS and alert others if they discovered problems or made changes to the ILS that could affect other areas of the system. In addition, Module Coordinators were encouraged to subscribe to the IUG listserv to monitor discussions and to maintain awareness of overall system issues. All staff had access to the system's user manual, but if they had questions on system features or functions, the Module Coordinator served as an additional resource. In addition, any bug reports were provided to the most appropriate Module Coordinator, who would contact Innovative. The UNLV Systems staff, which has grown over time and is now part of the Library Technologies Division, was responsible for all hardware and networking problems, and for scheduling and verifying nightly data backups. The Systems Department coordinated any new software installations with the Module Coordinators Group, library staff, and library partners. In 2006, the UNLV Libraries reorganized and hired a dedicated Systems Librarian focused on the ILS. The Systems Librarian's principal job responsibility is to serve as the central administrator and site coordinator of the UNLV Libraries' shared ILS. Responsibilities include communicating with colleagues regarding current system capabilities, monitoring vendor software developments, monitoring how other libraries utilize their Innovative systems, and recommending enhancements.
The Systems Librarian is the site contact with Innovative and coordinates and monitors support calls, software and patch upgrades, and new software module installations. The position serves as the contact person for the shared institutions whenever they have questions or issues with the ILS. The Systems Librarian has taken over much of the work previously coordinated through the Module Coordinators Group. While the formal Module Coordinators Group no longer exists, module experts still provide assistance as needed, and consultation always occurs with partners on system-wide issues as they arise. UNLV is not unique in how it manages its ILS. In the survey results, 36 respondents (87.8 percent) indicated there is a dedicated individual at the main institution who has a primary responsibility of overseeing the ILS. To help clarify the responses, "primary responsibility" is defined as individuals spending more than half their time devoted to support, research, troubleshooting, and system administration duties related to the ILS.

Costs for central hardware replacements are determined by the same formula used for assessing the share of annual maintenance.

■■ Additional Purchases

The last funding-related survey question asked if ongoing content enrichment services were subscribed to, and if so, to describe how the cost-share amount is determined for partner libraries. Content enrichment services can provide additional evaluative content such as book cover images, tables of contents (TOC), and book reviews. UNLV subscribes to a TOC service as well as an additional service providing book covers, reviews, and excerpts. Partner institutions contribute to the annual service charge associated with the TOC service and pay for each record enhanced at their library. UNLV fully funds the book cover/review/excerpt service that benefits all partners. Fourteen of the 43 survey respondents (32.6 percent) indicated they did not subscribe to enrichment services. Twelve respondents (27.9 percent) indicated they had one or more enrichment services and that the costs were fully funded by the main institution. Seventeen respondents (39.5 percent) subscribe to enrichment services and share the costs. Several indicated the existing cost-sharing formula used for other assessments (annual maintenance, hardware, or nonsubscription-based software) is also used for the ongoing enrichment services. One respondent indicated they maintain a collective fund for enrichment services and estimate the cost of all shared subscriptions; this figure is integrated into the share each institution contributes to the central fund annually. One respondent indicated that their system only uses free enrichment services.

■■ Support

The next section of the survey addressed staff support efforts related to management of the ILS. Twenty years ago when UNLV installed its ILS, staff support included one librarian and one additional staff member; both focused on various aspects of system support, from maintaining hardware to working with the vendor, in addition to having other primary job responsibilities completely unrelated to the ILS. Over time, functional experts developed for particular modules of the system, such as cataloging, acquisitions, circulation, and serials control. This group of functional experts eventually became known as the UNLV Innovative Module Coordinators Group, which was chaired by the head of the Library Systems Department.
This group met quarterly and included experts from UNLV as well as one representative from each partner institution.

The authors created a list of 20 duties related to ILS system administration and asked respondents to indicate whether the main library or a central consortial or cooperative office dedicated to the ILS handles a particular duty; the duty is shared between the main library and partner libraries; or the duty is handled by just a partner library. As illustrated in figure 3, the survey results overwhelmingly show that the main library in a shared system provides the majority of system administration support. Only two tasks were broadly shared between the main library and partner libraries: maintenance of the institution's records (bibliographic, item, patron, order, etc.) and maintaining network and label printers. Other shared tasks included changes to the circulation parameters tables (e.g., configuring loan rules and specifying open hours and days-closed tables for materials they themselves circulate), with 40.5 percent of the respondents indicating this as a shared responsibility; opening support calls with the vendor (38.1 percent); monitoring bounced export and FTS mail (33.3 percent); and account management (31 percent). The more typical system administration activities are done solely by the main library. Typical system administration activities include managing and executing mid-release and major release software upgrades (95.2 percent of all respondents indicated the main library is solely responsible); managing, coordinating, and scheduling new products for installation (95.2 percent); monitoring disk space (95 percent); and scheduling and monitoring backups (92.9 percent). UNLV's ILS support model is very similar to the survey results. The Systems Librarian at UNLV manages all software upgrades, as well as coordinating and scheduling new ILS software product and module installs. The Library Technologies Division monitors and schedules the nightly backups and disk-space usage. Certain UNLV Libraries staff and selected individuals from the partner libraries are authorized to open support calls with the system vendor, although the Systems Librarian often handles this activity herself. Other functions, such as maintaining the year-to-date and last-year circulation statistics, are also performed by the UNLV Libraries Systems Librarian. Updating circulation parameters are tasks best performed by each of the partner institutions, with advice and assistance as necessary provided by the Systems Librarian.

Figure 3. Systems Administration/Support Responsibilities. The twenty duties surveyed were: account management (create new/delete accounts; Millennium authorizations); manage and execute Innovative mid-release and major release software upgrades; manage, coordinate, and schedule new Innovative software product installations; schedule and monitor backups; write scripts to automate processes (i.e., circulation overrides report, system status reports, etc.); perform review file maintenance and take action should all files fill; open support calls with Innovative; monitor status of open calls and serve as liaison with Innovative for resolution of support calls; maintain year-to-date/last year circulation statistic counters; monitor system messages; monitor disk space usage; monitor bounced export and FTS mail; maintain code tables (fixed length, variable length, etc.); update circulation parameters tables (loan rules, hours open, days closed, etc.); set up, monitor, and troubleshoot notices issues; write or modify load tables for new record loading; maintain system printers (label, networked laser printers); provide maintenance on records (patron, bib, item, etc.); manage system security through Innovative system settings and/or host-based or network-based firewalls; and provide emergency (off-hours) response to reports of Innovative downtime or server hardware failures.

■■ Training, Professional Development, and Planning

The survey also focused on training, professional development, and planning activities related to the ILS. There are many methods that library staff can use to stay current with their ILS. Most training methods typically include in-person workshops or online tutorials, as well as other venues for professional development, such as conference attendance. The authors were interested in how libraries sharing an ILS determined training needs and who was responsible for the training. The survey results showed that libraries value a variety of training opportunities,

The authors were also interested in whether an ILS oversight body exists with other shared systems, and, if so, what issues are discussed. Responses indicated that a variety of groups exist, and, in some instances, multiple groups may exist within one consortium (some groups have a more specific ILS focus and others a more tangential involvement). As illustrated in figure 4, a minority of respondents, 11 of 41 (26.8 percent), indicated that they do not have a group providing ILS oversight. If such a group exists, respondents were allowed to select various predefined duties performed by that group. Twenty-three respondents indicated the group discusses purchasing decisions.
Respondents also indicated that such a group discusses the impact of the vendor enhancements offered by mid-release and regular full releases (19), and when to schedule the upgrades (12). The absence of an oversight group doesn't imply that consultation doesn't occur; rather, it may be the responsibility of an individual as opposed to an effort coordinated by a group. Some libraries also have module-driven committees, which disseminate information, introduce new ideas, and try to promote cohesiveness throughout the consortium. Other duties that such an oversight group may focus on include workflow issues, discussion of system issues, and definition of policies and procedures. Some groups provide recommendations to a larger executive board for the consortia. The meeting frequency of these groups is as varied as the libraries. Some groups meet quarterly (33.3 percent) or monthly (20 percent), but the majority meet at other frequencies (40 percent), such as every other month or twice a year. Some libraries use e-mail to communicate as opposed to having regular in-person meetings. In addition to a standing committee focused on the ILS, and similar to UNLV's experience, libraries may have finite working groups to implement particular products.

Figure 4. Issues Discussed by ILS Oversight Body: updates on unresolved problem calls with Innovative; discussion of enhancements offered by mid-release and regular full-release software upgrades and their impact (positive/negative) on users of the system; scheduling mid-release/full-release software upgrades; prioritizing and selecting choices related to the Innovative Users Group enhancements ballot for your installation; discussion of potential new software/modules to purchase from Innovative; N/A—an oversight group, body, or committee does not exist related to the oversight of the Innovative system; other.

specifically regarding cost sharing, support, and rights and responsibilities. In conducting this background research, a paucity of published literature was observed, and thus the authors hope the findings above may help other established consortia, who may be interested in reviewing or tweaking their current MOUs or more formalized agreements likely in place. It may also provide some considerations for libraries considering initiating a shared ILS instance, something that, given the current recession, may be a topic to consider. Given that nearly a decade has passed since the original UNLV MOU was drafted and agreed to, several revisions will be proposed and drafted. This includes formalization of how costs are divided for enrichment services (new since the original MOU), and formalization in writing of the coordination role of the Systems Librarian in her capacity as chief manager of the ILS. Other ideas gathered from survey responses are worth consideration, such as a base additional fee contributed each year (above and beyond the fee assessed as determined by staff licenses). Such a fee could help recoup real, sometimes significant costs associated with the system, such as the purchase of additional software benefitting all players (often, in practice, funded solely by the main library). Such a fee could also help recoup more tangential (but still real) expenses, such as replacement of backup media. However, at the time of writing, tweaking (increasing) the fee assessed to partner institutions is a delicate issue. As with many other institutions of learning and their associated libraries, the Nevada System of Higher Education has been particularly hard hit with funding cuts, even when compared against serious cuts experienced by colleagues nationwide. By all measures (unemployment, state budget shortfall, foreclosures, etc.) Nevada has been one of the hardest-hit states in the current recession. While knowledge gained from this survey was useful (and current), what effect it will have in changing the cost structure is, now, on hold.
In the spirit of support among the libraries in the same system of higher education, and in continuing to demonstrate serious shared efficiencies (by maintaining one joint system as opposed to five individual systems), no new fee structure will be implemented in the short term. At the appropriate time, different costing structures such as those elicited in the survey results will merit closer attention. References 1. Jason Vaughan, “A Library’s Integrated Online Library System: Assessment and New Hardware Implementation,” Information Technology and Libraries 23, no. 2 (June 2004): 50–57. 2. Innovative Interfaces, “About Us: History,” http://www .iii.com/about/history.shtml (accessed May 17, 2010). regardless of the library’s status. The easiest and cheapest method of awareness involves having someone monitor the IUG electronic discussion list, with 29 respondents (70.7 percent) indicating that both the main library and one or more partner libraries participate in this activity. Attendance at the national and regional IUG meetings was also valued highly by libraries with 26 respondents (66.7 percent) indicating both the main libraries and their partner libraries having a staff member attend such meet- ings in the past 5 years. Sixteen respondents (64 percent) indicated both the main library and their partner libraries regularly send staff to the American Library Association Annual Conference and Midwinter Meeting. IUG typi- cally has a meeting the Friday before the Midwinter Meeting. Attendance at training workshops held at the vendor headquarters, as well as online training, is an activity in which the main library participates more frequently than the partner libraries (61.1 percent). Complete survey results are provided in appendix A, available at http://www.lita.org/ala/mgrps/divs/lita/ ital/302011/3002jun/pdf/vaughan_app.pdf. ■■ Research Summary and Future Directions Integrated library systems shared by multiple partners hold the promise of shared efficiencies. Given a rather significant number of responses, shared systems appear to be quite common, ranging from a few partners to sys- tems with many partners. Perhaps reflecting this, shared systems range from loose federations of library partners to shared systems managed by a more formalized, official consortium. A majority of libraries with shared systems have a MOU or other official documents to help define the nature of the relationship, focusing on such topics as budgeting, payments, and funding formulas; general governance and voting matters; support; and equipment. Most libraries sharing a system have a method or fund- ing formula outlining how the ILS is funded on an annual basis and the contributions provided by each partner. Such methods can include not only annual maintenance, but also the procurement of new hardware and software extending the system capabilities. While many support functions are carried out by a central office or staff at the main library hosting the shared system, partner libraries often participate in annual user group and library associa- tion conferences where they help stay abreast of vendor ILS developments. The research above describes the authors’ investi- gations into management of shared integrated library systems. 
In particular, the authors were interested in how other consortia sharing an ILS managed their system.

in accordance with the current best practices.8 For those producers of content who are not able to meet the requirements of ingest, or who do not have access to an OAIS archive provider, what are the options? With the recent downturn in the economy, the availability of staff and the funding for the support of digital libraries has no doubt left many collections at risk of abandonment. Is there a method for preparation of content for long-term storage that is within the reach of existing staff with few technical skills? If the content cannot get to the safe harbor of a trusted digital library, is it consigned to extinction? Or are there steps we can take to mitigate the potential loss?

The OAIS model incorporates six functional entities: ingest, data management, administration, preservation planning, archival storage, and access.9 Of these six, only archival storage is primary; all the others are useless without the actual content. And if the content cannot be accessed in some form, the storage of it may also be useless. Therefore the minimal components that must be met are those of archival storage and some form of access. The lowest cost and simplest option for archival storage currently available is the distribution of multiple copies dispersed across a geographical area, preferably on different platforms, as recommended by the current LOCKSS initiative,10 which focuses on bit-level preservation.11 Private LOCKSS Network models (such as the Alabama Digital Preservation Network)12 are the lowest-cost implementation, requiring only hardware, membership in LOCKSS, and a small amount of time and technical expertise. Reduction of the six functional entities to only two negates the need

In contrast, other leaders of the digital preservation movement have been stating for years that benign neglect is not a workable solution for digital materials. Eric Van de Velde, director of Caltech's Library Information Technology Group, stated that the "digital archive must be actively managed."3 Tom Cramer of Stanford University agrees: "Benign neglect doesn't work for digital objects. Preservation requires active, managed care."4 The Digital Preservation Europe website argues that benign neglect of digital content "is almost a guarantee that it will be inaccessible in the future."5 Abby Smith goes so far as to say that "neglect of digital data is a death sentence."6 Arguments to support this statement are primarily those of media or data carrier storage fragility and obsolescence of hardware, software, and format. However, the impact of these arguments can be reduced to a manageable nightmare. By removing as much as possible of the intermediate systems, storing open-source code for the software and operating system needed for access to the digitized content, and locating archival content directly on the file system itself, we reduce the problems to primarily that of format obsolescence. This approach will enable us to forge ahead in the face of our lack of resources and our rather desperate need for rapid, cheap, and pragmatic solutions.
Current long-term preservation archives operating within the Open Archival Information System (OAIS) model assume that producers can meet the requirements of ingest.7 However, the amount of content that needs to be deposited into archives and the expanding variety of formats and genres that are unsupported are overwhelming the ability of depositors to prepare content for preservation. Andrea Goethals of Harvard proposed that we revisit assumptions of producer ability to prepare content for deposit

Communications

Benign Neglect: Developing Life Rafts for Digital Content

Jody L. DeRidder

Jody L. DeRidder (jlderidder@ua.edu) is Head, Digital Services, University of Alabama.

In his keynote speech at the Archiving 2009 Conference in Arlington, Virginia, Clifford Lynch called for the development of a benign neglect model for digital preservation, one in which as much content as possible is stored in whatever manner available in hopes of there someday being enough resources to more properly preserve it. This is an acknowledgment of current resource limitations relative to the burgeoning quantities of digital content that need to be preserved. We need low cost, scalable methods to store and preserve materials. Over the past few years, a tremendous amount of time and energy has, sensibly, been devoted to developing standards and methods for best practices. However, a short survey of some of the leading efforts clarifies for even the casual observer that implementation of the proposed standards is beyond many of those who are creating or hosting digital content, particularly because of restrictions on acceptable formats, requirements for extensive metadata in specific XML encodings, need for programmers for implementation, costs for participation, or simply a lack of a clear set of steps for the uninitiated to follow (examples include: Planets, PREMIS, DCC, CASPAR, iRods, Sound Directions, HathiTrust).1 The deluge of digital content, coupled with the lack of funding for digital preservation and exacerbated by the expanding variety of formats, makes the application of extensive standards and extraordinary techniques beyond the reach of the majority. Given the current circumstances, Lynch says, either we can seek perfection and store very little, or we can be sloppy and preserve more, discarding what is simply intractable.2

during digitization is that developing digital libraries usually have a highly chaotic disorganization of files, directory structures, and metadata that impede digital preservation readiness.19 If the archival digital files cannot be easily and readily associated with the metadata that provides their context, and if the files themselves are not organized in a fashion that makes their relationships transparent, reconstruction of delivery at some future point is seriously in question. Underfunded cultural heritage institutions need clear specifications for file organization and preparation that they are capable of meeting without programming staff or extensive time commitments. Particularly in the current economic downturn, few institutions have the technical skills to create METS wrappers to clarify file relationships.20 One potential solution is to use the organization of files in the file system itself to communicate clearly to future archivists how the files relate to one another.
At the University of Alabama, we have adopted a stan- dardized file naming system that organizes content by the holding institution and type, collection, item, and then sequence of delivery (see figure 1). The file names are echoed in the file system: top level directories match the holding institution number sequence, secondary level directory names match the assigned collection number sequence, and so forth. Metadata and documentation are stored at whatever level in the file system corresponds to the files to which they apply, and these text and XML files have file names that also correspond to the files to which they apply, which assists further in identi- fication (see figure 2).21 By both naming and ordering the files according to the same system, and bypassing the need for databases, complex metadata schemes and soft- ware, we leverage the simplicity of the file system to bring order to chaos and to enable our content to be eas- ily reconstructed by future systems. take and manage the content is still uncertain. The relay principle states that a preservation system should support its own migration. Preserving any type of digital information requires preserving the information’s context so that it can be interpreted cor- rectly. This seems to indicate that both the intellectual context and the logical context need to be provided. Context may include provenance information to verify authenticity, integrity, and interpretation;17 it may include structural information about the organization of the digital files and how they relate to one another; and it should certainly include docu- mentation about why this content is important, for whom, and how it may be used (including access restrictions). Because the cost of continued migration of content is very high, a method of mitigating that cost is to allow content to become obsolete but to support sufficient metadata and contextual information to be able to resurrect full access and use at some future time—the resurrection prin- ciple. To be able to resurrect obsolete materials, it would be advisable to store the content with open-source software that can render it, an open- source operating system that can support the software, and separate plain-text instructions for how to reconstruct delivery. In addition, underlying assumptions of the stor- age device itself need to be made explicit if possible (type of file system partition, supported length of file names, character encodings, inode information locations, etc.). Some of the need for this form of preserva- tion may be diminished through such efforts as the Planets TimeCapsule Deposit.18 This consortium has gath- ered the supporting software and information necessary to access cur- rent common types of digital files (such as PDF), for long-term storage in Swiss Fort Knox. One of the drawbacks to gather- ing and storing content developed for a tremendous amount of meta- data collection. Where the focus has been on what is the best metadata to collect, the question becomes: What is the minimal metadata and contextual information needed? The following is an attempt to begin this conversation in the hope that debate will clarify and distill the absolutely necessary and specific requirements to enable long-term access with the lowest possible barrier to implemen- tation. If we consider the purpose of preservation to be solely that of ensuring long-term access, it is possi- ble to selectively identify information for inclusion. 
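The directory and file-naming convention described above lends itself to simple scripting. The sketch below is only an illustration of the general idea: the identifier format, separators, and padding widths are invented here and do not reproduce the University of Alabama's actual scheme or scripts.

```python
# A minimal sketch of hierarchy-encoding file names: directories echo the
# segments of each file name, and metadata sits beside the files it describes.
# All identifier formats below are hypothetical.

from pathlib import Path

def build_path(root, institution, collection, item, sequence, ext="tif"):
    """Return a path whose directory levels repeat the file-name segments."""
    name = f"{institution}_{collection}_{item}_{sequence:04d}.{ext}"
    return (Path(root) / institution
            / f"{institution}_{collection}"
            / f"{institution}_{collection}_{item}"
            / name)

archival_file = build_path("/archive", "inst01", "coll0005", "item0000042", 1)
# -> /archive/inst01/inst01_coll0005/inst01_coll0005_item0000042/
#    inst01_coll0005_item0000042_0001.tif

# Item-level metadata shares the stem of the directory it applies to:
metadata_file = archival_file.parent / (archival_file.parent.name + ".xml")

# If a collection directory is later handed off for preservation, a BagIt bag
# with checksum manifests can be created with the Library of Congress
# bagit-python tool, for example:
#   import bagit; bagit.make_bag("/archive/inst01/inst01_coll0005")
```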
The recent proposal by the researchers of The National Geospatial Digital Archive (NGDA) may help to direct our focus. They have defined three architectural design principles that are necessary to preserve content over time: the fallback principle, the relay principle, and the resurrection principle.13 In the event that the system itself is no longer functional, then a preservation system should support some form of hand-off of its content—the fallback principle. This can be met by involvement in LOCKSS, as specified above. Lacking the ability to support even this, current creators and hosts of digital content may be at the mercy of political or private support for ingest into trusted digital repositories.14 The recently developed BagIt File Package Format includes valuable information to ensure uncorrupted transfer for incorporation into such an archive.15 Each base directory containing digital files is considered a bag, and the contents can be any types of files in any organization or naming convention; the software tags the content (or payload) with checksums and manifest, and bundles it into a single archive file for transfer and storage. An easily usable tool to create these manifests has already been developed to assist underfunded cultural heritage organizations in preparing content for a hosting institution or government infrastructure willing to preserve the content.16 The gap of who would

Clifford Lynch pointed out, funding cutbacks at the sub-federal level are destroying access and preservation of government records; corporate records are winding up in the trash; news is lost daily; and personal and cultural heritage materials are disappearing as we speak. It is valuable and necessary to determine best practices and to seek to employ them to retain as much of the cultural and historical record as possible, and in an ideal world, these practices would be applied to all valuable digital content. But in the practical and largely resource-constrained world of most libraries and other cultural institutions, this is not feasible. The scale of content creation, the variety and geographic dispersal of materials, and the cost of preparation and support makes it impossible for this level of attention to be applied to the bulk of what must be saved. For our cultural memory from this period to survive, we need to communicate simple, clear, scalable, inexpensive options to digital holders and creators.

References

1. Planets Consortium, Planets Preservation and Long-Term Access Through Networked Services, http://www.planets-project.eu/ (accessed Mar. 29, 2011); Library of Congress, PREMIS (Preservation Metadata Maintenance Activity), http://www.loc.gov/standards/premis/ (accessed Mar. 29, 2011); DCC (Digital Curation Centre), http://www.dcc.ac.uk/ (accessed Mar. 29, 2011); CASPAR (Cultural, Artistic, and Scientific Knowledge for Preservation, Access, and Retrieval), http://www.casparpreserves.eu/ (accessed Mar. 29, 2011); IRODS (Integrated Rule-Oriented Data System), https://www.irods.org/index.php/IRODS:Data_Grids,_Digital_Libraries,_Persistent_Archives,_and_Real-time_Data_Systems (accessed Mar.
29, 2011); Mike Casey and Bruce Gordon, Sound Directions: Best Practices for Audio Preservation, http://www.dlib.indiana.edu/projects/sounddirections/papersPresent/sd_bp_07.pdf (accessed June 14, 2010); HathiTrust: A Shared Digital

online delivery of cached derivatives and metadata, as well as webcrawler-enabled content to expand accessibility. This model of online delivery will enable low cost, scalable development of digital libraries by simply ordering content within the archival storage location. Providing simple, clear, accessible methods of preparing content for preservation, of duplicating archival treasures in LOCKSS, and of web accessibility without excessive cost or deep web database storage of content, will enable underfunded cultural heritage institutions to help ensure that their content will continue to survive the current preservation challenges. As David Seaman pointed out, the more a digital item is used, the more it is copied and handled, the more it will be preserved.23 Focusing on archival storage (via LOCKSS) and accessibility of content fulfills the two most primary OAIS functional capabilities and provides a life raft option for those who are not currently able to surmount the forbidding tsunami of requirements being drafted as best practices for preservation. The importance of offering feasible options for the continued support of the long tail of digitized content cannot be overstated. While the heavily funded centers may be able to preserve much of the content under their purview, this is only a small fraction of the valuable digitized material currently facing dissolution in the black hole of our cultural memory. As

While no programmers are needed to organize content into such a clear, consistent, and standardized order, we are developing scripts that will assist others who seek to follow this path. These scripts not only order the content, they also create LOCKSS manifests at each level of the content, down to the collection level, so that the archived material is ready for LOCKSS pickup. A standardized LOCKSS plugin for this method is available. To assist in providing access without a storage database, we are also developing an open-source web delivery system (Acumen),22 which dynamically collects content from this protected archival storage arrangement (or from web-accessible directories) and provides

Figure 1. University of Alabama Libraries Digital File Naming Scheme (©2009. Used with permission.)

Figure 2. University of Alabama Libraries Metadata Organization (©2009. Used with permission.)

.org/documents/domain-range/index.shtml#ProvenanceStatement (accessed July 18, 2009). 18. Planets Consortium, Planets Time Capsule—A Showcase for Digital Preservation, http://www.ifs.tuwien.ac.at/dp/timecapsule/ (accessed June 14, 2010). 19. Martin Halbert, Katherine Skinner, and Gail McMillan, "Avoiding the Calf-Path: Digital Preservation Readiness for Growing Collections and Distributed Preservation Networks," Archiving 2009 (May 2009): 6. 20. Library of Congress, Metadata Encoding and Transmission Standard (METS), http://www.loc.gov/standards/mets. 21. Jody L. DeRidder, "From Confusion and Chaos to Clarity and Hope," in Digitization in the Real World: Lessons Learned from Small to Medium-Sized Digitization Projects, ed. Kwong Bor Ng and Jason Kucsma (Metropolitan New York Library Council, N.Y., 2010). 22. Tonio Loewald and Jody DeRidder, "Metadata In, Library Out.
A Simple, Robust Digital Library System," Code4Lib Journal 10 (2010), http://journal.code4lib.org/articles/3107 (accessed Aug. 29, 2010). 23. David Seaman, "The DLF Today" (keynote presentation, 2004 Symposium on Open Access and Digital Preservation, Atlanta, Ga.), paraphrased by Eric Lease Morgan in Musings on Information and Librarianship, http://infomotions.com/musings/open-access-symposium/ (accessed Aug. 9, 2009). 24. Lynch, Challenges and Opportunities. 9. Consultative Committee for Space Data Systems, Reference Model. 10. Stanford University et al., Lots Of Copies Keep Stuff Safe (LOCKSS), http://www.lockss.org/lockss/Home (accessed Mar. 29, 2011). 11. David S. Rosenthal et al., "Requirements for Digital Preservation Systems: A Bottom-Up Approach," D-Lib Magazine 11 (Nov. 2005): 11, http://www.dlib.org/dlib/november05/rosenthal/11rosenthal.html (accessed June 14, 2010). 12. Alabama Digital Preservation Network (ADPNet), http://www.adpn.org/ (accessed Mar. 29, 2011). 13. Greg Janée, "Preserving Geospatial Data: The National Geospatial Digital Archive's Approach," Archiving 2009 (May 2009): 6. 14. Research Libraries Group/OCLC, Trusted Digital Repositories: Attributes and Responsibilities, http://www.oclc.org/programs/ourwork/past/trustedrep/repositories.pdf (accessed July 17, 2009). 15. Andy Boyko et al., The BagIt File Packaging Format (0.96) (NDIIPP Content Transfer Project), http://www.digitalpreservation.gov/library/resources/tools/docs/bagitspec.pdf (accessed July 18, 2009). 16. Library of Congress, BagIt Library, http://www.digitalpreservation.gov/partners/resources/tools/index.html#b (accessed June 14, 2010). 17. Andy Powell, Pete Johnston, and Thomas Baker, "Domains and Ranges for DCMI Properties: Definition of the DCMI term Provenance," http://dublincore

Repository, http://www.hathitrust.org/ (accessed Mar. 29, 2011). 2. Clifford Lynch, Challenges and Opportunities for Digital Stewardship in the Era of Hope and Crisis (keynote speech, IS&T Archiving 2009 Conference, Arlington, Va., May 2009). 3. Jane Deitrich, e-Journals: Do-It-Yourself Publishing, http://eands.caltech.edu/articles/E%20journals/ejournals5.html (accessed Aug. 9, 2009). 4. Tom Cramer, quoted in Art Pasquinelli, "Digital Libraries and Repositories: Issues and Trends" (Sun Microsystems presentation at the Summit Bibliotheken, Universitätsbibliothek Kassel, 18–19 Mar. 2009), slide 12, http://de.sun.com/sunnews/events/2009/bibsummit/pdf/2-art-pasquinelli.pdf (accessed July 12, 2009). 5. Digital Preservation Europe, What is Digital Preservation? http://www.digitalpreservationeurope.eu/what-is-digital-preservation/ (accessed June 14, 2010). 6. Abby Smith, "Preservation," in Susan Schreibman, Ray Siemens, John Unsworth, eds., A Companion to Digital Humanities (Oxford: Blackwell, 2004), http://www.digitalhumanities.org/companion/ (accessed June 14, 2010). 7. Consultative Committee for Space Data Systems, Reference Model for an Open Archival System (OAIS), CCSDS 650.0-B-1 Blue Book, Jan. 2002, http://public.ccsds.org/publications/archive/650x0b1.pdf (accessed June 14, 2010). 8. Andrea Goethals, "Meeting the Preservation Demand Responsibly = Lowering the Ingest Bar?" Archiving 2009 (May 2009): 6.

Here again, no weighting or differentiating mechanism is included in describing the multiple elements. What is addressed is the "what" problem: What is the work of or about?
Metadata schemas for images and art works such as VRA Core and CDWA focus on specificity and exhaustivity of indexing, that is, the precision and quantity of terms applied to a subject element. However, these schemas do not address the question of how much the work is of or about the item or concept represented by a particular keyword. Recently, social tagging functions have been adopted in digital library and catalog systems to help support better searching and browsing. This introduces more subject terms into the system. Yet again, there is typi- cally no mechanism to differentiate between the tags used for any given item, except for only a few sites that make use of tag frequency informa- tion in the search interfaces. As collections grow and more federated searching is carried out, the absence of weights for subject terms can cause problems in search and navigation. The following examples illustrate the problems, and the rest of the paper further reviews and discusses the precedent research and practice on weighting, and further outlines the issues that are critical in applying a weighting mechanism. example, the Dublin Core Metadata Element Set recommends the use of controlled vocabulary to represent subject in “keywords, key phrases, or classification codes.”1 Similarly, the Library of Congress practice, sug- gested in the Subject Headings Manual, is to assign “one or more subject headings that best summarize the overall contents of the work and provide access to its most important topics.”2 A topic is only “important enough” to be given a subject head- ing if it comprises at least 20 percent of a work, except for headings of named entities, which do not need to be 20 percent of the work when they are “critical to the subject of the work as a whole.”3 Although catalogers are aware of it when they assign terms, this weight information is left out of the current library metadata schemas and practice. A similar practice applies in non-textual object subject indexing. Because of the difficulty of selecting words to represent visual/aural sym- bolism, subject indexing for art and cultural objects is usually guided by Panofsky’s three levels of meaning (pre-iconographical, iconographical, and post-iconographical), further refined by Layne in “ofness” and “aboutness” in each level. Specifically, what can be indexed includes the “ofness” (what the picture depicts) as well as some “aboutness” (what is expressed in the picture) in both pre–iconographical and iconographi- cal levels.4 In practice, VRA Core 4.0 for example defines subject subele- ments as: Terms or phrases that describe, identify, or interpret the Work or Image and what it depicts or expresses. These may include generic terms that describe the work and the elements that it comprises, terms that identify particular people, geographic places, narrative and icono- graphic themes, or terms that refer to broader concepts or interpretations.5 Seeing the Wood for the Trees: Enhancing Metadata Subject Elements with Weights Subject indexing has been conducted in a dichotomous way in terms of what the information object is primar- ily about/of or not, corresponding to the presence or absence of a particular subject term, respectively. With more subject terms brought into informa- tion systems via social tagging, man- ual cataloging, or automated indexing, many more partially relevant results can be retrieved. 
Using examples from digital image collections and online library catalog systems, we explore the problem and advocate for adding a weighting mechanism to subject indexing and tagging to make web search and navigation more effec- tive and efficient. We argue that the weighting of subject terms is more important than ever in today’s world of growing collections, more federated searching, and expansion of social tagging. Such a weighting mechanism needs to be considered and applied not only by indexers, catalogers, and tag- gers, but also needs to be incorporated into system functionality and meta- data schemas. S ubjects as important access points have largely been indexed in a dichotomous way: what the object is primarily about/ of or not. This approach to index- ing is implicitly assumed in various guidelines for subject indexing. For Hong Zhang, Linda C. Smith, Michael Twidale, and Fang Huang GaoCommunications Hong zhang (hzhang1@illinois.edu) is Phd candidate, graduate School of Library and information Science, university of illinois at urbana-champaign, linda c. smith (lcsmith@illinois.edu) is Professor, graduate School of Library and information Science, university of illinois at urbana-champaign, michael twidale (twidale@illinois.edu) is Professor, graduate School of Library and information Science, university of illinois at urbana-champaign, and Fang Huang Gao (fgao@gpo.gov) is Supervisory Librarian, government Printing office. 76 inFormAtion tecHnoloGY AnD liBrAries | June 2011 ■■ Examples of Problems exhaustive indexing: Digital library collections A search query of “tree” can return thousands of images in several dig- ital library collections. The results include images with a tree or trees as primary components mixed with images where a tree or trees, although definitely present, are minor compo- nents of the image. Figure 1 illustrates the point. These examples come from three different collections and either include the subject element of “tree” or are tagged with “tree” by users. There is no mechanism that catalog- ers or users have available to indicate that “tree” in these images is a minor component. Note that we are not calling this out as an error in the profession- ally developed subject terms, nor indeed in the end user generated tags. Although particular images may have an incorrectly applied key- word, we want to talk about the vast majority where the keyword quite correctly refers to a component of the image. Furthermore, such keywords referring to minor components of the image are extremely useful for other queries. This kind of exhaustive indexing of images enables the effec- tive satisfaction of search needs, such as looking for pictures of “buildings, people, and trees” or “trees beside a river.” With large image collections, such compound needs become more important to satisfy by combinations of searching and browsing. To enable them, metadata about minor subjects is essential. However, without weights to dif- ferentiate subject keywords, users will get overwhelmed with partially relevant results. For example, a user looking for images of trees (i.e., “tree” as the primary subject) would have to look through large sets of results such as a photograph of a dog with a tiny tree out of focus in the background. For some items that include rich metadata, such as title or description, when people look at a particular item’s record, with the title and some- times the description, we may very well determine that the picture is primarily of, say, a dog instead of trees. 
That is, the subject elements have to be interpreted based on the context of other elements in the record to convey the “primary” and “peripheral” subjects among the listed subject terms. However, in a search and navigation system where subject elements are usually treated as context-free, search efficiency will be largely impaired because of the “noise” items and inability to refine the scope, especially when the vol- ume of items grows. Lack of weighting also limits other potential uses of keywords or tags. For example, all the tags of all the items in a collection can be used to create a tag cloud as a low cost way to contribute to a visualization of what a collection is “about” over- all.6 Unfortunately, a laboriously developed set of exhaustive tags, although valuable for supporting searching and browsing within a large image collection, could give a very distorted overview of what the whole collection is about. Extending our example, the tag “tree” may occur so frequently and be so promi- nent in the tag cloud that a user infers that this is mostly a botanical collection. selective indexing: lcsH in library catalogs Although more extreme in the case of images in conveying the “ofness,” the same problem with multiple sub- jects also applies to text in terms of “aboutness.” The following example comes from an online library catalog in a faceted navigation web interface using Library of Congress Subject Headings in subject cataloging.7 The query “psychoanalysis and religion” returned 158 results, with 126 in “psychoanalysis and religion” under the Topic facet. According to the Subject Headings Manual, the first subject is always the primary one, while the second and others could be either a primary or nonprimary subject.8 This means that among these 126 books, there is no easy way to tell which books are “primarily” about “psychoanalysis and religion” unless the user goes through all of them. With the pro- vided metadata, we do know that all books that have “psychoanalysis and religion” as the first subject heading are primarily about this topic, but a book that has this same heading as its second subject head- ing may or may not be primarily about this topic. There is no way to indicate which it is in the metadata, nor in the search interface. As this example shows, the Library of Congress manual involves an attempt to acknowledge and make a distinction between primary and nonprimary subjects. However in practice the attempt is insufficient to be really useful since apart from the first entry, it is ambiguous whether subsequent entries are additional primary subjects or nonprimary sub- jects. Consequently, the search system and, further on, the users are not able to take full advantage of the care of a cataloger in deciding whether an additional subject is primary or not. other information retrieval systems The negative effect of current sub- ject indexing without weighting on search outcomes has been identified by some researchers on particular information retrieval systems. In a study examining “the contribution of metadata to effective searching,”9 Hawking and Zobel found that the available subject metadata are “of little value in ranking answers” to search queries.10 Their explanation is that “it is difficult to indicate via metadata tagging the relative importance of a page to a particular topic,”11 in addition to the prob- lems in data quality and system implementation. The same problem : | zHAnG et Al. 77seeinG tHe WooD For tHe trees | zHAnG et Al. 
77 authors compared with the automatic indexing systems, because human indexers should be bet- ter at weighting the significance of subjects, and be more able to distinguish between important and peripheral compared with computers that base signifi- cance on term frequency.13 Indeed, while various weight- ing algorithms have been used in automatic indexing systems to approximate the distinguishing function, there is simply no such mechanism built in human subject the particular page harder to find.12 A similar problem is reported in a recent study by Lykke and Eslau. In comparing searching by controlled subject metadata, search- ing based on automatic indexing, and searching based on automatic indexing expanded with a corporate thesaurus in an enterprise electronic document management system, the authors found that the metadata searches produced the lowest pre- cision among the three strategies. The problem of indiscriminate meta- data indexing is “remarkable” to the of multiple tags without weights is described: In the kinds of queries we have studied, there is typically one page (or at most a small num- ber) that is particularly valu- able. There are many other pages which could be said to be relevant to the query—and thus merit a metadata match—but they are not nearly so useful for a typical searcher. Under the assumption that metadata is needed for search, all of these pages should have the relevant metadata tag, but this makes A. Subject: women; books; dresses; flowers; trees; . . . In: Victoria & Albert Museum (accessed Aug. 30, 2010), http://collections.vam.ac.uk/item/014962/oil-painting-the-day-dream B. Tags: Japanese; moon; nights; walking; tree; . . . In: Brooklyn Museum (accessed Aug. 30, 2010), http://www.brooklynmuseum.org/opencollections/objects/121725/Aoi_Slope_Outside_Toranomon_Gate_No._113_from_ One_Hundred_Famous_Views_of_Edo C. Tags: Japanese; birds; silk; waterfall; tree; . . . In: Steve: The Museum Social Tagging Project (accessed Aug. 30, 2010), http://tagger.steve.museum/steve/object/15?offset=2 Figure 1. Example Images with “tree” as a Subject Item 78 inFormAtion tecHnoloGY AnD liBrAries | June 2011 Anderson in NISO TR021997.20 In addition, researchers have noticed the limitations of this dichoto- mous indexing. In an opinion piece, Markey emphasizes the urgency to “replace Boolean-based catalogs with post-Boolean probabilistic retrieval methods,”21 especially given the chal- lenges library systems are faced with today. It is the time to change the Boolean, i.e., dichotomous, practice of subject indexing and cataloging, no matter whether it is produced by professional librarians, by user tag- ging, or by an automatic mechanism. Indeed, as declared by Svenonius, “While the purpose of an index is to point, the pointing cannot be done indiscriminately.”22 Needed Refinements in Subject Indexing The fact that weighted indexing has become more prominently needed over the past decade may be related to the shift in the continuum from subject indexing as representation/ surrogate to subject indexing as access points, which is consistent with the shift from a small number of subject terms to more subject terms. This might explain why the weight- ing practice is applied in the above mentioned MEDLINE/PubMed system. With web-based systems, social tagging technology, federated searching, and the growing number of collections producing more subject terms, to distinguish between them has become a prominent problem. 
In reviewing information users and use from the 1920s to the present, Miksa points out the trend to “more granular access to informational objects” “by viewing documents as having many diverse subjects rather than one or two ‘main’ subjects,” no matter what the social and tech- nical environment has been.23 In recognizing this theme in the future development of information organi- zation and retrieval systems, we argue that the subject indexing mechanism subject indexing has been discussed in the research area of subject analy- sis for some time. Weighting gives indexing an increased granularity and can be a device to counteract the effect of indexing specificity and exhaustivity on precision and recall, as pointed out by Foskett: Whereas specificity is a device to increase relevance at the cost of recall, exhaustivity works in the opposite direction, by increas- ing recall, but at the expense of relevance. A device which we may use to counteract this effect to some extent is weighting. In this, we try to show the signifi- cance of any particular specifi- cation by giving it a weight on a pre-established scale. For example, if we had a book on pets which dealt largely with dogs, we might give PETS a weight of 10/10, and DOGS, a weight of 8/10 or less.16 Anderson also includes weighting as a part of indexing in the Guidelines for Indexes and Related Information Retrieval Devices (NISO TR021997): One function of an index is to discriminate between major and minor treatments of particular topics or manifestations of par- ticular features.17 He also notes that a weight- ing scheme is “especially useful in high-exhaustivity indexing”18 when both peripheral and primary topics are indicated. Similarly, Fidel lists “weights” as one of the issues that should be addressed in an indexing policy.19 Metadata indexing without weighting is related to the simplified dichotomous assumption in sub- ject indexing—primarily about/of and not primarily about/of, which further leads to the dichotomous retrieval result—retrieved and not retrieved. Weighting as a mechanism to break this dichotomy is noted by metadata indexing even though human indexers are able to do the job much better than computers. Weighting: Yesterday, Today, and Future Precedent Weighting Practices Written more than thirty years ago, the final report of the Subject Access Project describes how the project researchers applied weights to the newly added subject terms extracted from tables of contents and back- of-the-book indexes. The criterion used in that project was that terms and phrases with a “ten-page range or larger” were treated as “major” ones.14 A similar mechanism was adopted in the ERIC database beginning in the 1960s, with indexes distinguishing “major” and “minor” descriptors as the result of indexing. While some search systems allowed differentia- tion of major and minor descriptors in formulating searches, others sim- ply included the distinction (with an asterisk) when displaying a record. Unfortunately, this distinguishing mechanism is no longer included in the later ERIC indexing data. A system using weighted index- ing and searching and still running today is the MEDLINE/PubMed interface. A qualifier [majr] can be used with a Medical Subject Headings (MeSH) term in a query to “search a MeSH heading which is a major topic of an article (e.g., thromboembolism[majr]).”15 In the search result page, each major MeSH topic term is denoted by an asterisk at the end. 
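To make the precedent practices above concrete, the following minimal sketch shows what weighted subject terms could enable at retrieval time, loosely analogous to PubMed's [majr] qualifier. This is not any existing system's implementation: the records, the subject terms, and the 0-10 weight scale (echoing Foskett's PETS 10/10, DOGS 8/10 illustration) are all hypothetical.

```python
# Hypothetical records whose subject terms carry weights on a 0-10 scale.
records = [
    {"id": "img-001", "subjects": {"dog": 10, "tree": 2}},
    {"id": "img-002", "subjects": {"tree": 9, "river": 6}},
    {"id": "img-003", "subjects": {"building": 8, "tree": 7, "people": 5}},
]

def search(term, major_only=False, threshold=7):
    """Return matching records, optionally restricted to 'major' treatments."""
    hits = []
    for rec in records:
        weight = rec["subjects"].get(term)
        if weight is None:
            continue
        if major_only and weight < threshold:
            continue
        hits.append((rec["id"], weight))
    # Rank so that items primarily of/about the term come first.
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

print(search("tree"))                   # every item depicting trees, best first
print(search("tree", major_only=True))  # only items primarily of/about trees
```

The same weights could also drive tag-cloud sizing or facet ordering, so that a minor background "tree" no longer counts the same as a photograph that is primarily of trees.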
Weighting Concept and the Purpose of Indexing The weighting concept is connected with the fundamental purpose of indexing. The idea of weighting in : | zHAnG et Al. 79seeinG tHe WooD For tHe trees | zHAnG et Al. 79 user tagging and machine generated metadata, such weighting becomes more important than ever if we are to make productive use of metadata richness and still see the wood for the trees. References 1. “Dublin Core Metadata Element Set, Version 1.1,” http://dublincore.org/docu ments/dces/ (accessed Nov. 20, 2010). 2. Library of Congress, Subject Headings Manual (Washington, D.C.: Library of Congress, 2008). 3. Ibid. 4. Elaine Svenonius, “Access to Nonbook Materials: The Limits of Subject Indexing for Visual and Aural Languages,” Journal of the American Society for Information Science, 45, no. 8 (1994): 600–606. 5. “VRA Core 4.0 Element Description,” http://www.loc.gov/standards/vracore/ VRA_Core4_Element_Description.pdf (accessed Mar. 31, 2011). 6. Richard J. Urban, Michael B. Twidale, and Piotr Adamczyk, “Designing and Developing a Collections Dashboard,” In J. Trant and D. Bearman (eds). Museums and the Web 2010: Proceedings, ed. J. Trant and D. Bearman (Toronto: Archives & Museum Informatics, 2010). http://www .archimuse.com/mw2010/papers/urban/ urban.html (accessed Apr. 5, 2011). 7. “VuFind at the University of Illinois,” http://vufind.carli.illinois.edu (accessed Nov. 20, 2010). 8. Library of Congress, Subject Headings Manual. 9. David Hawking and Justin Zobel, “Does Topic Metadata Help with Web Search?” Journal of the American Society for Information Science & Technology 58, no. 5 (2007): 613–28. 10. Ibid. 11. Ibid. 12. Ibid, 625. 13. Marianne Lykke and Anna G. Eslau, “Using Thesauri in Enterprise Settings: Indexing or Query Expansion?” in The Janus faced Scholar. A Festschrift in Honour of Peter Ingwersen, ed. Birger Larsen et al. (Copenhagen: Royal School of Library & Information Science, 2010): 87–97. 14. Subject Access Project, Books Are for Use: Final Report of the Subject Access Project to the Council on Library Resources (Syracuse, N.Y.: Syracuse Univ., 1978). 15. “PubMed,” http://www.nlm.nih more than three categories or using continuous scales instead of category rating.24 Subject indexing involves a similar judgment of relevance when deciding whether to include a subject term. More sophisticated scales cer- tainly enable more useful ranking of results, but the cost of obtaining such information may rise. After the mechanism of incorpo- rating weights into subject indexing/ cataloging is developed, guidelines should be provided for indexing practice to produce consistent and good quality. Weights in Both Indexing and Retrieval System Adding weights to subject indexing/ cataloging needs to be considered and applied in three parts: (1) extend- ing metadata schemas by encoding weights in subject elements; (2) sub- ject indexing/cataloging with weight information; and (3) retrieval systems that exploit the weighting informa- tion in subject metadata elements. The mechanism will not work effec- tively in the absence of any one of them. Conclusion This paper advocates for adding a weighting mechanism to subject indexing and tagging, to enable search algorithms to be more discriminating and browsing better oriented, and thus to make it possible to provide more granular access to information. Such a weighting mechanism needs to be considered and applied not only by indexers, catalogers, and taggers, but also needs to be incorporated into system functionality. 
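As a purely hypothetical sketch of how the three parts might fit together, the fragment below encodes Foskett's PETS/DOGS example as subject terms carrying a 0-10 weight and lets a retrieval step sort on that weight. No existing metadata schema defines such a weight attribute; the record structure and the scale here are illustrative assumptions only.

# Hypothetical records: subject terms carry a 0-10 weight, echoing Foskett's
# example of PETS weighted 10/10 and DOGS 8/10 in a book on pets.
records = [
    {"title": "A book on pets (mostly about dogs)",
     "subjects": [("Pets", 10), ("Dogs", 8), ("Cats", 2)]},
    {"title": "A dog-training manual",
     "subjects": [("Dogs", 10), ("Obedience training", 7)]},
    {"title": "A veterinary handbook",
     "subjects": [("Veterinary medicine", 10), ("Pets", 4)]},
]

def rank_by_subject_weight(query_subject, records):
    """The retrieval half of the mechanism: order matching records by the
    weight the indexer gave the subject, instead of treating every subject
    match as equally strong (the dichotomous retrieved/not-retrieved result)."""
    hits = [(dict(r["subjects"]).get(query_subject, 0), r["title"]) for r in records]
    return [title for weight, title in sorted(hits, reverse=True) if weight > 0]

print(rank_by_subject_weight("Dogs", records))
# ['A dog-training manual', 'A book on pets (mostly about dogs)']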
As social tagging is brought into today’s digital library collections and online library catalogs, as col- lections grow and are aggregated, and the opportunity arises for add- ing more metadata from a variety of different sources, including end should provide sufficient granular- ity to allow more granular access to information, as demonstrated in the examples in the previous section. Potential Challenges While arguing for the potential value of weights associated with subject terms, it is also important to acknowl- edge potential challenges posed by this approach. Human Judgment Treating assigned terms equally might seem to avoid the additional human judgment and the subjec- tivity of the weight levels because different catalogers may give differ- ent weight to a subject heading. We argue that assigning subject headings is itself unavoidably subjective. We are already using professional index- ers and subject catalogers to create value-added metadata in the form of subject terms. Assigning weights would be a further enhancement. On the other hand, adding a weighting mechanism into metadata schemas is independent of the issue of human indexing. No matter who will do the subject indexing or tag- ging, either professional librarians or users or possibly computers, there is a need for weight information in the metadata records. The Weighting Scale In terms of the specific mechanism of representing the weight rat- ing, we can benefit from research on weighting of index terms and on the relevance of search results. For example, the three categories of relevant, partially relevant, and non- relevant in information retrieval are similar to the major, minor, and non- present subject indexing method in the examples above. Borlund notes several retrieval studies proposing 80 inFormAtion tecHnoloGY AnD liBrAries | June 2011 22. Svenonius, “Access to Nonbook Materials,” 601. 23. Francis Miksa, “Information Organization and the Mysterious Information User,” Libraries & the Cultural Record 44, no. 3 (2009): 343–70. 24. Pia Borlund, “The Concept of Relevance in IR,” Journal of the American Society for Information Science & Technology 54, no. 10 (2003): 913–25. 18. Ibid. 19. Raya Fidel, “User-Centered Index- ing,” Journal of the American Society for Information Science 45, no. 8 (1994): 572–75. 20. Anderson, Guidelines for Indexes and Related Information Retrieval Devices, 20. 21. Karen Markey, “The Online Library Catalog: Paradise Lost and Paradise Regained?” D-Lib Magazine 13, no. 1/2 (2007). . g o v / b s d / d i s t e d / p u b m e d t u t o r i a l / 020_760.html (accessed Nov. 20, 2010). 16. A. C. Foskett, The Subject Approach to Information, 5th ed. (London: Library Association Publishing, 1996): 24. 17. James D. Anderson, Guidelines for Indexes and Related Information Retrieval Devices. NISO-TR02–1997, http:// www.niso.org/publications/tr/tr02.pdf (accessed Nov. 20, 2010): 25. 3008 ---- : | WAnG 81BuilDinG An open source institutionAl repositorY At A smAll lAW scHool liBrArY | WAnG 81 Fang WangCommunications V700 flatbed scanner, which was rec- ommended by many digitization best practices in Texas. For software, we had all the important basics such as OCR and image editing software for the project to start. For the following several months, I did extensive research on what digi- tal asset management platform would be the best solution for the law library. 
We had options to continue display- ing the digital collections through webpages or use a digital asset man- agement platform that would provide long-term preservation as well as retrieval functions. We made the deci- sion to go with the latter. Generally speaking, there are two types of digital asset management platforms: proprietary and open source. In some rare occasions, a library chooses to develop its own system and not to use either type of the platforms if the library has des- ignated programmers. There are pros and cons to both proprietary and open source platforms. Although set- ting up the repository is fairly quick and easy on a proprietary platform, it can be very expensive to pay annual fees for hosting and using the ser- vice. For the open source software, it may appear to be “free” up front; however, installing and customizing the repository can be very time con- suming and these solutions often lack technical and development support. There is no uniform rule for choosing a platform. It depends on what the organization wants to achieve and its own unique circumstances. I explored several popular propri- etary platforms such as CONTENTdm and Digital Commons. CONTENTdm is an OCLC product, which has a lot of capability and is especially good for displaying image collec- tions. Digital Commons is owned of the repository is ongoing; it is valuable to share the experience with other institutions who wish to set up an institutional repository of their own and also add to the knowledge- base of IR development. Institutional Repository from the Ground Up Unlike most large university librar- ies, law school libraries are usually behind on digital initiative activities because of smaller budgets, lack of staff, and fewer resources. Although institutional repositories have already become a trend for large uni- versity libraries, it still appears to be a new concept for many law school libraries. At the beginning of 2009, I was hired as the digital information management librarian to develop a digital repository for the law school library. When I arrived at Texas Tech University Law library, there was no institutional repository implemented. There were very few digital proj- ects done at the law library. One digital collection was of faculty schol- arship. This collection was displayed on a webpage with links to PDF files. Another digital project, to digi- tize and provide access to the Texas governor executive orders found in the Texas Register, was planned then disbanded because of the previous employee leaving the position. I started by looking at the digiti- zation equipment in the library. The equipment was very limited: a very old and rarely used book scanner and a sheet-fed scanner. The good thing was that the library did have extra PCs to serve as workstations. I did research on the book scanner we had and also consulted colleagues I met at various digital library conferences about it. Because the model is very outdated and has been discontinued by the vendor and thus had little value to our digitization project, I decided to get rid of the scanner. I then proposed to purchase an EPSON Perfection Building an Open Source Institutional Repository at a Small Law School Library: Is it Realistic or Unattainable? Digital preservation activities among law libraries have largely been lim- ited by a lack of funding, staffing and expertise. 
Most law school libraries that have already implemented an Institutional Repository (IR) chose proprietary platforms because they are easy to set up, customize, and maintain with the technical and development support they provide. The Texas Tech University School of Law Digital Repository is one of the few law school repositories in the nation that is built on the DSpace open source platform.1 The reposi- tory is the law school’s first institu- tional repository in history. It was designed to collect, preserve, share and promote the law school’s digi- tal materials, including research and scholarship of the law faculty and students, institutional history, and law-related resources. In addition, the repository also serves as a dark archive to house internal records. I n this article, the author describes the process of building the dig- ital repository from scratch including hardware and software, cus- tomization, collection development, marketing and outreach, and future projects. Although the development Fang Wang (fang.wang@ttu.edu) is digital information Management Librarian, texas tech university School of Law Library, Lubbock, texas. 82 inFormAtion tecHnoloGY AnD liBrAries | June 2011 Two months later, we discovered that a preconfigured application called JumpBox for DSpace was released and approved to be a much eas- ier solution for the installation. The price was reasonable too, $149 a year (the price has jumped quite a bit since then). However, using JumpBox would leave our newly purchased Red Hat Linux server of no use because JumpBox runs on Ubuntu, therefore after some discussion we decided not to pursue it. We were a little stuck in the installation process. Outsourcing the installation seemed to be a fea- sible solution for us at this point. We identified a reputable DSpace service provider after doing exten- sive research including comparing vendors, obtaining references, and pursuing other avenues. After obtain- ing a quote, we were quite satisfied with the price and decided to con- tract with the vendor. While waiting for the contract to be approved by the university contracting office, I began designing the look and feel that is unique to the TTU School of Law with some help from another library staff member. The installation finally took place at the beginning of January 2010. I worked very closely with the service provider during the installation to ensure the desired con- figuration for our DSpace instance. Our repository site with the TTU Law branding became accessible to the public three days later. And with several weeks of warranty, we were able to adjust several configurations including display thumbnails for images. Overall, we are very pleased with the results. After the installa- tion, our IT department maintains the DSpace site and we host all the content on our own server. Collection Development of the IR Content is the most critical element to an institutional repository. While we were waiting for our IT department 66, the majority of the repositories worldwide were created using the DSpace platform.2 For the installation, we looked at the opportunity to use services provided by the state digital library consortium Texas Digital Library (TDL) and tried to pursue a partner- ship with the main university library, which had already implemented a digital repository. However, because of financial reasons and separate budgets, those approaches did not work out. So we decided to have our own IT department install DSpace. 
Installation and Customization of Our DSpace Unlike large university libraries, smaller special libraries face many challenges while trying to establish an open source repository. After making the decision to use DSpace, the first challenge we faced was the installa- tion. DSpace runs on PostgreSQL or Oracle and requires a server installa- tion. Customizing the web interface requires either the JSPUI (JavaServer Pages user interface) or XMLUI (Extensible Markup Language user interface). The staff in our IT depart- ment knew little about DSpace. However, another special library on campus offered their installation notes to our system administrator because they just installed DSpace. Although DSpace runs on a variety of operating systems, we purchased Red Hat enterprise Linux after some testing because it is the recommended OS for DSpace. Then our system administrator spent sev- eral months trying to figure out how to install the software in addition to his existing projects. Because we did not have dedicated IT personnel working on the installation, the work was often interrupted and very dif- ficult to complete. Our IT staff also found it very difficult to continue with the installation because the software requires a lot of expertise. by Berkley Press and is often used in the law library community. As a smaller law library, our budget did not allow us to purchase those plat- forms, which require annual fees of more than $10,000. So we had to look at the open source options. For the open source platforms, I investigated DSpace, Fedora, EPrints and Green Stone. DSpace is a Java- based system developed by MIT and HP labs. It offers a communities- collections model and has built-in submission workflows and long-term preservation function. It can be installed “out of the box” and is easy to use. It has been widely adopted as institutional repository software in the United States and worldwide. Fedora was also developed in the United States. It is more of a back- end software with no web-based administration tools and requires a lot of programming effort. Similar to DSpace, EPrints is another easy to set up and use IR software devel- oped in the U.K. It is written in Perl and is more widespread in Europe. Greenstone is a tool developed in New Zealand for building and dis- tributing digital library collections. It provides interfaces in 35 languages so it has many international users. When choosing an IR platform, it is not a question of which software is superior to others but rather which is more appropriate for the purpose and the content of the repository. Our goal was to find a platform that had low costs and did not involve much programming. We also wanted a sys- tem that was capable of archiving digital items in various formats for the long term, flexible for data migra- tion, had a widely accepted metadata scheme, decent search capability, and was easy to use. Another factor we had to consider was the user base. Because open source software relies on the user themselves for techni- cal support for the most part, we wanted a software that had an active user community in the United States. DSpace seemed to satisfy all of our needs. Also, according to repository : | WAnG 83BuilDinG An open source institutionAl repositorY At A smAll lAW scHool liBrArY | WAnG 83 hosted by the Lubbock County Bar Association at the TTU Law School. We made the initial announcement to the law faculty and staff and later to the Lubbock County Bar about the new digital initiative service we have established. 
We received very positive feedback from the law com- munity. Professor Edgar’s family was delighted to see his collection made available to the public. Following the success of the initial launch, I developed an out- reach plan to promote the digital repository. To make the repository site more visible, several efforts were made: the repository site URL was submitted to the DSpace user reg- istry, the Directory of Open Access Repositories (OpenDOAR), and Registry of Open Access Repositories (ROAR); the site was registered with Google Webmaster Tools for better indexing; and the repository was linked to several websites of the law school and library. The “Faculty Scholarship” collection and the “Texas Governor Executive Orders” collection became available shortly after. I then developed a poster of the newly established digital reposi- tory and presented it at the Texas Conference on Digital Libraries held at University of Texas Austin in May 2010. Currently, our digital repository has more than eight hundred digital items as of August 2010. With more and more content becoming avail- able in the repository, we plan on making an official announcement to the law community. We will also make entering first-year law stu- dents aware of the IR by including an article about the new repository in the library newsletter that is distrib- uted to them during their orientation. Our future marketing plan includes sending out announcements of new collections to the law school using our online announcement system TechLawAnnounce and promoting the digital repository through the law library social networking pages on Facebook and Twitter. We also plan reviewed each year. Based on the collection develop- ment policy, we made a decision to migrate the content of the old “Faculty Scholarship” collection from webpages into the digital reposi- tory. It was intended to include all publications of the Texas Tech law school faculty in the collection. We then hired a second-year law student as the digital project assistant and trained him on scanning, editing, and OCR-ing PDF files; uploading files to DSpace; and creating basic metadata. We also brought another two stu- dent assistants on board to help with the migration of the faculty scholar- ship collection. The faculty services librarian checked the copyright with faculty members and publish- ers while I (the digital information management librarian) served as the repository manager handling more complicated metadata creation, per- forming quality control over student submissions, and overseeing the whole project. Later Development and Promoting the IR During the faculty scholarship migra- tion process, we discovered a need to customize DSpace to allow active URLs for publications. We wanted all the articles linked to three widely used legal databases: Westlaw, LexisNexis, and Hein Online. Because the default DSpace system does not support active URLs, it requires some programming effort to make the sys- tem detect a particular metadata field then render it as a clickable link. We outsourced the development to the same service provider who installed DSpace for us. The results were very satisfying. The vendor cus- tomized the system to allow active URLs and displayed the links as click- able icons for each legal database. In April 2010, “Professor J. 
Hadley Edgar ’s Personal Papers” collection was made available in con- junction with his memorial service, to install DSpace, we prepared and scanned two collections: the “Texas Governor Executive Orders” collec- tion and the “Professor J. Hadley Edgar’s Personal Papers” collection. The latter was a collection donated by Professor Edgar’s wife after he passed away in 2009. Professor Edgar taught at the Law School from 1971 to 1991. He was named the Robert H. Bean Professor of Law and was twice voted by the student body as the Outstanding Law Professor. The collection contains personal correspondence, photos, newspa- per clippings, certificates, and other materials. Many of the items have a high historic value to the law school. For the scanning standards, we used 200 dpi for text-based materials and 400 dpi for pictures. We chose PDF as our production file format as it is a common document format and smaller in size to download. After the installation was com- pleted at the beginning of January, I drafted and implemented a digital repository collection development policy shortly after to ensure proper procedures and guidance of the repository development. The policy includes elements such as the pur- pose of the repository, scope of the collections, selection criteria and responsibilities, editorial rights, and how to handle challenges and withdrawals. I also developed a repository release form to obtain per- missions from donors and authors to ensure open access for the mate- rials in the repository. Twelve collections were initially planned for the repository: “Faculty Scholarship,” “Personal Manuscripts,” “Texas Governor Executive Orders,” “Law School History,” “Law Library History,” “Regional Legal History,” “Law Student Works,” “Audio/ Video Collection,” “Dark Archive,” “Electronic Journals,” “Conference, Colloquium and Symposium,” and “Lectures and Presentations.” There will be changes to the collections in the future as the digital repository collection development policy will be 84 inFormAtion tecHnoloGY AnD liBrAries | June 2011 All roads lead to Rome. No matter what platform you choose, whether open source or not, the goal is to pick a system that best suits your organization’s needs. To build a successful institutional repository is not simply “scanning” and “putting stuff online.” Various factors need to be considered, such as digitization, IR platform, collection development, metadata, copyright issues, and mar- keting and outreach. Our experience has proven that it is possible for a smaller special library with limited resources and funding to establish an open source IR such as DSpace and continue to maintain the site and build the collections with success. Open source software is cer- tainly not “free” because it requires a lot of effort. However, in the end it still costs a lot less than what we would pay to the proprietary soft- ware vendors. References 1. “The Texas Tech University School of Law digital repository,” http://reposi tory.law.ttu.edu/ (accessed Apr. 5, 2011). 2. “Repository Maps,” accessed http://maps.repository66.org/ (accessed Aug. 16, 2010). (SSRN) links to individual articles in the faculty scholarship collection. After that, the next collections we will work on are the law school and law library history materials. We also plan to do some development on the DSpace authentication to integrate with the TTU “eRaider” system to enable single log-in. 
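The article does not say how the scanned PDFs were loaded; items may simply have been submitted one at a time through the web interface. A step like this can, however, be scripted against DSpace's documented Simple Archive Format for batch import. The sketch below is an assumption-laden illustration (the paths, the minimal metadata, and the decision to batch-import at all are assumptions added here), and the exact layout should be checked against the DSpace version in use.

import pathlib
from xml.sax.saxutils import escape

def build_simple_archive(pdf_dir, out_dir, titles=None):
    """Assemble a DSpace Simple Archive Format batch: one directory per item,
    each holding the PDF, a 'contents' file naming the bitstream, and a
    minimal dublin_core.xml. Real records would also carry authors, dates,
    rights, and the repository release-form status."""
    titles = titles or {}
    out = pathlib.Path(out_dir)
    for i, pdf in enumerate(sorted(pathlib.Path(pdf_dir).glob("*.pdf"))):
        item = out / f"item_{i:03d}"
        item.mkdir(parents=True, exist_ok=True)
        (item / pdf.name).write_bytes(pdf.read_bytes())
        (item / "contents").write_text(pdf.name + "\n")
        title = escape(titles.get(pdf.name, pdf.stem))
        (item / "dublin_core.xml").write_text(
            "<dublin_core>\n"
            f'  <dcvalue element="title" qualifier="none">{title}</dcvalue>\n'
            "</dublin_core>\n"
        )

# Hypothetical usage: build_simple_archive("scans/executive_orders", "saf_batch")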
In the future, we want to explore the possibilities of setting up a collection for the works of our law students and engage in electronic journal publishing using our digital repository. Conclusion It is not an easy task to develop an institutional repository from scratch, especially for a smaller organization. Installation and development are cer- tainly a big challenge for a smaller library with limited number of IT staff. Outsourcing these needs to a service provider seems to be a fea- sible solution. Another challenge is training. We overcame this challenge by taking advantage of the state con- sortium’s DSpace training sessions. Subscribing to the DSpace mailing list is necessary as it is a communica- tion channel for DSpace users to ask questions, seek help, and keep up to date about the software. on hosting information sessions for our law faculty and students to learn more about the digital repository. Future Projects There is no doubt that our digital repository will grow significantly because we have exciting collections planned for future projects. One of our law faculty, Professor Daniel Benson, donated some of his personal files from an eight-year litigation repre- senting the minority plaintiffs in the civil rights case of Jones v. City of Lubbock, 727 F. 2d 364 (5th Cir. 1984) in which the minority plaintiffs won the case. The lawsuit changed the City of Lubbock’s election system for city council members from the “at large” method to the “single member district system,” which allowed the minority candidates consistently being elected. This collection contains materials, notes, memoranda, letters, and other documents prepared and utilized by the plaintiffs’ attorneys. It has signifi- cant historical value because a Texas Tech Law Professor and five Texas Tech Law graduates participated in that case successfully as pro bono attorneys for the minority plaintiffs. In addition, we plan on adding Social Science Research Network 3012 ---- Editor’s Comments Bob Gerrity INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2012 1 Past and present converge with the December 2012 issue of Information Technology and Libraries (ITAL), as we also publish online the first volume of ITAL’s predecessor, the Journal of Library Automation (JOLA), originally published in print in 1968. The first volume of JOLA offers a fascinating glimpse into early days of library automation, when many things were different, such as the size (big) and capacity (small) of computer hardware, and many things were the same (e.g., Richard Johnson’s description of the book catalog project at Stanford, where “the major achievement of the preliminary systems design was to establish a meaningful dialogue between the librarian and systems and computer personnel.” Plus ça change, plus c'est la meme. There are articles by luminaries in the field: Richard de Gennaro describes approaches to developing an automation program in a large research library, Frederick Kilgour, from the Ohio Bob Gerrity (r.gerrity@uq.edu.au) is University Librarian, University of Queensland, Australia. http://ejournals.bc.edu/ojs/index.php/ital/issue/view/312 Editor’s Comments Bob Gerrity EDITOR’S COMMENTS | GERRITY 2 College Library Center (now OCLC), analyzes catalog-card production costs at Columbia, Harvard, and Yale in the mid 1960s (8.8 to 9.8 cents per completed card), and Henriette Avram from the Library of Congress describes the successful use of the COBOL programming language to manipulate MARC II records. 
The December 2012 issue marks the completion of ITAL’s first year as an e-only, open-access publication. While we don’t have readership statistics for the previous print journal to compare with, download statistics for the e-version appear healthy, with more than 30,000 full-text article downloads for 2012 content so far this year, plus more than 10,000 downloads for content from previous years. Based on the download statistics, the topics of most interest to today’s ITAL readers are discovery systems, web-based research guides, digital preservation, and digital copyright. This month’s issue takes some of these themes further, with articles that examine the usability of autocompletion features in library search interfaces (Ward, Hahn, and Feist), reveal patterns of student use of library computers (Thompson), propose a cloud-based digital library storage solution (Sosa-Sosa), and summarize attributes of open standard file formats (Park, Oh). Happy reading. 3037 ---- 2 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 program that would provide for educating mentees about LITA, sharing of areas of expertise and awareness, and develop a network of professionals. Dialogue on the LITA electronic discussion list and conversations with committee and interest group chairs suggests a desire and need for leadership training. The Membership Development Committee is addressing the need for mentors in LITA 101 and LITA 201 held at ALA Annual Conferences and Midwinter Meetings. LITA leadership, including the Membership Development Committee, Committee and Interest Group Chairs, the Education Committee, LITA Emerging Leaders, and others, will be included in an ongoing dialogue to see how and what can be implemented from the LITA Leadership Institute and the LITA Mentorship Program recommendations as submitted by the 2009 Emerging Leaders Team T. Follow-up by LITA to implement the recommendations of emerging leader projects is important to the vitality and longevity of the association. Since 2007, a number of projects have been developed by Emerging Leaders. Information about the projects is available at the follow- ing locations online: ■■ The ALA website: http://www.ala.org/ala/educationcareerleader ship/emergingleaders/index.cfm ■■ ALA Connect: http://connect.ala.org/emergingleaders ■■ Facebook: http://www.facebook.com/pages/ALA-Emerging -Leaders/156736295251?ref=ts/ ■■ The Emerging Leaders blog: http://connect.ala.org/2011emergingleaders ■■ The Emerging Leaders Wiki: http://emergingleaders.ala.org/wiki/index.php ?title =Main_Page I n 2006, ALA President Leslie Burger implemented six initiatives, including an Emerging Leaders program that is now in its fifth year. The initiative was designed to prepare librarians who are new to the profession in leadership skills that are applicable on the job and as active leaders within the association. LITA is sponsor- ing 2011 emerging leaders Bohyun Kim and Andreas Orphanides. Bohyun is currently digital access librarian at the Florida International University Medical Library. Andreas is currently librarian for digital technologies and learning at the North Carolina State University Libraries. As of the writing of this column, the projects for 2011 have not been assigned. Additional LITA members accepted into the 2011 ALA Emerging Leaders Program include Tabatha Farney, Deana Greenfield, Amanda Harlan, Colleen Harris, Megan Hodge, Matthew Jabaily, Catherine Kosturski, Nicole Pagowsky, Casey Schacher, Sibyl Schaefer, Jessica Sender, and Andromeda Yelton. 
LITA provides an ideal environment for its members to enhance their skills. In 2009, Emerging Leaders Team T developed a project “Making it Personal: Leadership Development Programs for LITA,” working in consulta- tion with the LITA Membership Development Committee. Team members included Amanda Hornby (University of Washington), Angelica Guerrero Fortin (San Diego County Library), Dan Overfield (Cuyahoga Community College), and Lisa Carlucci Thomas (Yale University). The Team T members recommended the creation of “an online continuing education program to develop the leadership and project management skills necessary to maintain and promote the value and ability of LITA’s professional membership to the greater librarian population.” Outcomes for the training would include project-management and team-building skills within a context that focuses on the development and application of technology in libraries. The team members also recommended the establishing of a LITA mentorship Karen J. starr (kstarr@nevadaculture.org) is lita President 2010–11 and assistant administrator for library and develop- ment Services, nevada State library and archives, carson city. Karen J. Starr President’s Message: Membership, Leadership, Emerging Leaders, and LITA 3038 ---- eDitOriAl | truitt 3 W ithin the last few months, two provocative books have been published that take different approaches to the question of how we learn in the always-on, always-connected electronic environment of “screens.” While neither is specifically directed at librarians, I think both deserve to be read and discussed widely in our community. ■■ The Shallows The first, The Shallows: What the Internet is Doing to Our Brains (Norton, 2010), by Nicholas Carr, is an expanded version of his article “Is Google Making Us Stupid?” pub- lished in the July/August 2008 issue of Atlantic Monthly and discussed in this space soon after.1 Carr’s arguments in The Shallows will be familiar to those who read his ear- lier article, but they are more thoroughly developed in his book and worth summarizing here. Carr’s thesis is that use of connective technology—the Internet and the web—is leading to a remapping of cog- nitive reading and thinking skills, and a “shallowing” of these mental faculties: Over the last few years I’ve had an uncomfortable sense that someone, or something, has been tinkering with my brain, remapping the neural circuitry, repro- gramming the memory. . . . I’m not thinking the way I used to think. I feel it most strongly when I’m reading. I used to find it easy to immerse myself in a book or a lengthy article. . . . That’s rarely the case anymore. (5) The problem, as Carr goes on to describe at some length, chronicling in detail the results of years of neu- rological investigations, is that the brain is “plastic.” “Virtually all of our neural circuits—whether they’re involved in feeling, seeing, hearing, moving, thinking, learning, perceiving, or remembering—are subject to change.” And one of the things that is changing them the most drastically today is our growing reliance on digital information. The paradox is that as we repeat an activity—surfing the Web and clicking on links, rather than engaging with linear texts, for example—chemically induced synapses cause us to want to continue the new activity, strengthening those links (34). 
This quality of plastic neural circuits that can be remapped, when combined with the “ecosystem of inter- ruption technologies” of the Internet and the Web (e.g., in-text hyperlinks, e-mail and RSS alerts, text messaging, Twitter, multiple widgets, etc.) is resulting in what Carr argues is a growing inability or unwillingness to engage with and reflect deeply upon extended text (91).2 As Carr puts it, the linear, literary mind . . . [that has] been the imagina- tive mind of the Renaissance, the rational mind of the Enlightenment, the inventive mind of the Industrial Revolution, even the subversive mind of Modernism . . . may soon be yesterday’s mind. (10) There is much more. Carr offers pointed critiques of major Internet players and the roles they play in facilitat- ing and exploiting the remapping of our neural circuits. Google, whose “profits are tied directly to the velocity of people’s information intake,” is to Carr “in the busi- ness of distraction” (156–57). The Google Book initiative “shouldn’t be confused with the libraries we’ve known until now. It’s not a library of books. It’s a library of snip- pets. . . . The strip-mining of ‘relevant content’ replaces the slow excavation of meaning” (166). Ultimately, for Carr, it’s about who is controlling whom. While the Internet may permit us to better per- form some functions—search, for example—“it poses a threat to our integrity as human beings . . . we program our computers and thereafter they program us” (214). Put another way, “the computer screen bulldozes our doubts with its bounties and conveniences. It is so much our servant that it would seem churlish to notice that it is also our master” (4). ■■ Hamlet’s Blackberry Perhaps less familiar than Carr’s work is William Powers’ Hamlet’s Blackberry: A Practical Philosophy for Building a Good Life in the Digital Age (HarperCollins 2010). Powers, a writer whose work has appeared in the Washington Post, the New York Times, the New Republic, and elsewhere, describes the influence of digital technology (or “screens,” to use his shorthand)3 and connectedness on our lives: In the last few decades, we’ve found a powerful new way to pursue more busyness: digital technology. Computers and smart phones are often pitched as solutions to our stressful, overextended lives. . . . But at the same time, they link us more tightly to all the sources of our busyness. Our screens are conduits for everything that keeps us hopping—mandatory and optional, worthwhile and silly. . . . Marc TruittEditorial: “The Air is Full of People” Marc truitt (marc.truitt@ualberta.ca) is associate university librarian, Bibliographic and information technology Services, university of alberta libraries, edmonton, alberta, canada, and editor of ITAL. 4 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 if not yet a general consensus, that people are coming to experience and understand these costs. Finally, they also make the point that things need not continue on their present course. I can imagine that if we in libraries take Carr and Powers seriously, there might be significant implications for service models and collections practices. Both books have been reviewed in all the usual main- stream places. Remarkably though, to me—and excluding a scant few discussion list threads such as that on web4lib several years ago—I’ve seen no discussion in the usual professional venues of their implications where libraries are concerned. Perhaps I’m simply not reading the “right” weblogs or discussion lists. 
I’m not under the illusion that libraries or librarians can by themselves alter our rush toward the “shallows.” Still, given our eagerness to discuss how we extend the reach of “screens” in libraries—whether in the form of learning commons, wireless access, mobile-friendly web- sites, clearing stacks of “tree-books” in favor of e-books, etc.—would it not be reasonable to think that we should show as much concern about the consequences of such activities, and even some interest in providing possible remedial alternatives? One of my favorite library spaces in college was the Linonia and Brothers Reading Room in Yale’s Sterling Memorial Library (see a photo of the reading room at http://images.library.yale.edu/madid/oneItem.aspx ?id=1772930). Its dark oak paneling, built-in bookshelves, overstuffed leather easy chairs, cozy alcoves, toasty, foot- warming steam radiators, and stained-glass windows overlooking a quiet courtyard represented the epitome of the nineteenth-century “gentleman’s library” and encour- aged the sort of deep reading and contemplation that are becoming so rare in our institutions today. I spent many hours there, reading, thinking, dreaming—and yes, cat- napping too. I haven’t visited the “L&B” in years; I hope it is still the way I so fondly recall it. Over the past few years, as we’ve considered the various aspects of the library-as-space question, we’ve created all manner of collaborative, group-focused, überconnected learning spaces. We’ve also created book- free spaces (to say nothing of book-free “libraries”), food-friendly spaces, quiet and cell-phone-free spaces, and a host of others of which I’m sure I haven’t thought. So, in an attempt to get us thinking about what Carr ’s and Powers’ books might mean for libraries, here’s a crazy idea to start us off: How about a screen-free space for deep reading and contemplation? It should be very low-tech: no mobiles, no laptops, no desktops, no net- works, no clickety-clack of keys, no chimes of incoming e-mail and tweets, no unearthly glow of monitors. No food, drink, or group-study areas, either. Just a quiet, inviting, comfortable space for individual reading and The goal is no longer to be “in touch” but to erase the possibility of ever being out of touch. To merge, to live simultaneously with everyone, sharing every moment, every perception, thought, and action via our screens. Even the places where we used to go to get away from the crowd and the burdens it imposes on us are now connected. The simple act of going out for a walk is completely different today from what it was fifteen years ago. Whether you’re walking down a big-city street or in the woods outside a country town, if you’re carrying a mobile device with you, the global crowd comes along. . . . The air is full of people. 
(14–15) Drawing inspiration and analogy from a list of philos- ophers and other historical and literary figures beginning with Plato and ending with McLuhan, Powers describes seven practical approaches, tools, and techniques for dis- connecting from our screen-driven life: ■■ Seek physical distance (Plato) ■■ Seek intellectual and emotional distance (Seneca) ■■ Hope for devices that might allow us to customize our degree of connectedness (Gutenberg) ■■ Consider older, low-tech tools as alternatives where possible (Shakespeare via Hamlet) ■■ Create positive rituals (Ben Franklin) ■■ Create a “Walden zone” refuge (Thoreau) ■■ Be aware of and take personal control from technol- ogy by being aware of that technology (McLuhan) Powers then reviews how he and his family used these techniques to regain the sense of control and depth they felt they’d lost to screens. In the past several months, I’ve tried a couple myself. I no longer carry a Blackberry unless I’m traveling out of town. I avoid e-mail and the Internet completely on Saturdays (my “Internet Sabbath”). The effect of these two small and easily achieved changes has been little short of liberating, providing space to think and reflect without the distraction of always-on connectedness. Walking my Lab Seamus has become a special pleasure! ■■ Bringing Libraries into the Picture So, what do Carr’s and Powers’ theses mean for libraries, and what do they mean in particular for those of us who provide technology solutions for libraries? They remind us that there is a very real human cost to the technology of screens and always-on connectedness that have become our stock-in-trade in recent years. As well, they provide convincing evidence that there is a growing awareness, eDitOriAl | truitt 5 References and Notes 1. Carr’s Atlantic Monthly article appeared in volume 301 (July/Aug. 2008) and can be found at http://www.theatlantic . c o m / m a g a z i n e / a rc h i v e / 2 0 0 8 / 0 7 / i s - g o o g l e - m a k i n g - u s -stupid/6868/ (accessed Jan. 14, 2011); my ITAL column on the topic is at http://www.ala.org/ala/mgrps/divs/lita/ ital/272008/2703sep/editorial_pdf.cfm (accessed Jan. 14, 2011). 2. The term “ecosystem of interruption technologies” belongs to Cory Doctorow. 3. Powers uses the term “screens” to describe “the connec- tive digital devices that have been widely adopted in the last two decades, including desktop and notebook computers, mobile phones, e-readers, and tablets” (1). thought. Would some of our patrons adopt it? I’m will- ing to bet that they would. Do we not owe them the same commitment to service that we’ve worked so hard to provide to those who wish to be collaborative and “always-on”? Absolutely. No, we can’t change the world or stop the march of the screens. But perhaps, as with Powers’ “Walden Zone,” we can start by providing a close-at-hand safe harbor for those of our patrons seeking refuge from the “always-on” world of screens. 3039 ---- 6 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 I n the new LITA strategic plan, members have sug- gested an objective for open access (OA) in scholarly communications. Some people describe OA as articles the author has to pay someone to publish. That can be true, but that’s not how I think of it. OA is definitely not vanity publishing. Most OA journals are peer-reviewed. 
I like the definition provided by EnablingOpenScholarship: Open Access is the immediate (upon or before publica- tion), online, free availability of research outputs with- out any of the restrictions on use commonly imposed by publisher copyright agreements.1 My focus on OA journals increased precipitously when the licensing for a popular American weekly medi- cal journal changed. We could only access online articles from one on-campus computer unless we increased our annual subscription payment by 500 percent. We didn’t have the funds, and now the students suffer the consequences. I think it was an unfortunate decision the journal’s publishers made. I know from experience that if a student can’t access the first article they want, they will find another one that is available. Interlibrary loan is simpler than ever, but I think only the patient and curious students will make the effort to contact us and request an article they cannot obtain. In 2006 scientist Gary Ward wrote that faculty at many institutions experience problems accessing current research. When faculty teach “what is available to them rather than what their students most need to know, the education of these students and the future of science in the U.S. will suffer.” He explains it is a false assumption that those who need access to scientific literature already have it. Interlibrary loans or pay-per-view are often offered by publishers as the solution to the access problem, but this misses an essential fact of how we use the scien- tific literature: We browse. It is often impossible to tell from looking at an abstract whether a paper contains needed methodological detail or the perfect illustration to make a point to one’s students. Apart from consider- ations of cost, time, and quality, interlibrary loans and pay-per-views simply do not meet the needs of those of us who often do not know what we’re looking for until we find it.2 I want our medical students and tomorrow’s doctors to have access to all of the most current medical research. We offer the service of providing JAMA articles to students, but I’m guessing that we hear from a small percentage of the students who can’t access the full text online. Are people reading OA articles? Not only are scholars reading the articles, but they are citing those articles in their publications. Consider the Public Library of Science’s PLoSOne (http://www.plosone.org/home.action), a peer- reviewed, open-access, online publication that features reports on primary research from all disciplines within science and medicine. In June 2010, PLoSONE received its first impact factor of 4.351—an impressive number. That impact factor puts PLoSONE in the top 25 percent of the Institute for Scientific Information’s (ISI) biology cat- egory.3 The impact factor is calculated annually by ISI and represents the average number of citations received per paper published in that journal during the two preceding years.4 In other words, articles from PLoSONE published in 2008 and 2009 were highly cited. Is OA making an impact in my medical library? I believe it is, although I won’t be happy until our students can access the online journals they want from off campus and the library won’t have to pay outrageous licensing fees. We have more than one thousand online OA journal titles in our list of online journals. The more full text they can access, the less they’ll have to settle for their second or third choice because their first choice is not available online. 
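For readers unfamiliar with the metric, the two-year impact factor described above can be written out explicitly; this is the standard ISI/Thomson Reuters definition (see reference 4), expressed for a generic census year y.

% Two-year impact factor of a journal, computed for census year y:
\[
\mathrm{IF}_y =
  \frac{\text{citations received in year } y \text{ to items published in years } y-1 \text{ and } y-2}
       {\text{number of citable items published in years } y-1 \text{ and } y-2}
\]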
I’m glad that LITA members included OA in their stra- tegic plan. The number of OA journals is increasing, and I believe we will continue to see that the articles are reach- ing readers and making a difference. I don’t think ITAL will be adopting the “author pays” model of OA, but the editorial board is dedicated to providing LITA members with the access they want. References 1. EnablingOpenScholarship, “Enabling Open Scholarship: Open Access,” http://www.openscholarship.org/jcms/c_6157/ open-access?portal=j_55&printView=true, (accessed Jan. 18, 2011). 2. Ward, Gary, “Deconstructing the Arguments Against Improved Public Access,” Newsletter of the American Society for Cell Biology, Nov. 2006, http://www.ascb.org/filetracker .cfm?fileid=550 (accessed Jan. 18, 2011). 3. Davis, Phil, “PLoS ONE: Is a High Impact Factor a Blessing or a Curse?” online posting, June 21, 2010, The Scholarly Kitchen, http://scholarlykitchen.sspnet.org/2010/06/21/plosone -impact-factor-blessing-or-a-curse/ (accessed Jan. 18, 2011). 4. Thomson Reuters, “Introducing the Impact Factor,” http://thomsonreuters.com/products_services/science/ academic/impact_factor/ (accessed Jan. 18, 2011). Cynthia Porter Editorial Board Thoughts: Is Open Access the Answer? cynthia Porter (cporter@atsu.edu) is distance Support librar- ian at a.t. Still university of health Sciences, Mesa, arizona. 3040 ---- A siMPle scHeMe FOr BOOK clAssiFicAtiON usiNG WiKiPeDiA | YeltON 7 Andromeda Yelton A Simple Scheme for Book Classification Using Wikipedia ■■ Background Hanne Albrechtsen outlines three types of strategies for subject analysis: simplistic, content-oriented, and require- ments-oriented.3 In the simplistic approach, “subjects [are] absolute objective entities that can be derived as direct lin- guistic abstractions of documents.” The content-oriented model includes an interpretive step, identifying subjects not explicitly stated in the document. Requirements- oriented approaches look at documents as instruments of communication; thus they anticipate users’ potential information needs and consider the meanings that docu- ments may derive from their context. (See, for instance, the work of Hjørland and Mai.4) Albrechtsen posits that only the simplistic model, which has obvious weaknesses, is amenable to automated analysis. The difficulty in moving beyond a simplistic approach, then, lies in the ability to capture things not stated, or at least not stated in proportion to their impor- tance. Synonymy and polysemy complicate the task. Background knowledge is needed to draw inferences from text to larger meaning. These would be insuperable barri- ers if computers limited to simple word counts. However, thesauri, ontologies, and related tools can help computers as well as humans in addressing these problems; indeed, a great deal of research has been done in this area. For instance, enriching metadata with Princeton University’s WordNet and the National Library of Medicine’s Medical Subject Headings (MeSH) is a common tactic,5 and the Yahoo! category structure has been used as an ontology for automated document classification.6 Several projects have used Library of Congress Classification (LCC), Dewey Decimal Classification (DDC), and similar library tools for automated text classification, but their results have not been thoroughly reported.7 All of these tools have had problems, though, with issues such as coverage, currency, and cost. This has motivated research into the use of Wikipedia in their stead. 
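To make the kind of thesaurus-based enrichment mentioned above concrete before turning to Wikipedia, the short sketch below pulls synonyms and broader terms from WordNet through the NLTK interface. The choice of NLTK and of the example word is an assumption added here for illustration; the article itself does not use WordNet programmatically.

# Requires the NLTK data package: nltk.download("wordnet") on first use.
from nltk.corpus import wordnet as wn

def expansion_terms(word):
    """Collect synonyms and broader terms (hypernyms) for a word: background
    knowledge that a bare word-counting scheme has no access to."""
    terms = set()
    for synset in wn.synsets(word):
        terms.update(lemma.name() for lemma in synset.lemmas())
        for hypernym in synset.hypernyms():
            terms.update(lemma.name() for lemma in hypernym.lemmas())
    return sorted(terms)

# Prints a mix of synonyms and broader terms for the word, which could be
# attached to a document as additional access points.
print(expansion_terms("cosmology"))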
Since Wikipedia’s founding in 2001, it has grown prodigiously, encompassing more than 3 million articles in its English edition alone as of this writing; this gives it unparalleled coverage. Wikipedia also has many thesaurus-like features. Redirects function as “see” references by linking syn- onyms to preferred terms. Disambiguation pages deal with homonyms. The polyhierarchical category structure provides broader and narrower term relationships; the vast majority of pages belong to at least one category. Links between pages function as related-term indicators. Editor’s note: This article is the winner of the LITA/Ex Libris Student Writing Award, 2010. Because the rate at which documents are being generated outstrips librarians’ ability to catalog them, an accurate, automated scheme of subject classification is desirable. However, simplistic word-counting schemes miss many important concepts; librarians must enrich algorithms with background knowledge to escape basic problems such as polysemy and synonymy. I have developed a script that uses Wikipedia as context for analyzing the subjects of nonfiction books. Though a simple method built quickly from freely available parts, it is partially successful, suggesting the promise of such an approach for future research. A s the amount of information in the world increases at an ever-more-astonishing rate, it becomes both more important to be able to sort out desirable information and more egregiously daunting to manually catalog every document. It is impossible even to keep up with all the documents in a bounded scope, such as aca- demic journals; there were more than twenty-thousand peer-reviewed academic journals in publication in 2003.1 Therefore a scheme of reliable, automated subject classifi- cation would be of great benefit. However, there are many barriers to such a scheme. Naive word-counting schemes isolate common words, but not necessarily important ones. Worse, the words for the most important concepts of a text may never occur in the text. How can this problem be addressed? First, the most characteristic (not necessarily the most common) words in a text need to be identified—words that particularly distinguish it from other texts. Some corpus that con- nects words to ideas is required—in essence, a way to automatically look up ideas likely to be associated with some particular set of words. Fortunately, there is such a corpus: Wikipedia. What, after all, is a Wikipedia article, but an idea (its title) followed by a set of words (the article text) that characterize that title? Furthermore, the other elements of my scheme were readily available. For many books, Amazon lists Statistically Improbable Phrases (SIPs)— that is, phrases that are found “a large number of times in a particular book relative to all Search Inside! books.”2 And Google provides a way to find pages highly relevant to a given phrase. If I used Google to query Wikipedia for a book’s SIPs (using the query form “site:en.wikipedia .org SIP”), would Wikipedia’s page titles tell me some- thing useful about the subject(s) of the book? Andromeda Yelton (andromeda.yelton@gmail.com) graduated from the Graduate School of library and information Science, Simmons college, Boston, in May 2010. 8 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 ■■ An Initial Test Case To explore whether my method was feasible, I needed to try it on a test case. 
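These features can indeed be harvested automatically. The sketch below queries the public MediaWiki web API to resolve a redirect and list a page's visible categories; the endpoint and parameters follow the API as generally documented and are not part of the method developed in this article, which worked from ordinary Google search results.

import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def redirect_and_categories(title):
    """Resolve a redirect (a 'see' reference) and return the target page's
    visible categories (its broader groupings) via the MediaWiki API."""
    params = urllib.parse.urlencode({
        "action": "query", "titles": title, "redirects": 1,
        "prop": "categories", "clshow": "!hidden", "cllimit": "max",
        "format": "json",
    })
    request = urllib.request.Request(f"{API}?{params}",
                                     headers={"User-Agent": "category-demo/0.1"})
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    page = next(iter(data["query"]["pages"].values()))
    return page["title"], [c["title"] for c in page.get("categories", [])]

# "Big bang theory" should resolve to the article "Big Bang", whose categories
# include broader groupings such as "Category:Physical cosmology".
print(redirect_and_categories("Big bang theory"))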
I chose Stephen Hawking’s A Brief History of Time, a relatively accessible meditation on the origin and fate of the universe, classified under “cosmol- ogy” by the Library of Congress. I began by looking up its SIPs on Amazon.com. Noticing that Amazon also lists Capitalized Phrases (CAPs)—“people, places, events, or important topics mentioned frequently in a book”—I included those as well (see table 1).14 I then queried Wikipedia via Google for each of these phrases, using queries such as “site:en.wikipedia .org ‘grand unification theory.’” I selected the top three Wikipedia article hits for each phrase. This yielded a list of sixty-one distinct items with several interesting properties: ■■ Four items appeared twice (Arrow of time, Entropy [arrow of time], Inflation [cosmology], Richard Feynman). However, nothing appeared more than twice; that is, nothing definitively stood out. ■■ Many items on the list were clearly relevant to Brief History, although often at too small a level of granu- larity to be good subject headings (e.g., Black hole, Second law of thermodynamics, Time in physics). ■■ Some items, while not unrelated, were wrong as sub- ject classifications (e.g., List of Solar System objects by size, Nobel Prize in Physics). ■■ Some items were at best amusingly, and at worst baf- flingly, unrelated (e.g., Alpha Centauri [Doctor Who], Electoral district [Canada], James K. Polk, United States men’s national soccer team). ■■ In addition, I had to discard some of the top Google hits because they were not articles but Wikipedia spe- cial pages, such as “talk” pages devoted to discussion of an article. This test showed that I needed an approach that would give me candidate subject headers at a higher level of granularity. I also needed to be able to draw a brighter line between candidates and noncandidates. The pres- ence of noncandidates was not in itself distressing—any automated approach will consider avenues a human would not—but not having a clear basis for discarding low-probability descriptors was a problem. As it happens, Wikipedia itself offers candidate subject headers at a higher level of granularity via its categories system. Most articles belong to one or more categories, which are groups of pages belonging to the same list or topic.15 I hoped that by harvesting categories from the sixty-one pages I had discovered, I could improve my method. This yielded a list of more than three hundred catego- ries. Unsurprisingly, this list mostly comprised irrelevant Because of this thesaurus structure, all of which can be harvested and used automatically, many researchers have used Wikipedia for metadata enrichment, text clustering and classification, and the like. For example, Han and Zhao wanted to automati- cally disambiguate names found online but faced many problems familiar to librarians: “The traditional methods measure the similarity using the bag of words (BOW) model. The BOW, however, ignores all the semantic rela- tions such as social relatedness between named entities, associative relatedness between concepts, polysemy and synonymy between key terms. So the BOW cannot reflect the actual similarity.” To counter this, they constructed a semantic model from information on Wikipedia about the associative relationships of various ideas. They then used this model to find relationships between information found in the context of the target name in different pages. 
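The manual procedure just described (query for each phrase, take the top article hits, harvest their categories, and tally them) reduces to a few lines once the two lookups are wrapped in functions. The sketch below is a reconstruction added here, not the script printed in the article's appendix: the search step is left as a stub because the author used ordinary Google queries of the form site:en.wikipedia.org "phrase", and the category step could reuse the MediaWiki API call sketched earlier.

from collections import Counter

def top_article_titles(phrase, limit=3):
    """Stub for the search step: return the titles of the top Wikipedia
    article hits for a site-restricted query on the phrase, using whatever
    web search interface is available (non-article pages filtered out)."""
    raise NotImplementedError

def categories_for(title):
    """Stub for the category step: return the visible category names of a
    Wikipedia article, e.g. via the MediaWiki API call sketched above."""
    raise NotImplementedError

def candidate_subjects(phrases, limit=3):
    """Tally how often each category appears among the top article hits for a
    book's improbable phrases; the most frequent categories are the candidate
    subject descriptors (for A Brief History of Time, 'Physical cosmology'
    stood out with twelve occurrences)."""
    counts = Counter()
    for phrase in phrases:
        for title in top_article_titles(phrase, limit):
            counts.update(categories_for(title))
    return counts.most_common()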
This enabled them to accurately group pages pertaining to particular individuals.8 Carmel, Roitman, and Zwerdling used page catego- ries and titles to enhance labeling of document clusters. Although many algorithms exist for sorting large sets of documents into smaller, interrelated clusters, there is less work on labeling those clusters usefully. By extract- ing cluster keywords, using them to query Wikipedia, and algorithmically analyzing the results, they created a system whose top five recommendations contained the human-generated cluster label more than 85 percent of the time.9 Schönhofen looked at the same problem I examine— identifying document topics with Wikipedia data—but he used a different approach. He calculated the related- ness between categories and words from titles of pages belonging to those categories. He then used that relat- edness to determine how strongly words from a target document predicted various Wikipedia categories. He found that although his results were skewed by how well- represented topics were on Wikipedia, “for 86 percent of articles, the top 20 ranked categories contain at least one of the original ones, with the top ranked category correct for 48 percent of articles.”10 Wikipedia has also been used as an ontology to improve clustering of documents in a corpus,11 to auto- matically generate domain-specific thesauri,12 and to improve Wikipedia itself by suggesting appropriate cat- egories for articles.13 In short, Wikipedia has many uses for metadata enrichment. While text classification is one of these poten- tial uses, and one with promise, it is under-explored at present. Additionally, this exploration takes place almost entirely in the proceedings of computer science confer- ences, often without reference to library science concepts or in a place where librarians would be likely to benefit from it. This paper aims to bridge that gap. A siMPle scHeMe FOr BOOK clAssiFicAtiON usiNG WiKiPeDiA | YeltON 9 computationally trivial to do so, given such a list. (The list need not be exhaustive as long as it exhaustively described category types; for instance, the same regular expression could filter out both “articles with unsourced statements from October 2009” and “articles with unsourced state- ments from May 2008.”) At this stage of research, however, I simply ignored these categories in analyzing my results. To find a variety of books to test, I used older New York Times nonfiction bestseller lists because brand-new books are less likely to have SIPs available on Amazon.19 These lists were heavily slanted toward autobiography, but also included history, politics, and social science topics. ■■ Results Of the thirty books I examined (the top fifteen each from paperback and hardback nonfiction lists), twenty-one had SIPs and CAPs available on Amazon. I ran my script against each of these phrase sets and calculated three measures for each resulting category list: ■■ Precision (P): of the top categories, how many were synonyms or near-synonyms of the book’s LCSHs? ■■ Recall (R): of the book’s LCSHs, how many had syn- onyms or near-synonyms among the top categories? ■■ Right-but-wrongs (RbW): of the top categories, how many are reminiscent of the LCSHs without actu- ally being synonymous? These included narrower terms (e.g., the category “African_American_actors” when the LCSHs included “Actors—United States —Biography”), broader terms (e.g., “American_folk_ singers” vs. “Dylan, Bob, 1941–”), related terms (e.g., “The_Chronicles_of_Narnia_books” vs. 
“Lion, the Witch and the Wardrobe (Motion picture)”), and exam- ples (“Killian_documents_controversy” vs. “United States—Politics and government—2001–2009”). I considered the “top categories” for each book to be the five that most commonly occurred (excluding Wikipedia administrative categories), with the following exceptions: ■■ Because I had no basis to distinguish between them, I included all equally popular categories, even if that would bring the total to more than five. Thus, for example, for the book Collapse, the most common category occurred seven times, followed by two cat- egories with five appearances and six categories with four. Rather than arbitrarily selecting two of the six four-occurrence categories to bring the total to five, I examined all nine top categories. ■■ If there were more than five LCSHs, I expanded the number of categories accordingly, so as not to candidates (“wars involving the states and peoples of Asia,” “video games with expansion packs,” “organiza- tions based in Sweden,” among many others). Many categories played a clear role in the Wikipedia ecology of knowledge but were not suitable as general-purpose sub- ject headers (“living people,” “1849 deaths”). Strikingly, though, the vast majority of candidates occurred only once. Only forty-two occurred twice, fifteen occurred three times, and one occurred twelve times: “physical cosmology.” Twelve occurrences, four times as many as the next candidate, looked like a bright line. And “physical cosmology” is an excellent description of Brief History— arguably better than LCSH’s “cosmology.” The approach looked promising. ■■ Automating Further Test Cases The next step was to test an extensive variety of books to see if the method was more broadly applicable. However, running searches and collating queries for even one book is tedious; investigating a large number by hand was prohibitive. Therefore I wrote a categorization script (see appendix) that performs the following steps:16 ■■ reads in a file of statistically improbable phrases17 ■■ runs Google queries against Wikipedia for all of them18 ■■ selects the top hits after filtering out some common Wikipedia nonarticles, such as “category” and “user” pages ■■ harvests these articles’ categories ■■ sorts these categories by their frequency of occurrence This algorithm did not filter out Wikipedia adminis- trative categories, as creating a list of them would have been prohibitively time-consuming. However, it would be Table 1. SIPs and CAPs for A Brief History of Time SIPs grand unification energy, complete unified theory, thermodynamic arrow, psychological arrow, primordial black holes, boundary proposal, hot big bang model, big bang singularity, more quarks, contracting phase, sum over histories CAPs Alpha Centauri, Solar System, Nobel Prize, North Pole, United States, Edwin Hubble, Royal Society, Richard Feynman, Milky Way, Roger Penrose, First World War, Weak Anthropic Principle 10 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 “Continental_Army_generals” vs. “United States— History—Revolution, 1775–1783.” ■■ weak: some categories treated the same subject as the LCSH but not at all in the same way ■■ wrong: the categories were actively misleading The results are displayed in table 2. ■■ Discussion The results of this test were decidedly more mixed than those of my initial test case. On some books the Wikipedia method performed remarkably well; on misleadingly increase recall statistics. 
■■ I did not consider any categories with fewer than four occurrences, even if that left me with fewer than five top categories to consider. The lists of three-, two-, and one-occurrence categories were very long and almost entirely composed of unrelated items.

I also considered, subjectively, the degree of overlap between the LCSHs and the top Wikipedia categories. I chose four degrees of overlap:

■■ strong: the top categories were largely relevant and included synonyms or near-synonyms for the LCSH
■■ near miss: some categories suggested the LCSH but missed its key points, such as

Table 2. Results (sorted by percentage of relevant categories)

Book | P | R | RbW | Subjective Quality
Chronicles, Bob Dylan | 0.2 | 0.5 | 0.8 | strong
The Chronicles of Narnia: The Lion, the Witch and the Wardrobe Official Illustrated Movie Companion, Perry Moore | 0.25 | 1 | 0.625 | strong
1776, David McCullough | 0 | 0 | 0.8 | near miss
100 People Who Are Screwing Up America, Bernard Goldberg | 0 | 0 | 0.625 | weak
The Bob Dylan Scrapbook, 1956–1966, with text by Robert Santelli | 0.2 | 0.5 | 0.4 | strong
Three Weeks With My Brother, Nicholas Sparks | 0 | 0 | 0.57 | weak
Mother Angelica, Raymond Arroyo | 0.07 | 0.33 | 0.43 | near miss
Confessions of a Video Vixen, Karrine Steffans | 0.25 | 0.33 | 0.25 | weak
The Fairtax Book, Neal Boortz and John Linder | 0.17 | 0.33 | 0.33 | strong
Never Have Your Dog Stuffed, Alan Alda | 0 | 0 | 0.43 | weak
The World is Flat, Thomas L. Friedman | 0.4 | 0.5 | 0 | near miss
The Tender Bar, J. R. Moehringer | 0 | 0 | 0.2 | wrong
The Tipping Point, Malcolm Gladwell | 0 | 0 | 0.2 | wrong
Collapse, Jared Diamond | 0 | 0 | 0.11 | weak
Blink, Malcolm Gladwell | 0 | 0 | 0 | wrong
Freakonomics, Steven D. Levitt and Stephen J. Dubner | 0 | 0 | 0 | wrong
Guns, Germs, and Steel, Jared Diamond | 0 | 0 | 0 | weak
Magical Thinking, Augusten Burroughs | 0 | 0 | 0 | wrong
A Million Little Pieces, James Frey | 0 | 0 | 0 | wrong
Worth More Dead, Ann Rule | 0 | 0 | 0 | wrong
Tuesdays With Morrie, Mitch Albom | No category with more than 4 occurrences

my method’s success with A Brief History of Time. I tested another technical, jargon-intensive work (N. Gregory Mankiw’s Macroeconomics textbook), and found that the method also worked very well, giving categories such as “macroeconomics” and “economics terminology” with high frequency. Therefore a system of this nature, even if not usable for a broad-based collection, might be very useful for scientific or other jargon-intensive content such as a database of journal articles.

■■ Future Research

The method outlined in this paper is intended to be a proof of concept using readily available tools. The following work might move it closer to a real-world application:

■■ A configurable system for providing statistically improbable phrases; there are many options.23 This would provide the user with more control over, and understanding of, SIP generation (instead of the Amazon black box), as well as providing output that could integrate directly with the script.
■■ A richer understanding of the Wikipedia category system. Some categories (e.g., “all articles with unsourced statements”) are clearly useful only for Wikipedia administrative purposes, not as document descriptors; others (e.g., “physical cosmology”) are excellent subject candidates; others have unclear value as subjects or require some modification (e.g., “environmental non-fiction books,” “macroeconomics stubs”). Many of these could be filtered out or reformatted automatically.
■■ Greater use of Wikipedia as an ontology.
For exam- ple, a map of the category hierarchies might help locate headers at a useful level of granularity, or to find the overarching meaning suggested by several headers by finding their common broader terms. A more thorough understanding of Wikipedia’s rela- tional structure might help disambiguate terms.24 others, it performed very poorly. However, there are several patterns here: Many of these books were autobiographies, and the method was ineffective on nearly all of these.20 A key feature of autobiographies, of course, is that they are typi- cally written in the first person, and thus lack any term for the major subject—the author’s name. Biography, by contrast, is rife with this term. This suggests that includ- ing titles and authors along with SIPs and CAPs may be wise. Additionally, it might require making better use of Wikipedia as an ontology to look for related concepts (rather in the manner that Han and Zhao used it for name disambiguation).21 Books that treat a single, well-defined subject are eas- ier to analyze than those with more sprawling coverage. In particular, books that treat a concept via a sequence of illustrative essays (e.g., Tipping Point, Freakonomics) do not work well at all. SIPs may apply only to particu- lar chapters rather than to the book as a whole, and the algorithm tends to pick out topics of particular chapters (e.g., for Freakonomics, the fascinating chapter on Sudhir Venkatesh’s work on “Gangs_in_Chicago, _Illinois”22) rather than the connecting threads of the entire book (e.g. “Economics—Sociological aspects”). The tactics sug- gested for autobiography might help here as well. My subjective impressions were usually, but not always, borne out by the statistics. This is because some of the RbWs were strongly related to one another and sug- gested to a human observer a coherent narrative, whereas others picked out minor or dissimilar aspects of the book. There was one more interesting, and promising, pattern: my subjective impressions of the quality of the categories were strongly predicted by the frequency of the most common category. Remember that in the Brief History example, the most common category, “physical cosmology,” occurred twelve times, conspicuously more than any of its other categories. Therefore I looked at how many times the top category for each book occurred in my results. I averaged this number for each subjective quality group; the results are in table 3. In other words, the easier it was to draw a bright line between common and uncommon categories, the more likely the results were to be good descriptions of the work. This suggests that a system such as this could be used with very little modification to streamline catego- rization. For example, it could automatically categorize works when it met a high confidence threshold (when, for instance, the most common category has double-digit occurrence), suggest categories for a human to accept or reject at moderate confidence, and decline to help at low confidence. It was also interesting to me that—unlike my initial test case—none of the bestsellers were scientific or techni- cal works. It is possible that the jargon-intensive nature of science makes it easier to categorize accurately, hence Table 3. 
Category Frequency and Subjective Quality Subjective Quality of Categories Frequencies of Most Common Category Average Frequency of Most Common Category strong 6, 12, 16, 19 13.25 near miss 5, 5, 7, 10 6.75 weak 4, 5, 6, 7, 8 6 wrong 3, 4, 4, 5, 5, 5, 7, 7 5 12 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 (1993): 219. 4. Birger Hjørland, “The Concept of Subject in Information Science,” Journal of Documentation 48, no. 2 (1992): 172; Jens- Erik Mai, “Classification in Context: Relativity, Reality, and Representation,” Knowledge Organization 31, no. 1 (2004): 39; Jens-Erik Mai, “Actors, Domains, and Constraints in the Design and Construction of Controlled Vocabularies,” Knowledge Organization 35, no. 1 (2008): 16. 5. Xiaohua Hu et al., “Exploiting Wikipedia as External Knowledge for Document Clustering,” in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009 (New York: ACM, 2009): 389. 6. Yannis Labrou and Tim Finin, “Yahoo! as an Ontology— Using Yahoo! Categories to Describe Documents,” in Proceedings of the Eighth International Conference on Information and Knowledge Management, Kansas City, MO, USA 1999 (New York: ACM, 1999): 180. 7. Kwan Yi, “Automated Text Classification using Library Classification Schemes: Trends, Issues, and Challenges,” International Cataloging & Bibliographic Control 36, no. 4 (2007): 78. 8. Xianpei Han and Jun Zhao, “Named Entity Disambiguation by Leveraging Wikipedia Semantic Knowledge,” in Proceeding of the 18th ACM Conference on Information and Knowledge Management, Hong Kong, China, 2–6 November 2009 (New York: ACM, 2009): 215. 9. David Carmel, Haggai Roitman, and Naama Zwerdling, “Enhancing Cluster Labeling using Wikipedia,” in Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Boston, MA, USA (New York: ACM, 2009): 139. 10. Peter Schönhofen, “Identifying Document Topics using the Wikipedia Category Network,” in Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, Hong Kong, China, 18–22 December 2006 (Los Alamitos, Calif.: IEEE Computer Society, 2007). 11. Hu et al., “Exploiting Wikipedia.” 12. David Milne, Olena Medelyan, and Ian H. Witten, “Mining Domain-Specific Thesauri from Wikipedia: A Case Study,” in Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, 22–26 December 2006 (Washington, D.C.: IEEE Computer Society, 2006): 442. 13. Zeno Gantner and Lars Schmidt-Thieme, “Automatic Content-Based Categorization of Wikipedia Articles,” in Proceedings of the 2009 Workshop on the People’s Web Meets NLP, ACL-IJCNLP 2009, 7 August 2009, Suntec, Singapore (Morristown, N.J.: Association for Computational Linguistics, 2009): 32. 14. “Amazon.com Capitalized Phrases,” Amazon.com, http://www.amazon.com/gp/search-inside/capshelp.html/ ref=sib_caps_help (accessed Mar. 13, 2010). 15. For more on the epistemological and technical roles of categories in Wikipedia, see http://en.wikipedia.org/wiki/ Wikipedia:Categorization. 16. Two sources greatly helped the script-writing process: William Steinmetz, Wicked Cool PHP: Real-World Scripts that Solve Difficult Problems (San Francisco: No Starch, 2008); and the docu- mentation at http://php.net. 17. Not all books on Amazon.com have SIPs, and books that do may only have them for one edition, although many editions may be found separately on the site. 
There is not a readily appar- ent pattern determining which edition features SIPs. Therefore ■■ A special-case system for handling books and authors that have their own article pages on Wikipedia. In addition, a large-scale project might want to work from downloaded snapshots of Wikipedia (via http:// download.wikimedia.org/), which could be run on local hardware rather than burdening their servers, This would require using something other than Google for relevance ranking (there are many options), with a corresponding revision of the categorization script. ■■ Conclusions Even a simple system, quickly assembled from freely available parts, can have modest success in identifying book categories. Although my system is not ready for real-world applications, it demonstrates that an approach of this type has potential, especially for collections limited to certain genres. Given the staggering volume of docu- ments now being generated, automated classification is an important avenue to explore. I close with a philosophical point. Although I have characterized this work throughout as automated clas- sification, and it certainly feels automated to me when I use the script, it does in fact still rely on human judgment. Wikipedia’s category structure and its articles linking text to title concepts are wholly human-created. Even Google’s PageRank system for determining relevancy rests on human input, using web links to pages as votes for them (like a vast citation index) and the texts of these links as indicators of page content.25 My algorithm there- fore does not operate in lieu of human judgment. Rather, it lets me leverage human judgment in a dramatically more efficient, if also more problematic, fashion than traditional subject cataloging. With the volume of content spiraling ever further beyond our ability to individually catalog documents—even in bounded contexts like academic databases, which strongly benefit from such cataloging— we must use human judgment in high-leverage ways if we are to have a hope of applying subject cataloging everywhere it is expected. References and Notes 1. Carol Tenopir. “Online Databases—Online Scholarly Journals: How Many?” Library Journal (Feb. 1, 2004), http://www .libraryjournal.com/article/CA374956.html (accessed Mar. 13, 2010). 2. “Amazon.com Statistically Improbable Phrases,” Amazon. com, http://www.amazon.com/gp/search-inside/sipshelp .html/ref=sib_sip_help (accessed Mar. 13, 2010). 3. Hanne Albrechtsen. “Subject Analysis and Indexing: From Automated Indexing to Domain Analysis,” The Indexer, 18, no. 4 A siMPle scHeMe FOr BOOK clAssiFicAtiON usiNG WiKiPeDiA | YeltON 13 problematic Million Little Pieces to be autobiography, as it has that writing style, and as its LCSH treats it thus. 21. Han and Zhao, “Named Entity Disambiguation.” 22. Sudhir Venkatesh, Off the Books: The Underground Economy of the Urban Poor (Cambridge: Harvard Univ. Pr., 2006). 23. See Karen Coyle, “Machine Indexing,” The Journal of Academic Librarianship 34, no. 6 (2008): 530. She gives as examples PhraseRate (http://ivia.ucr.edu/projects/PhraseRate/), KEA (http://www.nzdl.org/Kea/), and Extractor (http://extractor. com/). 24. Per Han and Zhao, “Named Entity Disambiguation.” 25. Lawrence Page et al., “The PageRank Citation Ranking: Bringing Order to the Web,” Stanford InfoLab (1999), http:// ilpubs.stanford.edu:8090/422/ (accessed Mar. 13, 2010). This paper precedes the launch of Google; as the title indicates, the citation index is one of Google’s foundational ideas. 
this step cannot be automated.

18. Be aware that running automated queries without permission is an explicit violation of Google’s Terms of Service. See Google Webmaster Central, “Automated Queries,” http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=66357 (accessed Mar. 13, 2010). Before using this script, obtain an API key, which confers this permission. AJAX web search API keys can be instantly and freely obtained via http://code.google.com/apis/ajaxsearch/web.html.

19. “Hardcover Nonfiction,” New York Times, Oct. 9, 2005, http://www.nytimes.com/2005/10/09/books/bestseller/1009besthardnonfiction.html?_r=1 (accessed Mar. 13, 2010); “Paperback Nonfiction,” New York Times, Oct. 9, 2005, http://www.nytimes.com/2005/10/09/books/bestseller/1009bestpapernonfiction.html?_r=1 (accessed Mar. 13, 2010).

20. For the purposes of this discussion I consider the

Appendix. PHP Script for Automated Classification

<?php
// $argv[1] is the file of statistically improbable phrases;
// $argv[2] is the number of Wikipedia hits to keep per phrase (at most 4).
if ($argv[2] > 4) {
    echo "I'm sorry; the number specified cannot be more than 4.";
    die;
}

// Next, turn our comma-separated list into an array.
$sip_temp = fopen($argv[1], 'r');
$sip_list = '';
while (!feof($sip_temp)) {
    $sip_list .= fgets($sip_temp, 5000);
}
fclose($sip_temp);
$sip_array = explode(', ', $sip_list);

/* Here we access Google search results for our SIPs and CAPs.
   It is a violation of the Google Terms of Service to run automated
   queries without permission. Obtain an AJAX API key via
   http://code.google.com. */
$apikey = 'your_key_goes_here';

$wikipages = array(); // collected Wikipedia article URLs
foreach ($sip_array as $query) {
    /* In multiword terms, change spaces to + so as not to break the
       Google search. */
    $query = str_replace(" ", "+", $query);
    $googresult = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=site%3Aen.wikipedia.org+$query&key=$apikey";
    $googdata = file_get_contents($googresult);

    // Pick out the URLs we want and put them into the array $links.
    preg_match_all('|"url":"[^"]*"|i', $googdata, $links);

    /* Strip out some crud from the JSON syntax to get just URLs. */
    $links[0] = str_replace("\"url\":\"", "", $links[0]);
    $links[0] = str_replace("\"", "", $links[0]);

    /* Here we step through the links in the page Google returned to us
       and find the top Wikipedia articles among the results. */
    $i = 0;
    foreach ($links[0] as $testlink) {
        /* These variables test to see if we have hit a Wikipedia special
           page instead of an article. There are many more flavors of
           special page, but these are the most likely to show up in the
           first few hits. */
        $filetest = strpos($testlink, 'wiki/File:');
        $cattest = strpos($testlink, 'wiki/Category:');
        $usertest = strpos($testlink, 'wiki/User');
        $talktest = strpos($testlink, 'wiki/Talk:');
        $disambtest = strpos($testlink, '(disambiguation)');
        $templatetest = strpos($testlink, 'wiki/Template_');
        if (!$filetest && !$cattest && !$usertest && !$talktest && !$disambtest && !$templatetest) {
            $wikipages[] = $testlink;
            $i++;
        }
        /* Once we've accumulated as many article pages as the user asked
           for, stop adding links to the $wikipages array. */
        if ($i == $argv[2]) {
            break;
        }
    } // This closes the foreach loop which steps through $links.
} // This closes the foreach loop which steps through $sip_array.

/* For each page that we identified in the above step, let's find the
   categories it belongs to. */
$mastercatarray = array();
foreach ($wikipages as $targetpage) {
    // Scrape category information from the article page.
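    /* Note: $wikipages is built without removing duplicates, so an article
       that was a top hit for several different phrases is fetched again for
       each appearance and its categories are counted each time; recurring
       categories therefore accumulate higher totals in $mastercatarray. */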
    $wikiscrape = file_get_contents($targetpage);
    preg_match_all("|/wiki/Category.[^\"]+|", $wikiscrape, $categories);
    foreach ($categories[0] as $catstring) {
        /* Strip out the "/wiki/Category:" at the beginning of each string. */
        $catstring = substr($catstring, 15);
        /* Keep count of how many times we've seen this category. */
        if (array_key_exists($catstring, $mastercatarray)) {
            $mastercatarray[$catstring]++;
        } else {
            $mastercatarray[$catstring] = 1;
        }
    }
}

// Sort by value: most popular categories first.
arsort($mastercatarray);
echo "The top categories are:\n";
print_r($mastercatarray);
?>

The Internet Public Library (IPL): An Exploratory Case Study on User Perceptions

environment. Digital and physical holdings, academic and public libraries, free and subscription resources, Internet encyclopedias, and a multitude of other offerings form a complex (and often overwhelming) information-seeking environment. To move forward effectively and to best serve its existing and potential users, the IPL must pursue a path that is adapted to the present state of the Internet and that is user-informed and user-driven.

Recent large-scale studies, such as the 2005 OCLC reports on perceptions of libraries and information resources, have begun to explore user perceptions of libraries in the complex Internet environment.3 These studies emphasize the importance of user perceptions of library use, questioning whether libraries still matter in the rapidly growing infosphere and what future use trends might be. In the Internet environment, user perceptions play a key role in use (or nonuse) of library resources and services as information-seekers are faced with myriad easily accessible electronic information sources. The IPL’s name, for example, may or may not be perceived as initially helpful to users’ information-seeking needs. Repeat use relates to such perceptions as well, in the amount of value users perceive in the library resources over the many other sources available. In beginning to explore such issues, there is a need for current research addressing user perceptions of an Internet public library: what the name implies to both existing and potential users as well as the associated functions and resources that should be offered.

In this study, we present an exploratory case study on public perceptions of the IPL. Qualitative analysis of interviews with ten college students, some of whom are current users of the IPL and others with no exposure to the IPL, begins to yield an understanding of the public perception of what an Internet public library should be. This study seeks to expand our understanding of such issues and explore the present-day requirements for the IPL in addressing the following research questions:

■■ What is the public perception of an Internet public library?
■■ What services and materials should an Internet public library offer?

■■ Background

The IPL: Origins and Research

In 1995, Joe Janes, a professor at the University of Michigan’s School of Information and Library Studies, ran a graduate seminar in which a group of students created a web-based library intended to be a hybrid of both physical library services and Internet resources and offerings.
The resulting IPL would take the best from both the physical and digital The Internet Public Library (IPL), now known as ipl2, was created in 1995 with the mission of serving the public by providing librarian-recommended Internet resources and reference help. We present an exploratory case study on public perceptions of an “Internet public library,” based on qualitative analysis of interviews with ten col- lege student participants: some current users and others unfamiliar with the IPL. The exploratory interviews revealed some confusion around the IPL’s name and the types of resources and services that would be offered. Participants made many positive comments about the IPL’s resource quality, credibility, and personal help. T he Internet Public Library (IPL), now known as ipl2, is an online-based public service organization and a learning and teaching environment originally developed by the University of Michigan’s School of Information and currently hosted by Drexel University’s iSchool. The IPL was created in 1995 as a project in a graduate seminar; a diverse group of students worked to create an online space that would be both a library and an Internet institution, helping librarians and the public identify useful Internet resources and content collections. With a strong mission to serve and educate a varied com- munity of users, the IPL sought to help the public navigate the increasingly complex Internet environment as well as advocate for the continuing relevance of librarians in a digital world. The resulting IPL provided online reference, content collections (such as ready reference and a full-text reading room), youth-oriented resources, and services for other librarians, all through its free, web-based presence.1 Currently, the IPL consists of a publicly accessible website with several large content collections (such as “POTUS: Presidents of the United States”), sections targeted toward teens and children (“TeenSpace” and “KidSpace”), and a question and answer service where users can e-mail ques- tions to be answered by volunteer librarians.2 There has been an enormous amount of change in the Internet and digital libraries since the IPL’s incep- tion in 1995. While web use statistics, user feedback, and incoming patron questions indicate that the IPL remains well-used and valued, there are many questions about its place in an increasingly information-rich online Monica Maceli (mgm36@drexel.edu) is a doctoral Student, susan Wiedenbeck (susan.wiedenbeck@drexel.edu) is a Pro- fessor, and eileen Abels (eabels@drexel.edu) is a Professor at the college of information Science and technology, drexel uni- versity, Philadelphia. Monica Maceli, Susan Wiedenbeck, and Eileen Abels tHe iNterNet PuBlic liBrArY (iPl) | MAceli, WieDeNBecK, AND ABels 17 there has also been a continuous evaluation of the role of the library in an increasingly digital world, a ques- tion Janes sought to address in his first imaginings of the IPL. A study conducted in 2005 claimed that “electronic information-seeking by the public, both adults and chil- dren, is now an everyday reality and large numbers of people have the expectation that they should be able to seek information solely in a virtual mode if they so choose.”12 This trend in electronic information-seeking has driven both public and academic libraries to create and support vast networks of licensed and free online information, directories, and guides. 
These electronic offerings, which (at least in theory) are desired and appre- ciated by users, are often overshadowed by the wealth of quickly accessible information from tools such as search engines.13 In competition with quickly accessible (though not necessarily credible or accurate) information sources, librarians have struggled to find their place and relevance in an evolving environment. Google and other web search engines often shape users’ experiences and expecta- tions with information-seeking, more so than any formal librarian-driven instruction such as in Boolean searching. Several recent comprehensive studies have explored user perceptions of libraries, both physical and digital, in relationship to the larger Internet. Abels explored the perspective of libraries and librarians across a broad popu- lation consisting of both library users and non-users.14 Her findings included the fact that web search engines were the starting point for the majority of information-seeking, and that there is a high preference among users for virtual ref- erence desk services. She proposed an information-seeking model in which the library serves as one of many Internet resources, including free websites and interpersonal sources, and is likely not the user’s first stop. In respect to this model of information-seeking, Abels suggests that “librarians need to accept the broader framework of the information seeker and develop services that integrate the library and the librarian into this framework.”15 In 2005, OCLC released what is possibly the most comprehensive study to date of the public’s perceptions of library and information resources as explored on a number of levels, including both the physical and digi- tal environments.16 Findings relevant to libraries on the Internet (and this study) included the following: ■■ 84 percent of participants reported beginning an information search from a search engine; only 1 per- cent started from a library website ■■ there was a preference for self-service and a tendency to not seek assistance from library staff ■■ users were not aware of most libraries’ electronic resources ■■ college students have the highest rate of library use ■■ users typically cross-reference other sites to validate their results worlds while developing its own unique offerings and features.4 Janes had conceived the idea in 1994, when the Internet’s continued growth began to make it clear that the role of libraries and librarians would be forever changed as a result. Janes’ motivating question was “what does librarianship have to say to the network environment and vice versa?”5 The IPL tackled a broad mission of enhanc- ing the value of the Internet by providing resources to its varied users, educating and strengthening that community, and (perhaps most unique at the time) communicating “its’ creators vision of the unique roles of library culture and traditions on the Internet.”6 Initial student brainstorming sessions yielded the priorities that the IPL would address and included such services as reference, community out- reach, and youth services. The first version of the IPL contained electronic versions of classic library offerings, such as magazines, texts, serials, newspapers, and an e-mail reference service. The IPL was well received and continued its development, adding and expanding resources to support specific communities such as teens and children. 
The IPL was awarded several grants over the next few years, allowing for expansion and continuation.7 A wealth of librarian volunteers, composed of students and staff, contributed to the IPL, in particular toward the e-mail reference services. With a stated goal of responding to patrons’ questions within one week, the reference services provide help and direct contact with the IPL’s user base, many of whom are students working on school assignments.8 The IPL’s collections are discover- able through search engines (popular offerings such as the “POTUS: Presidents of the United States” resources rank highly in search results lists) and through its presence on social networking sites such as Myspace, Facebook, and Twitter. Additionally, IPL distributes brochures to teachers and librarians at relevant conferences. The IPL has been the focus of many research studies covering a broad range of themes, such as its history and funding, digital reference and the IPL’s question-and- answer service, and its resources and collections.9 Also, in line with the original mission of the IPL, Janes developed The Internet Public Library Handbook to share best prac- tices with other librarians.10 The majority of publications, however, have focused on IPL’s reference service, which is uniquely placed as a librarian-staffed volunteer digital reference service. As the IPL has collected and retained all reference interactions since its inception in 1995, there is a wealth of data readily available to such studies and exploratory work into how best to analyze it.11 user Perceptions of Digital libraries The Internet is a vastly different world than it was in the early days of the IPL’s creation. The expectations of library patrons, both in digital and in physical environ- ments, have changed as well. And as the Internet evolves 18 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 of the public, which is the intention of this study. ■■ Method This exploratory study consisted of a qualitative analysis of data gathered from interviews and observations of ten college student participants who were academic library users and nonusers of the IPL. A pilot study preceded the final research effort and allowed us to iteratively tailor the study design to best pursue our research questions. Our initial study design incorporated a usability test portion, in which users were presented with a series of informa- tion-seeking needs and instructed to use the IPL’s website to answer the questions. However, we later dropped this portion of the study because pilot results found that it contributed little to answering our research questions about public perceptions; it largely explored implementa- tion details, which was not the focus of this study. Following the pilot study, we recruited ten Drexel University students from the university’s W. W. Hagerty Library. This ensured recruiting participants who were at least minimally familiar with physical libraries and who were from a variety of academic focuses. The participant group included eight females and two males—two were graduate students, eight were undergraduates—from a variety of majors, including biology, biomedical engineer- ing, business, library science, accounting, international studies, and information systems. Participants took an average of twenty-six minutes to complete the study. The study consisted of a short interview to assess the user’s experience with public libraries (both physical and online) and their expectations of an Internet public library. 
These open-ended questions (included in the appendix) sought to determine what features, services, or content were desired or expected by users, whether the term of “Internet public library” was meaningful, if there were similarities to web-based systems that the participants were already familiar with, or if they had previously used a website they would consider an Internet public library. All interviews were audio recorded and transcribed. An initial coding scheme was established and iteratively developed (table 1). Once we observed significant overlap between par- ticipant responses, the study then proceeded to the final analysis and presentation, using inductive qualitative analysis to code text and identify themes from the data.22 ■■ Findings All participants were current or former public library patrons; six participants (P1, P4, P5, P6, P8, and P9) were A portion of the study focused on library identity or brand in the mind of the public; participants found the library brand to be “books,” with no other terms or con- cepts coming close. As a companion report to this study, OCLC released a report focused on the library percep- tions of college students.17 As our study uses a college student participant base, OCLC’s findings are highly relevant. The vast majority of college students reported using search engines as a starting point for information- seeking and expressed a strong desire for self-service library resources. As compared to the general popula- tion, however, college students have the highest rate and broadest use of both physical and digital library resources and a corresponding high awareness of these services. The relationship between public libraries and the Internet was explored in depth in a 2002 study by D’Elia et al.18 The study sought to systematically inves- tigate patrons’ use of the Internet and of public libraries. Findings included the fact that the Internet and public libraries are often complementary; that more than half of Internet users were library users and vice versa; and that libraries are valued more than the Internet for providing accurate information, privacy, and child-oriented spaces and services. Participants made a distinction between the service characteristics of the public library versus those of the Internet. Many of the most-valued characteristics of the Internet (such as information that is always available when needed) were not supported by physical libraries because of limited offerings and hours. In addition to large, comprehensive surveys, there have been several case-study approaches, exploring user perceptions of a particular digital library or library fea- ture. Tammaro researched user perceptions of an Italian digital library, finding the catalog, online databases, and electronic journals to be most valued; she found speed of access, remote access, a larger number of resources, and personalization to be key digital library services.19 This study also reported a consistent theme in digital library lit- erature: a patron base primarily consisting of novice users who do not know how to use the library and are unaware of the various services offered. Crowley et al. evaluated an existing academic library’s webpages for issues and user needs.20 They identified issues with navigational structures and overly technical terminology and a general need for robust help and extensive research portals. In respect to our study, we found no literature that studied perceptions of Internet public libraries. 
As men- tioned earlier, research that addressed the IPL from the perspective of its patrons largely focused on IPL’s reference services. In 2008, IPL staff reported 13,857 reference questions received and 9,794,292 website visi- tors.21 Although reference is clearly a vital and well-used service, there is also a great deal of website collection use that must be researched. Recent literature does not address the current state of the IPL from the perspective tHe iNterNet PuBlic liBrArY (iPl) | MAceli, WieDeNBecK, AND ABels 19 of such a library. A few remained confused about how such a concept would relate to physical public libraries and the Internet in general. One participant assumed that such a term must mean the web presence of a particular physical public library. Another’s immediate reaction was to question the value of such a venture in light of existing Internet resources: “I mean, the Internet is already useful, so I don’t know [how useful it would be]” (P2). Two other participants found meaning in the term by associating it with a known library website, such as that of their academic library or local physical public library. When asked what websites seem similar in function or appearance to what they would consider an Internet public library, responses varied. While most participants could not name any similar website or service, one mentioned several academic library websites that he was famil- iar with, another described several bookseller websites (Amazon.com, Half.com, and AbeBooks.com), and a third mentioned Wikipedia (but then immediately retracted the statement, after deciding that Wikipedia was not a library). theme 2: Quick and easy, but still credible Participants were highly enthusiastic about the perceived benefits in access to and credibility of information from an Internet public library. Ease of use and faster informa- tion access, often from home, were key motivators for use of Internet-based libraries, both public and academic. As described earlier, there is a wealth of competing informa- tion options freely available on the Internet. Given this, participants felt that an Internet public library would offer the most value because of its credible information: I like the ready reference [almanacs, encyclopedias]. . . . I’m not used to using any of these, Wikipedia is just so ready and user friendly. It’s so easy to go to Wikipedia but it’s not necessarily credible. . . . Whereas I feel like this is definitely credible. It’s something I could use if I needed to in some sort of academic setting. (P10) theme 3: lack of Differentiation between Public and Academic; Physical and Digital libraries For many participants, there was confusion about what was or was not a public library, and they initially con- sidered their academic library in that category. Overall, participants did not think of public and academic libraries (physical or on the Internet) as distinctly different; rather they were more likely to be associated with phase of life. Participants that were not current public library users reported using public libraries frequently during their years of elementary education. For participants that were current public library users, physical public libraries (and other local academic libraries) were used to fill in the gaps current public library users, and four (P2, P3, P7, and P10) had used public libraries in the past but were no longer using their services. 
Two participants were graduate stu- dents (P3 and P9) with the remainder undergraduates, and two of the ten students had used the IPL website before (P3 and P6). The participants could be characterized as rela- tively infrequent public library users with a strong interest in the physical book holdings of the public library, primar- ily for leisure but frequently for research as well. Several participants mentioned scholarly databases that were pro- vided by their public library (typically from within the library or online with access using a public library card). There was also interest in leisure audiovisual offerings and in using the library as a destination for leisure. The following themes illustrate our main findings with respect to our research questions. As described above, we conceptualized our raw data into broad themes through an iterative process of inductive coding and analysis. Although multiple themes emerged as associ- ated with each of our research questions, we present only the most important and relevant themes (see table 2). All themes were supported by responses from multiple par- ticipants. We will further elaborate the themes discovered later in this section; a selected relevant and meaningful participant quote illustrates each theme. theme 1: confusion about Name “Internet public library” was not an immediately clear term to four of the participants; the six other participants were able to immediately begin describing their concept Table 1. Inductive Coding Scheme Developed from Raw Transcript Text, Used to Identify Key Themes Coding Scheme Physical public libraries Tied to life phase Confusion between academic and public Current use Frequency of use Perceptions of an Internet public library Access Properties of physical libraries Reference Resources Tools Users General Internet use Academic library use Similar sites to IPL 20 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 would contain both electronic online items and locally available items in physical formats. In particular, connec- tions to local physical libraries to share item holdings and availability status were desired: “General book informa- tion and maybe a list of where books can be found. Like online, the local place you can find the books.” (P7) Given that information-seeking, for this group, was conducted indiscriminately across physical and digital libraries, this integrated view into local physical resources seems to be a natural request. theme 6: Personal and Personalized Help Although no participants claimed that reference was a service that they typically use during their physical pub- lic library experiences, it was a strong expectation for an Internet public library and mentioned by nearly every participant. When questioned as to how this reference interaction should take place, there was a clear prefer- ence for communicating via instant message: “Reference information. . . . You know, where you have real people. A place where you can ask questions. . . . If you think you can get an answer at a library, then online you would hope to get the same things.” (P1) In addition to being able to interact with a “real” librar- ian, participants desired other personalized elements, such as resources and services dedicated to information needy populations (like children) as well as resources supporting the community and personal lifestyle issues and topics (like health and money). 
■■ Discussion In summary, we characterized the participants in this case study as low-frequency physical public library users with a high association between life phase (high school or grade school) and public library use. Participants looked to public libraries to provide physical books—primar- ily for leisure but often for research use as well—leisure DVDs and CDs, scholarly databases, and a space to “hang for items that could not be located at their school’s aca- demic library, either through physical or digital offerings. Consistent with this finding, a few participants reported conducting searches across both local academic and public libraries in pursuit of a particular item. There was a gen- eral disregard for where the item came from, as long as it could be acquired with relatively little effort from physi- cally close local or online resources. However, participants reported typically starting with their academic libraries for school resources and the public libraries for leisure materi- als “I go to the Philadelphia public library probably once a month or so usually for DVDs but sometimes for books that I can’t find here [academic library]. . . . I usually check here first because it’s closer.” (P5) theme 4: electronic resources, catalog, and searching tools are Key There were many participant comments, and some con- fusion, around what type of resources an Internet public library would provide, as well as whether they would be free or not (one participant assumed there would be a fee to read online). The desired resources (in order of impor- tance) included leisure and research e-books, scholarly databases, online magazines and newspapers, and DVDs and CDs (pointers to where those physical items could be found in local libraries). A few comments were negative, assuming the resources provided would only be elec- tronic, but participants were mostly enthusiastic about the types and breadth of resources that such a website would offer. For example, one participant commented, “I think you could get more resources. . . . The library I usu- ally visit is kind of small so it’s very limited in the range of information you can find.” (P4) Many participants emphasized the importance of providing robust, yet easy-to-use, search tools in man- aging complex information spaces and conveying item availability. theme 5: connections to Physical libraries Several participants assumed that the resource collection Table 2. Themes Identified Research Question Themes Identified What is the public perception of an Internet public library? Confusion about name Quick and easy, but still credible Lack of differentiation between public and academic; physical and digital libraries What services and materials would such a website offer? Electronic resources, catalog, and searching tools are key Connections to physical libraries Personal and personalized help tHe iNterNet PuBlic liBrArY (iPl) | MAceli, WieDeNBecK, AND ABels 21 infosphere—their services and collections both physi- cal and virtual.25 This is, like many issues in library systems design, a complex challenge. 
As previous research has shown, extending the metaphor of the physical library into the digital environment does not always assist users, espe- cially when they may be more likely to draw on previous experiences with other Internet resources.26 The original prospectus for the Internet Public Library, as developed by Joe Janes, acknowledges the different capabilities of physical libraries and libraries on the Internet, claiming that the IPL would “be a true hybrid, taking the best from both worlds but also evolving its own features.”27 If users anticipate an experience similar to the Internet resources they typically use (such as search engines), then the IPL may best serve its users by moving closer to “Internet” than “library.” However, such a choice may entail unfore- seen tradeoffs. Several participants in this study mused over what physical public library characteristics would carry over to a digital public library and the potential tradeoffs: “You wouldn’t have to leave your home but at the same time I think it’s easier to wander the library and just see things that catch your eye. And I like the quiet setting of the library too.” (P8) Another participant mentioned the distinctly positive public library experience, and how such an experience should be reflected in an Internet-based public library: “I think that public libraries have a very positive reputation within communities. And I don’t think it would be bad for an Internet public library to move toward that expec- tation that people have.” (P3) The question remains, then, whether the IPL can com- pete with a multitude of other Internet resources without losing the familiar and positive essence of a traditional physical public library. Or rather, how can the IPL find a way to translate that essence to a digital environment without sacrificing performance and user expectations of Internet services? ■■ Conclusion During this study, participants described an Internet public library that, in many ways, takes the best features of several currently existing and popular websites. An Internet public library should contain all the information of Wikipedia, yet be as credible as information received directly from your local librarian. It should search across both websites and physical holdings, like AbeBooks.com or a search aggregator. It should search as powerfully and as easily as Google, yet return fewer, more targeted results. And it should provide real-time help immediately and conveniently, all from the comfort of your home. out” or occupy leisure time. For the participants, an Internet public library (an occasionally confusing term) described a service you could access from home, which included electronic books, information about locally available physical books, scholarly databases, reference or help services, and robust search tools. It must be easy to use and tailored to needy community populations such as children and teens. For several participants it would be similar to existing bookseller websites (such as Amazon. com or AbeBooks.com) or academic library websites. In exploring how these findings can inform the future design and direction of the IPL, it is again necessary to reflect on the values and concepts that inspired the original creation of the IPL. The initial choice of the IPL’s name was intended to reflect a novel system at the time, as Joe Janes detailed in the IPL prospectus: “I would view each of those three words as equally important in conveying the intent of this project: Internet, Public, and Library. 
I think the combination of the three of them produces something quite different than any pair or individual might suggest.”23 All three of these concepts—Internet, public, and library—have evolved with the changing nature of the Internet. And, as the research explored would indicate, there may not be a distinct boundary between these concepts from the perspective of users. Our finding that participants seek information by indiscriminately crossing public and academic libraries, as well as digital and physical resource formats verifies earlier research efforts.24 As the amount of information accessible on the Internet has expanded, the boundary of the library can be seen as either expanding (providing credible indexing, pointers, and information about useful resources from all over the Internet), contracting (primarily providing access to select resources that must be accessed through subscription), or existing somewhere in between, depend- ing on the perspective. In any of these cases, it is vital that the IPL present its resources, services, and offerings such that its value and contribution to information-seeking is highlighted and clear to users. Amorphously placed in a complex world of digital and physical information, the IPL must work toward creating a strong image of its offering and mission; an image that is transparent to its users, starting with its name. This challenge is not the IPL’s alone, but rather that of all Internet library portals, resources, and services. The 2005 OCLC report on perceptions of librar- ies expressed the importance of a strengthened image for Internet libraries: Libraries will continue to share an expanding infos- phere with an increasing number of content produc- ers, providers and consumers. Information consumers will continue to self-serve from a growing informa- tion smorgasbord. The challenge for libraries is to clearly define and market their relevant place in that 22 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 Library,” Journal of Electronic Publishing 3, no. 2 (1997). 8. David S. Carter and Joseph Janes, “Unobtrusive Data Analysis of Digital Reference Questions and Service at the Internet Public Library: An Exploratory Study,” Library Trends 49, no. 2 (2000): 251–65. 9. On the IPL’s history and funding, see Barbara Hegenbart, “The Economics of the Internet Public Library,” Library Hi Tech 16, no. 2 (1998): 69–83; Joseph Janes, “Serving the Internet Public: The Internet Public Library,” Electronic Library 14, no. 2 (1996): 122–26; and Carter and Janes, “Unobtrusive Data Analysis,” 251–65. On digital reference and IPL’s question-and- answer service, see Kenneth R. Irwin, “Professional Reference Service at the Internet Public Library With ‘Freebie’ Librarians,” Searcher—The Magazine for Database Professionals 6, no. 9 (1998): 21–23; Nettie Lagace and Michael McClennen, “Questions and Quirks: Managing an Internet-Based Distributed Reference Service,” Computers in Libraries 18, no. 2 (1998): 24–27; Sara Ryan, “Reference Service for the Internet Community: A Case Study of the Internet Public Library Reference Division,” Library & Information Science Research 18, no. 3 (1996): 241–59; and Elizabeth Shaw, “Real Time Reference in a Moo: Promise and Problems,” Internet Public Library, http://www.ipl.org/div/iplhist/moo .html (accessed Dec. 4, 2008). 
On the IPL’s resources and collec- tions, see Thomas Pack, “A Guided Tour of the Internet Public Library—Cyberspace’s Unofficial Library Offers Outstanding Collections of Internet Resources,” Database 19, no. 5 (1996): 52–56. 10. Joseph Janes, The Internet Public Library Handbook (New York: Neal Schuman, 1999). 11. Carter and Janes, “Unobtrusive Data Analysis,” 251–65. 12. Gloria J. Leckie and Lisa M. Given, “Understanding Information-Seeking: The Public Library Context,” Advances in Librarianship 29 (2005): 1–72. 13. James Rettig, “Reference Service: From Certainty to Uncertainty,” Advances in Librarianship 30 (2006): 105–43. 14. Eileen Abels, “Information Seekers’ Perspectives of Libraries and Librarians,” Advances in Librarianship 28 (2004): 151–70. 15. Ibid., 168. 16. Cathy De Rosa et al., “Perceptions of Libraries.” 17. Cathy De Rosa et al., “College Students’ Perceptions of Libraries.” 18. George D’Elia et al., “The Impact of the Internet on Public Library Use: An Analysis of the Current Consumer Market for Library and Internet Services,” Journal of the American Society for Information Science & Technology 53, no. 10 (2002): 802–20. 19. Anna Maria Tammaro, “User Perceptions of Digital Libraries: A Case Study in Italy,” Performance Measurement & Metrics 9, no. 2 (2008): 130–37. 20. Gwyneth H. Crowley et al., “User Perceptions of the Library’s Web Pages: A Focus Group Study at Texas A&M University,” The Journal of Academic Librarianship 28, no. 4 (2002): 205–10. 21. Adam Feldman, e-mail to author, Apr. 3, 2009; Mark Galloway, e-mail to author, Apr. 3, 2009. 22. For information on inductive qualitative analysis, see David R. Thomas. “A General Inductive Approach for Analyzing Qualitative Evaluation Data” American Journal of Evaluation 27, no. 2 (2006): 237–46; Michael Quinn Patton, Qualitative Research and Evaluation Methods (Thousand Oaks, Calif.: Sage, 2002); These are clearly complex, far-reaching, and labor-inten- sive requirements. And many of these requirements are currently difficult and unresolved challenges to digital libraries in general, not simply the IPL. This preliminary study is limited in its college student participant base and small sample size, which may not reflect perspectives of the greater community of IPL users. These results therefore may not be generalizable to other populations who are current or potential users of the IPL, including other targeted groups such as children and teens. Additionally, our chosen participant group, college students who are physical library users, had relatively high levels of library and technology experience, as well as complex expectations. Our results would likely differ with a participant group of novice Internet users. As detailed above, this study explores public percep- tions of an Internet public library—an important aspect of the IPL that is not well studied and that has implications on IPL use and repeat use. While the IPL was carefully and thoughtfully constructed by a dedicated group of librarians, students, and educators, there has not been a recent study devoted to understanding what an Internet public library should be today. More recently, in January 2010, the IPL merged with the Librarians’ Internet Index to form ipl2. The two collections were merged and the web- site was redesigned. Although this merger was because of circumstances unrelated to our research, our findings were leveraged during the redesign (for example, in nam- ing the collections). 
In the future, our findings can be used in further ipl2 design iterations or explored in subsequent research studies in the specific context of ipl2 or of digital libraries in general. As discussed above, this study may be extended to different participant populations and to existing but remote ipl2 users. This study may also be continued in a more design-oriented direction to explore the usability and user acceptance of ipl2’s website. References 1. Joseph Janes, “The Internet Public Library: An Intellectual History,” Library Hi Tech 16, no. 2 (1998): 55–68. 2. “About the Internet Public Library,” Internet Public Library, http://ipl.org/div/about/ (accessed Feb. 17, 2009). 3. Cathy De Rosa et al., “Perceptions of Libraries and Information Resources,” OCLC Online Computer Library Center, 2005, http://www.oclc.org/reports/pdfs/Percept_all .pdf (accessed Mar. 9, 2009); Cathy De Rosa et al., “College Students’ Perceptions of Libraries and Information Resources,” OCLC Online Computer Library Center, 2005, http://www .oclc.org/reports/pdfs/studentperceptions.pdf (accessed Mar. 9, 2009). 4. Janes, “The Internet Public Library,” 55. 5. Ibid., 56. 6. Ibid., 57. 7. Lorrie LeJeune, “Before Its Time: The Internet Public tHe iNterNet PuBlic liBrArY (iPl) | MAceli, WieDeNBecK, AND ABels 23 American Society for Information Science & Technology 58, no. 3 (2007): 433–45. 25. De Rosa et al., “College Students’ Perceptions of Libraries,” 146. 26. Makri et al., “A Library or Just Another Information Resource?” 434. 27. Joseph Janes, “The Internet Public Library,” 56. and Matthew B. Miles and Michael Huberman, Qualitative Data Analysis: An Expanded Sourcebook, 2nd ed. (Thousand Oaks, Calif.: Sage, 1994). 23. Janes, “The Internet Public Library,” 56. 24. For example, Stephann Makri et al., “A Library or Just Another Information Resource? A Case Study of Users’ Mental Models of Traditional and Digital Libraries,” Journal of the Appendix. Interview Protocol ■■ Have you ever visited a public library? ■❏ If so, how often do you visit and why? ■❏ What services do you typically use? ■❏ Can you describe your last visit and what you were looking for? ■❏ What do you think an Internet public library would be? ■■ What sort of services would it offer? ■■ What else should it do? ■■ Have you ever visited an Internet public library? 3042 ---- 24 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 Ruben Tous, Manel Guerrero, and Jaime Delgado Semantic Web for Reliable Citation Analysis in Scholarly Publishing Nevertheless, current practices in citation analysis entail serious problems, including security flaws related to the publishing process (e.g., repudiation, imperson- ation, and privacy of paper contents) and defects related to citation analysis, such as the following: ■■ Nonidentical paper instances confusion ■■ Author naming conflicts ■■ Lack of machine-readable citation metadata ■■ Fake citing papers ■■ Impossibility for authors to control their related cita- tion data ■■ Impossibility for citation-analysis systems to verify the provenance and trust of citation data, both in the short and long term Besides the fact that they do not provide any security feature, the main shortcoming of current citation-analysis systems such as ISI Citation Index, Citeseer (http:// citeseer.ist.psu.edu/), and Google Scholar is the fact that they count multiple copies or versions of the same paper as many papers. 
In addition, they distribute citations of a paper between a number of copies or versions, thus decreasing the visibility of the specific work. Moreover, their use of different analysis databases leads to very different results because of differences in their indexing policies and in their collected papers.3 To remedy all these imperfections, this paper proposes a reference architecture for reliable citation analysis based on applying semantic trust mechanisms. It is important to note that a complete or partial adoption of the ideas defended in this paper will imply the effort to introduce changes within the publishing lifecycle. We believe that these changes are justified considering the serious flaws of the established solutions and the relevance that citation-analysis systems are acquiring in our society.

■■ Reference Architecture

We have designed a reference architecture that aims to provide reliability to the citation and citation-tracking lifecycle. This architecture is based on the use of digitally signed semantic metadata in the different stages of the scholarly publishing workflow. As a trust scheme, we have chosen a public key infrastructure (PKI), in which certificates are signed by certification authorities belonging to one or more hierarchical certification chains.4

Trust Scheme

The goal of the architecture is to allow citation-analysis systems to verify the provenance and trust of machine-readable metadata about citations before incorporating

Analysis of the impact of scholarly artifacts is constrained by current unreliable practices in cross-referencing, citation discovering, and citation indexing and analysis, which have not kept pace with the technological advances that are occurring in several areas like knowledge management and security. Because citation analysis has become the primary component in scholarly impact factor calculation, and considering the relevance of this metric within both the scholarly publishing value chain and (especially important) the professional curriculum evaluation of scholarly professionals, we defend that current practices need to be revised. This paper describes a reference architecture that aims to provide openness and reliability to the citation-tracking lifecycle. The solution relies on the use of digitally signed semantic metadata in the different stages of the scholarly publishing workflow in such a manner that authors, publishers, repositories, and citation-analysis systems will have access to independent reliable evidences that are resistant to forgery, impersonation, and repudiation. As far as we know, this is the first paper to combine Semantic Web technologies and public-key cryptography to achieve reliable citation analysis in scholarly publishing.

In recent years, the amount of scholarly communication brought into the digital realm has exponentially increased.1 This no-way-back process is fostering the exploitation of large-scale digitized scholarly repositories for analysis tasks, especially those related to impact factor calculation. The potential automation of the contribution–relevance calculation of scholarly artifacts and scholarly professionals has attracted the interest of several parties within the scholarly environment, and even outside of it.
For example, one can find within articles of the Spanish law related to the scholarly personnel certification the requirement that the papers appearing in the curricula of candidates should appear in the Subject Category Listing of the Journal Citation Reports of the Science Citation Index.2 This example shows the growing relevance of these systems today. ruben tous (rtous@ac.upc.edu) is associate Professor, Manuel Guerrero (guerrero@ac.upc.edu) is associate Professor, and Jaime Delgado (jaime.delgado@ac.upc.edu) is Professor, all in the departament d’arquitectura de computadors, universitat Politècnica de catalunya, Barcelona, Spain. seMANtic WeB FOr reliABle citAtiON ANAlYsis iN scHOlArlY PuBlisHiNG | tOus, GuerrerO, AND DelGADO 25 might send a signed notification of rejection. We feel that the notification of acceptance is necessary because in a certain kind of curriculum, evaluations for university professors conditionally accepted papers can be counted, and in other curriculums not. The camera-ready version will be signed by all the authors of the paper, not only the corresponding author like in the paper submission. After the camera-ready version of the paper has been accepted, the journal will send a signed notification of future publication. This notification will include the date of acceptance and an estimate date of publication. Finally, once the paper has been published, the journal will send a signed notification of publication to the author. The rea- son for having both notification of future publication and notification of publication is that, again, some curriculum evaluations might be flexible enough to count papers that have been accepted for future publication, while stricter ones state explicitly that they only accept published papers. Once this process has been completed, a citation- analysis system will only need to import the authors’ CA certificates (that is, the certificates of the universities, research centers, and companies) and the publishers’ CA certificates (like ACM, IEEE, Springer, LITA, etc.) to be able to verify all the signed information. A chain of CAs will be possible both with authors (for example, univer- sity, department, and research line) and with publications (for example, publisher and journal). ■■ Universal Resource Identifiers To ensure that authors’ URIs are unique, they will have a tree structure similar to what URLs have. The first level element of the URI will be the authors’s organization (be it a university or a research center) ID. This organiza- tion id will be composed by the country code top-level domain (ccTLD) and the organization name, separated by an underscore.5 The citation-analysis system will be responsible for assigning these identifiers and ensuring that all organizations have different identifiers. Then, in the same manner, each organization will assign second-level elements (similar to departments) and so forth. Author’s CA_Id: _ Example: es_upc Author ’s URI: author:/// . . . /. Example: author://es_upc.dac/ruben.tous (In this example “es” is the ccTDL for Spain, UPC (Universitat Politècnica de Catalunya) is the uni- versity, and DAC (Departament d’Arquitectura de Computadors) is the department. them into their repositories. As a collateral effect, authors and publishers also will be able to store evidences (in the form of digitally signed metadata graphs) that demonstrate different facts related to the creating–edit- ing–publishing process (e.g., paper submission, paper acceptance, and paper publication). 
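The placeholder notation in the identifier patterns above was lost in reproduction, but the printed examples (CA_Id es_upc and URI author://es_upc.dac/ruben.tous) make the intended shape clear. The following is a minimal Python sketch that assembles identifiers of that shape; the function names and the lowercasing rule are assumptions of this illustration, not part of the proposed architecture.

def make_author_ca_id(cctld, organization):
    # ccTLD and organization name separated by an underscore, e.g. "es_upc".
    return f"{cctld.lower()}_{organization.lower()}"

def make_author_uri(ca_id, units, author_name):
    # author URIs take the form author://CA_Id.unit. ... /author-name, with one
    # dotted segment per organizational level below the certification authority.
    host = ".".join([ca_id] + [u.lower() for u in units])
    return f"author://{host}/{author_name.lower()}"

print(make_author_ca_id("es", "upc"))                    # es_upc
print(make_author_uri("es_upc", ["dac"], "ruben.tous"))  # author://es_upc.dac/ruben.tous

A publisher's CA_Id and a creation URI, described later in the paper, could be assembled in the same way from the publisher, publication, and paper identifiers.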
To achieve these goals, our reference architecture requires each metadata graph carrying information about events to be digitally signed by the proper subject. Because our approach is based in a PKI trust scheme, each signing subject (author or publisher) will need a public key certificate (or identity certificate), which is an electronic document that incor- porates a digital signature to bind a public key with an identity. All the certificates used in the architecture will include the public key information of the subject, a valid- ity period, the URL of a revocation center, and the digital signature of the certificate produced by the certificate issuer’s private key. Each author will have a certificate that will include as a subject-unique identifier the author ’s Universal Resource Identifier (URI), which we explain in the next section, along with the author ’s current information (such as name, e-mail, affiliation, and address) and pre- vious information (list of former names, e-mails, and addresses), and a timestamp indicating when the certifi- cate was generated. The certification authority (CA) of the author’s certificate will be the university, research center, or company with which the author is affiliated. The CA will manage changes in name, e-mail, and address by generating a new certificate in which the former certifi- cate will move to the list of former information. Changes in affiliation will be managed by the new CA, which will generate a new certificate with the current informa- tion. Since the new certificate will have a new URI, the CA also will generate a signed link to the previous URI. Therefore the citation-analysis system will be able to recognize the contributions signed with both certificates as contributions made by the same author. It will be the responsibility of the new CA to verify that the author was indeed affiliated to the former organization (which we consider a very feasible requirement). Every time an author (or group of authors) submits a paper to a conference, workshop, or journal, the cor- responding author will digitally sign a metadata graph describing the paper submission event. Although the paper submission will only be signed by the correspond- ing author, it will include the URIs of all the authors. Journals (and also conferences and workshops) will have a certificate that contains their related informa- tion. Their CA will be the organization or editorial board behind them (for instance, ACM, IEEE, Springer, LITA, etc.). If a paper is accepted, the journal will send a signed notification of acceptance, which will include the reviews, the comments from the editor, and the conditions for the paper to be accepted. If the paper is rejected, the journal 26 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 ■■ Microsoft’s Conference Management Toolkit (CMT; http://cmt.research.microsoft.com) is a confer- ence management service sponsored by Microsoft Research. It uses HTTPS to provide confidentiality, but it is a service for which you have to pay. Although some of the web-based systems provide confidentiality through HTTPS, none of them provides nonrepudiation, which we feel is even more important. This is so because nonrepudiation allows authors to cer- tify their publications to their curriculum evaluators. Our proposed scheme always provides nonrepu- diation because of its use of signatures. Curriculum evaluators don’t need to search for the publisher’s web- site to find the evaluated author’s paper. 
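As a rough illustration of the signing step described in this section, the sketch below signs the bytes of a serialized submission-event graph with a private key and verifies them with the corresponding public key, using the third-party Python cryptography package. The throwaway RSA key, the PKCS#1 v1.5 padding, and the placeholder graph content are choices of this sketch only; in the architecture the key pair would be the one bound to the author's certificate issued by the institutional CA, and the paper does not prescribe particular algorithms.

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Placeholder for the serialized RDF graph describing the submission event.
submission_graph = b"...serialized RDF describing the paper-submission event..."

# Throwaway key pair standing in for the key bound to the author's certificate,
# which would be issued by the author's university, research center, or company.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# The corresponding author signs the metadata graph.
signature = private_key.sign(submission_graph, padding.PKCS1v15(), hashes.SHA256())

# A journal, repository, or citation-analysis system verifies the signature before
# trusting the metadata; verify() raises InvalidSignature if the content was altered.
try:
    public_key.verify(signature, submission_graph, padding.PKCS1v15(), hashes.SHA256())
    print("submission metadata verified")
except InvalidSignature:
    print("signature check failed; metadata rejected")

A check of this kind is what allows an evaluator or a citation-analysis system to accept the recorded event without contacting the publisher.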
In addition, our proposed scheme allows curriculum evaluations to be performed by computer programs. And confidentiality can easily be achieved by encrypting the messages with the public key of the destination of the message. It should not be difficult for authors to obtain the public key for the conference or journal (which could be included in its "call for papers" or on its webpage). And, because the paper-submission message includes the author's public key, notifications of acceptance, rejection, and publication can be encrypted with that key.

■■ Modeling the Scholarly Communication Process

Citation-analysis systems operate over metadata about the scholarly communication process. Currently, these metadata are usually generated automatically by the citation-analysis systems themselves, generally through a programmatic analysis of the scholarly artifacts' unstructured textual contents. These techniques have several drawbacks, as enumerated already, especially the fact that some metadata, such as all the aspects of the publishing process, cannot be inferred from the contents of a paper. To allow citation-analysis systems to access metadata about the entire scholarly artifact lifecycle, we suggest a metadata model that captures a great part of the scholarly domain's static and dynamic semantics. This model is based on Semantic Web knowledge representation techniques, such as Resource Description Framework (RDF) graphs and Web Ontology Language (OWL) ontologies.

Metadata and RDF

The term "metadata" typically refers to a data representation that describes the characteristics of an information-bearing entity (generally another data representation such as a physical book or a digital video file). Metadata plays a privileged role in the scholarly

Creations' URIs are built in a similar manner to authors' URIs, but in this case the use of the country code as part of the publisher's ID is optional. Because a creation and its metadata evolve through different stages (submission and camera-ready), we will use different URIs for each phase. We propose the use of this kind of URI instead of other possible schemes such as the Digital Object Identifier (DOI) because the one proposed in this paper has the advantage of being human readable and containing the CA chain.6 Of course, that does not mean that a paper, once published, cannot obtain a DOI or another kind of identifier.

Publisher's CA_Id: the publisher identifier alone, or the ccTLD and the publisher identifier separated by an underscore. Examples: lita and it_ItalianJournalOfZoology.
Creation's URI: creation://[publisher's CA_Id].[publication]/ . . . /[creation identifier]. Example: creation://lita.ital/vol27_num1_paper124

Confidentiality and Nonrepudiation

Nowadays, some conferences manage their paper submissions and notifications of acceptance (with their corresponding reviews) through e-mail, while others use a web-based application, such as EDAS (http://edas.info/). The e-mail-based system has no means of providing any kind of confidentiality; each router through which the e-mail travels can see its contents (paper submissions and paper reviews). A web-based system can provide confidentiality through HTTP Secure (HTTPS), although some of the most popular applications (such as EDAS and MyReview) do not provide it; their developers may not have thought that it was an important feature. The following is a short list of some of the existing web-based systems:

■■ EDAS (http://edas.info/) is probably the most popular system. It can manage a large number of conferences and special issues of journals. It does not provide confidentiality.
■■ MyReview (http://myreview.intellagence.eu/index .php) is an open-source web application distributed under the GPL License for managing the paper submissions and paper reviews of a conference or journal. MyReview is implemented with PHP and MySQL. It does not provide confidentiality. ■■ ConfTool (http://www.conftool.net) is another web-based management system for conferences and workshops. A free license of the standard version is available for noncommercial conferences and events with fewer than 150 participants. It uses HTTPS to provide confidentiality. seMANtic WeB FOr reliABle citAtiON ANAlYsis iN scHOlArlY PuBlisHiNG | tOus, GuerrerO, AND DelGADO 27 the purpose of the reference architecture described in this paper, we do not instruct which of the two described approaches for signing RDF graphs is to be used. The decision will depend on the implementation (i.e., on how the graphs will be interchanged and processed). OWl and an Ontology for the scholarly context To allow modeling the scholarly communication process with RDF graphs, we have designed an OWL Description Logic (DL) ontology. OWL is a vocabulary for describing properties and classes of RDF resources, complementing RDFS’s capabilities for providing semantics for general- ization hierarchies of such properties and classes. OWL enriches the RDFS vocabulary by adding, among others, relations between classes (e.g., disjointness), cardinality (e.g., “exactly one”), equality, richer typing of properties, characteristics of properties (e.g., symmetry), and enu- merated classes. OWL has the influence of more than ten years of DL research. This knowledge allowed the set of constructors and axioms supported by OWL to be care- fully chosen so as to balance the expressive requirements of typical applications with a requirement for reliable and efficient reasoning support. A suitable balance between these computational requirements and the expressive requirements was achieved by basing the design of OWL on the SH family of Description Logics.10 The language has three increasingly expressive sublanguages designed for different uses: OWL Lite, OWL DL, and OWL Full. We have chosen OWL DL to define the ontology for capturing the static and dynamic semantics of the scholarly communication process. With respect to the other versions of OWL, OWL DL offers the most expressiveness while retaining computational completeness (all conclusions are guaranteed to be computable) and decidability (all com- putations will finish in finite time). OWL DL is so named because of its correspondence with description logics. Figure 3 shows a simplified graphical view of the OWL ontology we have defined for capturing static and dynamic semantics of the scholarly communication process. Figure 4, figure 5, and figure 6 offer a (partial) tabu- lar representation of the main classes and properties of the ontology. In OWL, properties are independent from classes, but we have chosen to depict them in an object-oriented manner to improve understanding. For the same reason we have represented some properties as arrows between classes, despite this information being already present in the tables. URIs do not appear as properties in the diagrams because each instance of a class will be an RDF resource, and any resource has a URI according to the RDF model. 
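The ontology itself is defined only in the paper's figures, which are not reproduced here. Purely as an illustration of the kind of OWL DL declarations involved, the rdflib sketch below declares a few classes and one object property; the names are taken from, or modeled on, terms mentioned in the text (Creation, Submitted, and an event-to-creation property), while the namespace URI and the exact identifiers are assumptions of this sketch rather than the authors' published vocabulary.

from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

SCH = Namespace("http://example.org/scholarly#")  # hypothetical namespace for this sketch

g = Graph()
g.bind("sch", SCH)

# Classes in the spirit of the figures: creations, agents, publications, and events.
for cls in (SCH.Creation, SCH.Author, SCH.Publication, SCH.Event, SCH.Submitted):
    g.add((cls, RDF.type, OWL.Class))
g.add((SCH.Submitted, RDFS.subClassOf, SCH.Event))

# One object property linking a submission event to the submitted creation.
g.add((SCH.submittedCreation, RDF.type, OWL.ObjectProperty))
g.add((SCH.submittedCreation, RDFS.domain, SCH.Submitted))
g.add((SCH.submittedCreation, RDFS.range, SCH.Creation))

print(g.serialize(format="turtle"))

Instances of these classes are themselves RDF resources, identified by the author:// and creation:// URIs of the reference architecture.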
These URIs will fol- low the rules described in the above section, “Reference Architecture.” It’s worth mentioning that the selection of the included properties has been based in the study of several metadata formats and standards, such as Dublin communication process by helping identify, discover, assess, and manage scholarly artifacts. Because metadata are data, they can be represented through any the existing data representation models, such as the Relational Model or the XML Infoset. Though the represented information should be the same regardless of the formalism used, each model offers different capabilities of data manipulation and querying. Recently, a not-so-recent formalism has proliferated as a metadata representation model: RDF from the World Wide Web Consortium (W3C).7 We have chosen RDF for modeling the citation life- cycle because of its advantages with respect to other formalisms. RDF is modular; a subset of RDF triples from an RDF graph can be used separately, keeping a consistent RDF model. It therefore can be used with partial informa- tion, an essential feature in a distributed environment. The union of knowledge is mapped into the union of the corresponding RDF graphs (information can be gathered incrementally from multiple sources). RDF is the main building block of the Semantic Web initiative, together with a set of technologies for defining RDF vocabularies like RDF Schema (RDFS) and the OWL.8 RDF comprises several related elements, including a formal model and an XML serialization syntax. The basic building block of the RDF model is the triple subject- predicate-object. In a graph-theory sense, an RDF instance is a labeled directed graph consisting of vertices, which represent subjects or objects, and labeled edges, which represent predicates (semantic relations between subjects and objects). Coming back to the scholarly domain, our proposal is to model static knowledge (e.g., authors and papers metadata) and dynamic knowledge (e.g., “the action of accepting a paper for publication,” or “the action of sub- mitting a paper for publication”) using RDF predicates. The example in figure 1 shows how the action of sub- mitting a paper for publication could be modeled with an RDF graph. Figure 2 shows how the example in figure 1 would be serialized using the RDF XML syntax (the abbreviated mode). So, in our approach, we model assertions as RDF graphs and subgraphs. To allow anybody (authors, pub- lishers, citation-analysis systems, or others) to verify a chain of assertions, each involved RDF graph must be digitally signed by the proper principal. There are two approaches to signing RDF graphs (as also happens with XML instances). The first approach applies when the RDF graph is obtained from a digitally signed file. In this situation, one can simply verify the signature on the file. However, in certain situations the RDF graphs or subgraphs come from a more complex processing chain, and one could not have access to the original signed file. A second approach deals with this situation, and faces the problem of digitally signing the graphs themselves, that is, signing the information contained in them.9 For 28 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 Note that instances of Submitted and Accepted event classes will point to the same creation instance because no modification of the creation is performed between these events. 
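Figures 1 and 2 did not survive reproduction, so as a stand-in the sketch below builds a small submission-event graph with rdflib and serializes it as RDF/XML, in the spirit of the example described in the text. The namespace, the property names (submittedBy, submittedCreation, date), the event URI, and the sample date are placeholders assumed for this sketch rather than the exact vocabulary of the figures; only the author and creation URIs follow the scheme defined earlier in the paper.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

SCH = Namespace("http://example.org/scholarly#")  # hypothetical namespace for this sketch

g = Graph()
g.bind("sch", SCH)

author = URIRef("author://es_upc.dac/ruben.tous")
creation = URIRef("creation://lita.ital/vol27_num1_paper124")
event = URIRef("creation://lita.ital/vol27_num1_paper124#submission")  # invented event URI

g.add((event, RDF.type, SCH.Submitted))
g.add((event, SCH.submittedBy, author))          # the corresponding author
g.add((event, SCH.submittedCreation, creation))  # the submitted paper
g.add((event, SCH.date, Literal("2008-05-25", datatype=XSD.date)))

# The RDF/XML produced here is the document the corresponding author would sign.
print(g.serialize(format="xml"))

The serialized output is the artifact that is digitally signed by the corresponding author and later verified by journals, repositories, and citation-analysis systems.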
On the other hand, instances of ToBePublished and Published event classes will point to different creation instances (pointed by the cameraReady and published- Creation properties) because of the final editorial-side modifications to which a work can be subject. ■■ Advantages of the Proposed Trust Scheme The following is a short list of security features provided by our proposed scheme and attacks against which our proposed scheme is resilient: Core (DC), DC’s Scholarly Works Application Profile, vCard, and BibTEX.11 Figure 4 shows the class Publication and its subclasses, which represent the different kinds of publication. In the figure, we only show classes for journals, proceedings, and books. But it could obviously be extended to contain any kind of publication. Figure 5 contains the classes for the agents of the ontol- ogy (i.e., the human beings that author papers and book chapters and the organizations to which human beings are affiliated or that edit publications). The figure also includes the Creation class (e.g., a paper or a book chapter). Finally, figure 6 has the part of the ontology that describes the different events that occur in the process of publishing a paper (i.e., paper submission, paper accep- tance, notification of future publication, and publication). Figure 1. Example RDF Graph seMANtic WeB FOr reliABle citAtiON ANAlYsis iN scHOlArlY PuBlisHiNG | tOus, GuerrerO, AND DelGADO 29 cryptography. The necessary changes do not apply only to the citation-management software, but also to all the involved parties in the publishing lifecycle (e.g., conference and journal management systems). Authors and publishers would be the originators of the digitally signed evidences, thus user-friendly tools for generat- ing and signing the RDF metadata would be required. Plenty of RDF editors and digital signature toolkits exist, but we predict that conference and journal manage- ment systems such as EDAS could easily be extended to provide integrated functionalities for generating and processing digitally signed metadata graphs. This could be transparent to the users because the RDF documents would be automatically generated (and also signed in the case of the publishers) during the creating–editing– publishing process. Because our approach is based on a PKI trust scheme, we rely on a special setup assump- tion: the existence of CAs, which certify that the identity information and the public key contained within the public key certificates of authors and publishers belong together. To get a publication recognized by a reliable citation-analysis system, an author or a publisher would need a public-key certificate issued by a CA trusted by this citation-analysis system. The selection of trusted ■■ An author can certify to any evaluation entity that will evaluate his or her curriculum the publications that he or she has done. ■■ An evaluator entity can query the citation-analysis system and get all the publications that a certain author has done. ■■ An author cannot forge notifications of publication. ■■ A publisher cannot repudiate the fact that it has pub- lished an article once it has sent the certificate. ■■ Two or more authors cannot team up and make the system think that they are the same person to have more publications in their accounts (not even if they happen to have the same name). 
■■ Implications The adoption of the approach proposed in this paper has certain implications in terms of technological changes but also in terms of behavioral changes at some of the stages of the scholarly publishing workflow. Regarding the technological impact, the approach relies on the use of Semantic Web technologies and public-key 2008–05–25 Semantic web for Reliable Citation Management in Scholarly Publishing . . . . . . Figure 2. Example RDF/XML Representation of Graph in Figure 1 30 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 Figure 3. OWL Ontology for Capturing the Scholarly Communication Process Figure 4. Part of the Ontology Describing Publications seMANtic WeB FOr reliABle citAtiON ANAlYsis iN scHOlArlY PuBlisHiNG | tOus, GuerrerO, AND DelGADO 31 the citation-analysis system obtains the information or whether the information is duplicated. The proposed approach guarantees that the citation-analysis subsys- tem can always verify the provenance and trust of the metadata, and the use of unique identifiers ensures the detection of duplicates. Our approach also implies minor behavioral changes for authors, mainly related to the management of public- key certificates, which is often required for many other tasks nowadays. A collateral benefit of the approach would be the automation of the copyright transfer pro- cedure, which in most cases still relies on handwritten signatures. Authors would only be required to have their public-key certificate at hand (probably installed in the web browser), and the conference and journal manage- ment software would do all the work. CAs by citation-analysis systems would require the deployment of the necessary mechanisms to allow an author or a publisher to ask for the inclusion of his or her institution in the list. However, this process would be eased if some institutional CAs belonged to trust hierarchies (e.g., national or regional), so including some higher-level CAs makes the inclusion of CAs of some small institutions easier. Another technological implication is related to the interchange and storage of the metadata. Users and pub- lishers should save the signed metadata coming from a publishing process digitally, and citation-analysis sys- tems should harvest the digitally signed metadata. The metadata-harvesting process could be done in several different ways; but here raises an important benefit of the presented approach: the fact that it does not matter where Figure 5. Part of the Ontology Describing Agents and Creations 32 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 domain, but which we have taken in consideration. In our approach, static and dynamic metadata cross many trust boundaries, so it is necessary to apply trust management techniques designed to protect open and decentralized systems. We have chosen a public-key infrastructure (PKI) design to cover such a requirement. However, other approaches exist, such as the one by Khare and Rifkin, which combines RDF with digital signatures in a manner related to what is known as the “Web of Trust.”13 One aspect of any approach dealing with RDF and cryptography is how to digitally sign RDF graphs. 
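To make the harvesting benefit concrete, the sketch below shows, under the file-level signing approach, how an ingest routine of a citation-analysis system might first verify the signature over a harvested RDF/XML file and only then parse the graph, using the unique creation URI to avoid counting several copies or versions of the same paper as different papers. The vocabulary terms, the in-memory dictionary, and the omission of CA-chain and certificate validation are simplifications of this illustration, not features of the proposed architecture.

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

SCH = Namespace("http://example.org/scholarly#")  # hypothetical namespace for this sketch
accepted = {}  # creation URI -> graph already counted, for duplicate detection

def ingest(rdf_xml, signature, signer_public_key):
    # 1. Establish provenance: check the signature over the harvested file.
    try:
        signer_public_key.verify(signature, rdf_xml,
                                 padding.PKCS1v15(), hashes.SHA256())
    except InvalidSignature:
        return  # provenance cannot be established, so the metadata is ignored

    # 2. Parse the graph; the creation URI keeps multiple copies or versions of
    #    the same paper from being counted as different papers.
    g = Graph()
    g.parse(data=rdf_xml, format="xml")
    for event in g.subjects(RDF.type, SCH.Submitted):
        creation = g.value(event, SCH.submittedCreation)
        if creation is not None and str(creation) not in accepted:
            accepted[str(creation)] = g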
As described above, in the section “Modeling the Scholarly Communication Process with Semantic Web Knowledge Representation Techniques,” there are two different approaches for such a task, signing the file from which the graph will be obtained (which is the one we have chosen) or digitally signing the graphs themselves (the information represented in them), as described by Carroll.14 ■■ Conclusions The work presented in this paper describes a reference architecture that aims to provide reliability to the citation and citation-tracking lifecycle. The paper defends that current practices in the analysis of impact of scholarly artifacts entail serious design and security flaws, includ- ing nonidentical instances confusion, author-naming conflicts, fake citing, repudiation, impersonation, etc. ■■ Related Work As far as we know, this is the first paper to combine Semantic Web technologies and public-key cryptogra- phy to achieve reliable citation analysis in scholarly publishing. Regarding the use of ontologies and Semantic Web technologies for modeling the scholarly domain, we highlight the research by Rodriguez, Bollen, and Van de Sompel.12 They define a semantic model for the scholarly communication process, which is used within an associ- ated large-scale semantic store containing bibliographic, citation, and use data. This work is related to the MESUR (MEtrics from Scholarly Usage of Resources) project (http://www.mesur.org) from Los Alamos National Laboratory. The project’s main goal is providing novel mechanisms for assessing the impact of scholarly com- munication items, and hence of scholars, with metrics derived from use data. As in our case, the approach by Rodriguez, Bollen, and Van de Sompel models static and dynamic aspects of the scholarly communication process using RDF and OWL. However, contrary to what hap- pens in that approach, our work focuses on modeling the dynamic aspects of the creation–editing–publishing workflow, while the approach by Rodriguez, Bollen, and Van de Sompel focuses on modeling the use of already- published bibliographic resources. Regarding the combination of Semantic Web technolo- gies with security aspects and cryptography, there exist several works that do not specifically focus in the scholarly Figure 6. Part of the Ontology Describing Events seMANtic WeB FOr reliABle citAtiON ANAlYsis iN scHOlArlY PuBlisHiNG | tOus, GuerrerO, AND DelGADO 33 ISI Web of Knowledge, http://www.isiwebofknowledge .com/ (accessed June 24, 2010); and Eugene Garfield, Citation Indexing: Its Theory and Application in Science, Technology and Humanities (New York: Wiley, 1979). 3. Judit Bar-Ilan, “An Ego-Centric Citation Analysis Of The Works Of Michael O. Rabin Based on Multiple Citation Indexes,” Information Processing & Management: An International Journal 42 no. 6 (2006): 1553–66. 4. Alfred Arsenault and Sean Turner, “Internet X.509 Public Key Infrastructure: PKIX Roadmap,” draft, PKIX Working Group, Sept. 8, 1998, http://tools.ietf.org/html/draft-ietf-pkix- roadmap-00 (accessed June 24, 2010). 5. Internet Assigned Numbers Authority (IANA), Root Zone Database, http://www.iana.org/domains/root/db/ (accessed June 24, 2010). 6. For information on the DOI system, see Bill Rosenblatt, “The Digital Object Identifier: Solving The Dilemma of Copyright Protection Online,” Journal of Electronic Publishing 3, no. 2 (1997). 7. Resource Description Framework (RDF), World Wide Web Consortium, Feb. 10, 2004, http://www.w3.org/RDF/ (accessed June 24, 2010). 8. 
“RDF Vocabulary Description Language 1.0: RDF Schema. W3C Working Draft 23 January 2003,” http://www .w3.org/TR/2003/WD-rdf-schema-20030123/ (accessed June 24, 2010); “OWL Web Ontology Language Overview. W3C Recommendation 10 February 2004,” http://www.w3.org/TR/ owl-features/ (accessed June 24, 2010). 9. Jeremy J. Carroll, “Signing RDF Graphs,” in The Semantic Web—ISWC 2003, vol. 2870, Lecture Notes in Computer Science, ed. Dieter Fensel, Katia Sycara, and John Mylopoulos (New York: Springer, 2003). 10. Ian Horrocks, Peter F. Patel-Schneider, and Frank van Harmelen, “From SHIQ and RDF to OWL: The Making of a Web Ontology Language” Web Semantics: Science, Services and Agents on the World Wide Web 1 (2003): 10–11. 11. See the Dublin Core Metadata Initiative (DCMI), http:// dublincore.org/ (accessed June 24, 2010); Julie Allinson, Pete Johnston, and Andy Powell, “A Dublin Core Application Profile for Scholarly Works,” Ariadne 50 (2007), http://www.ukoln .ac.uk/repositories/digirep/index/Eprints_Type_Vocabulary_ Encoding_Scheme, http://www.ariadne.ac.uk/issue50/ allinson-et-al/ (accessed Dec. 27, 2010); World Wide Web Consortium, “Representing vCard Objects in RDF/XML: W3C Note 22 February 2001,” http://www.w3.org/TR/2001/NOTE -vcard-rdf-20010222/ (accessed Dec. 3, 2010); and for BibTEX, see “Entry Types,” http://nwalsh.com/tex/texhelp/bibtx-7. html (accessed June 24, 2010). 12. Marko. A. Rodriguez, Johan Bollen, and Herbert Van de Sompel, “A Practical Ontology For The Large-Scale Modeling Of Scholarly Artifacts And Their Usage,” Proceedings of the 7th ACM/ IEEE Joint Conference on Digital Libraries (2007): 278–87. 13. Rohit Khare and Adam Rifkin, “Weaving a Web of Trust,” World Wide Web Journal 2, no. 3 (1997): 77–112. 14. Carroll, “Signing RDF Graphs.” The architecture presented in this work is based in the use of digitally signed RDF graphs in the different stages of the scholarly publishing workflow, in such a manner that authors, publishers, repositories, and citation-anal- ysis systems could have access to independent reliable evidences. The architecture aims to allow the creation of a reliable information space that reflects not just static knowledge but also dynamic relationships, reflecting the full complexity of trust relationships between the differ- ent parties in the scholarly domain. To allow modeling the scholarly communication process with RDF graphs, we have designed an OWL DL ontology. RDF graphs carry- ing instances of classes and properties from the ontology will be digitally signed and interchanged between parties at the different stages of the creation–editing–publishing process. Citation-management systems will have access to these signed metadata graphs and will be able to verify their provenance and trust before incorporating them to their repositories. Because citation analysis has become a critical component in scholarly impact factor calculation, and considering the relevance of this metric within the schol- arly publishing value chain, we defend that the relevance of providing a reliable solution justifies the effort of introducing technological changes within the publish- ing lifecycle. 
We believe that these changes, which could be easily automated and incorporated to the modern conference and journal editorial systems, are justified considering the serious flaws of the established solu- tions and the relevance that citation-analysis systems are acquiring in our society ■■ Acknowledgment This work has been partly supported by the Spanish administration (TEC2008-06692-C02-01 and TSI2007- 66869-C02-01). References and Notes 1. Herbert Van de Sompel et al., “An Interoperable Fabric For Scholarly Value Chains,” D-Lib Magazine 12 no. 10 (2006), http:// www.dlib.org/dlib/october06/vandesompel/10vandesompel .html (accessed Jan. 19, 2011). 2. Boletín Oficial del Estado (B.O.E.) 054 04/03/2005 sec 3 pag 7875 a 7887, http://www.boe.es/boe/dias/2005/03/04/pdfs/ A07875–07887.pdf (accessed June 24, 2010). See also Thomson 3043 ---- 34 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 Camilla Fulton Web Accessibility, Libraries, and the Law As a typical student, you are able to scan the resources and descriptions, familiarize yourself with the quiz’s format, and follow the link to the quiz with no inherent problems. Everything on the page flows well for you and the content is broken up easily for navigation. Now imagine that you are legally blind. You navigate to the webpage with your screen reader, a software device that allows you to surf the web despite your impairment. Ideally, the device gives you equal access to webpages, and you can navigate them in an equivalent manner as your peers. When you visit your teacher’s web- page, however, you start experiencing some problems. For one, you cannot scan the page like your peers because the category titles were designed with font tags instead of heading tags styled with cascading style sheets (CSS). Most screen readers use heading tags to create the equivalent of a table of contents. This table of contents function divides the page into navigable sections instead of making the screen reader relay all page content as a single mass. Second, most screen readers also allow users to “scan” or navigate a page by its listed links. When you visit your teacher’s page, you get a list of approximately twenty links that all read, “Search this resource.” Unfortunately, you are unable to differentiate between the separate resources without having the screen reader read all con- tent for the appropriate context. Third, because the resources are separated by hard returns, you find it difficult to differentiate between each listed item. Your screen reader does not indicate when it approaches a list of categorized items, nor does it pause between each item. If the resources were contained within the proper HTML list tags of either ordered or unordered (with subsequent list item tagging), then you could navi- gate through the suggested resources more efficiently (see figures 1, 2, and 3). Finally, the video tutorial’s audio tract explains much of the quiz’s structure; however, the video relies on image-capture alone for page orientation and navigation. Without a visual transcript, you are at a disadvantage. Stylistic descriptions of the page and its buttons are gen- erally unhelpful, but the page’s textual content, and the general movement through it, would better aid you in preparation for the quiz. To be fair, your teacher would already be cognizant of your visual disability and would have accommo- dated your class needs appropriately. 
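The figures contrasting the accessible and inaccessible markup are not reproduced here, but the problems just described can be screened for mechanically. The following Python sketch, using only the standard library, flags three of them: font tags used in place of real headings, images without alternative text, and many links sharing the same generic label such as "Search this resource." It is an illustration only, not a substitute for testing with a screen reader emulator such as Fangs (figure 3) or with a full accessibility evaluator.

from collections import Counter
from html.parser import HTMLParser

class QuickAccessibilityCheck(HTMLParser):
    def __init__(self):
        super().__init__()
        self.problems = []
        self.link_texts = Counter()
        self._in_link = False
        self._link_text = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "font":
            self.problems.append("font tag used for styling; use heading tags styled with CSS")
        if tag == "img" and not attrs.get("alt"):
            self.problems.append("img element without alt text")
        if tag == "a":
            self._in_link, self._link_text = True, ""

    def handle_data(self, data):
        if self._in_link:
            self._link_text += data

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.link_texts[self._link_text.strip().lower()] += 1
            self._in_link = False

    def report(self):
        for text, count in self.link_texts.items():
            if count > 1:
                self.problems.append(f'{count} links share the text "{text}"')
        return self.problems

checker = QuickAccessibilityCheck()
checker.feed('<font size="5">Subject Resources</font>'
             '<img src="tutorial.png">'
             '<a href="/db1">Search this resource</a> <a href="/db2">Search this resource</a>')
print(checker.report())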
The Individuals with Disabilities Education Act (IDEA) mandates edu- cational institutions to provide an equal opportunity to education.1 Your teacher would likely avoid posting any class materials online without being certain that the content was fully accessible and usable to you. Unlike educational institutions, however, most libraries are not legally bound to the same law. IDEA does not command libraries to provide equal access to information through With an abundance of library resources being served on the web, researchers are finding that disabled people oftentimes do not have the same level of access to materials as their nondisabled peers. This paper discusses web accessibility in the context of United States’ federal laws most referenced in web accessibility lawsuits. Additionally, it reveals which states have statutes that mirror federal web accessibility guidelines and to what extent. Interestingly, fewer than half of the states have adopted statutes addressing web accessibility, and fewer than half of these reference Section 508 of the Rehabilitation Act or Web Content Accessibility Guidelines (WCAG) 1.0. Regardless of sparse legislation surrounding web accessibility, librarians should consult the appropriate web accessibility resources to ensure that their specialized content reaches all. I magine you are a student. In one of your classes, a teacher and librarian create a webpage that will help the class complete an online quiz. This quiz constitutes 20 percent of your final grade. Through the exercise, your teacher hopes to instill the importance of quality research resources found on the web. The teacher and librarian divide their hand-picked resources into five subject-based categories. Each resource listing contains a link to that particular resource followed by a paragraph of pertinent background information. The list concludes with a short video tutorial that prepares students for the layout of the online quiz. Neither the teacher nor the librarian has extensive web design experience, but they both have basic HTML skills. The library’s information technologists give the teacher and librarian web space, allowing them to freely create their content on the web. Unfortunately, they do not have a web librarian at their disposal to help construct the page. They solely rely on what they recall from previ- ous web projects and visual layouts from other websites they admire. As they begin to construct the page, they first style each category’s title with font tags to make them bolder and larger than the surrounding text. They then separate each resource and its accompanying description with the equivalent of hard returns (or line breaks). Next, they place links to the resources within the description text and label them with “Search this resource.” Finally, they create the audiovisual tutorial with a runtime of three minutes. camilla Fulton (cfulton2@illinois.edu) is Web and digital con- tent access librarian, university of illinois, urbana-champaign. WeB AccessiBilitY, liBrAries, AND tHe lAW | FultON 35 providing specifics on when those standards should apply. For example, Section 508 of the Rehabilitation Act could serve as a blueprint for information technology guidelines that state agencies should follow. 
Section 508 states that Federal employees with disabilities [must] have access to and use of information and data that is comparable to the access and use by Federal employees who are not individuals with disabilities, unless an undue bur- den would be imposed on the agency.4 Section 508 continues to outline how the declaration should be met when procuring and managing software, websites, telecommunications, multimedia, etc. Section 508’s web standards comply with W3C’s Web Content Accessibility Guidelines (WCAG) 1.0; stricter compliance is optional. States could stop at Section 508 and only make web accessibility laws applicable to other state agencies. Section 504 of the Rehabilitation Act, however, provides additional legislation to model. In Section 504, no dis- abled person can be excluded from programs or activities that are funded by federal dollars.5 Section 504 further their websites. Neither does the federal government possess a carte blanche web accessibility law that applies to the nation. This absence of legislation may give the impression of irrelevance, but as more core com- ponents of librarianship migrate to the web, librarians should confront these issues so they can serve all patrons more effectively. This article provides background information on the federal laws most frequently referenced within web accessibility cases. Additionally, this article tests three assumptions: ■■ Although the federal government has no web accessibility laws in place for the general public, most states legalized web accessibility for their respective state agencies. ■■ Most state statutes do not men- tion Section 508 of the Americans with Disabilities Act (ADA) or acknowledge World Wide Web Consortium (W3C) standards. ■■ Most libraries are not included as entities that must comply with state web accessibility statutes. Further discussion on why these issues are important to the library profession follows. ■■ Literature Review No previous study has systematically examined state web accessibility statutes as they relate to libraries. Most articles that address issues related to library web acces- sibility view libraries as independent entities and run accessibility evaluators on preselected library and uni- versity websites.2 Those same articles also evaluate the meaning and impact of federal disability laws that could drive the outcome of web accessibility in academia.3 In examining state statutes, additional complexities may be unveiled when delving into the topic of web accessibility and librarianship. ■■ Background With no definitive stance on public web accessibility from the federal government, states became tasked with Figure 1. These webpages look exactly the same to users, but the HTML structure actu- ally differs in source code view. 36 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 Title II, Section 201 (1) defines “public entity” as state and local governments, including their agencies, depart- ments, and districts.9 Title III, Section 302(a) builds on Title II and states that in the case of commercial facilities, No individual shall be discriminated against on the basis of disability in the full and equal enjoyment of the goods, services, facilities, privileges, advantages, or accommodations of any place of public accommoda- tion by any person who owns, leases . . . or operates a place of public accommodation.10 delineates specific entities subject to the auspice of this law. 
Though Section 504 never mentions web accessibility specifically, states could freely interpret and apply certain aspects of the law for their own use (e.g., making organizations receiving state funds create accessible websites to prevent the exclusion of disabled people). If states wanted to provide the highest level of service to all, they would also consider incorporating the most recent W3C recommendations. The W3C formed in 1994 to address the need for structural consistency across multitudinous websites and web browsers. The driving principle of the W3C is to make the benefits of the web accessible to all, "whatever their hardware, software, network infrastructure, native language, culture, geographical location, or physical or mental ability."6 The most recent W3C guidelines, WCAG 2.0, detail web accessibility guidelines that are simpler to understand and, if followed, could improve both accessibility and usability regardless of browser type.

Alternatively, states could decide to wait until the federal government mandates an all-encompassing law on web accessibility. The National Federation of the Blind (NFB) and American Council of the Blind (ACB) have been taking commercial entities to court, claiming that inaccessible commercial websites discriminate against disabled people. The well-known NFB lawsuit against Target provided a precedent for other courts to acknowledge: commercial entities should provide an accessible means to purchase regularly stocked items through their website (if they are already maintaining one).7 These commercial web accessibility lawsuits are often argued under Title II and Title III of the ADA. Title II, Section 202 states,

Subject to the provisions of this title, no qualified individual with a disability shall, by reason of such disability, be excluded from participation in or be denied the benefits of the services, programs, or activities of a public entity, or be subjected to discrimination by any such entity.8

This title's proclamation seems clear-cut; however, legal definitions of "public accommodation" differ. Title III, Section 301(7) defines a list of acceptable entities to receive the title of "public accommodation."11 Among those listed are auditoriums, theaters, terminals, and educational facilities. Courts applying Title III in defense of web accessibility argue that the web is a place and therefore cannot discriminate against those with visual, motor, or mental disabilities.12 Those arguing against using Title III for web accessibility believe that Section 301(7) specifically denotes places of physical accommodation because the authors' original intent did not include virtual ones.13 Settling on a definition for "public accommodation" is so divisive that three district courts are receptive to "public accommodation" referring to nonphysical places, four district courts have ruled against the notion, and four have not yet made a decision.14 Despite legal battles within the commercial sector, state statute analysis shows that states felt compelled to address web accessibility on their own terms.

Figure 2. Here we see distinct variances in the source code. The image at the top (inaccessible) reveals code that does not use headings or unordered lists for each resource. The image on the bottom (accessible) does use semantically correct code, maintaining the same look and feel of the headings and list items through an attached cascading stylesheet.

Figure 3. Fangs (http://www.standards-schmandards.com/projects/fangs/) visually emulates what a standard screen reader outputs so that designers can take the first steps in creating more accessible content on the web.

A survey of Carnegie classified institutions with library websites found that less than half of each degree-producing division was directed by their institution to comply with the ADA for web accessibility.24 Some may not recognize the significance of providing accessible library websites, especially if they do not witness a large quantity of accommodation requests from their users. Coincidentally, perceived societal drawbacks could keep disabled users from seeking the assistance they need.25 According to American Community Survey terminology, disabilities negatively affecting web accessibility tend to be sensory and self-care based.26 The 2008 American Community Survey Public Use Microdata Sample estimates that 10,393,100 noninstitutionalized Americans of all ages live with a hearing disability and 6,826,400 live with a visual disability.27 According to the same survey, an estimated 7,195,600 noninstitutionalized Americans live with a self-care disability. In other words, nearly 24.5 million people in the United States are unable to retrieve information from library websites unless web authors make accessibility and usability their goal.

As gatekeepers of information and research resources, librarians should want to be the first to provide unrestricted and unhindered access to all patrons regardless of ability. Nonetheless, potential objections to addressing web accessibility can deter improvement: "Learning and applying web accessibility guidelines will be difficult. There is no way we can improve access for disabled users in a way that will be useful." Actually, more than 90 percent of sensory-accessibility issues can be resolved through steps outlined in Section 508, such as utilizing headings properly, giving alternative image descriptions, and providing captions for audio and video (simple checks of this kind are illustrated in the sketch following the Results section below). Granted, these elements may be more difficult to manage on extensive websites, but wisely applied web content management systems could alleviate information technology units' stress in that respect.28

A second objection: "Creating an accessible website is time consuming and resource draining. This is obviously an 'undue burden' on our facility. We cannot do anything about accessibility until we are given more funding." The "undue burden" clause seen in Section 508 and several state statutes is a real issue that government officials needed to address. However, individual institutions are not supposed to view accessible website creation as an isolated activity. "Undue burden," as defined by the Code of Federal Regulations, relies upon the overall budget of the program or component being developed.29 Claiming an "undue burden" means that the institution must extensively document why creating an accessible website would cause a burden.30 The institution would also have to provide disabled users an alternative means of access to information provided online.

■■ Method

This study surveys the most current state statutes available on the web as they pertain to web accessibility and their connection to libraries. Using Georgia Institute of Technology's State E&IT Accessibility Initiatives database and Golden's article on accessibility within institutions of higher learning as starting points, I searched each state government's online statutes for the most recently available code.15 Examples of search terms used include "web accessibility," "information technology," and "accessibility -building -architecture -health." Excluding "building," for example, removed statute results that pertained to building accessibility. I then reviewed each statute to determine whether its mandates applied to web accessibility. Some statutes excluded mention of web accessibility but outlined specific requirements for an institution's software procurement.

When statutes on web accessibility could not be found, additional searches were conducted for the most recently available web accessibility guidelines, policies, or standards. Using a popular web search engine and the search terms "[state] web accessibility" usually resulted in finding the state's standards online. If the search engine did not offer desirable results, then I visited the appropriate state government's website and used the term "web accessibility" within the state government's site search. The following results serve only as a guide. Because of the ever-changing nature of the law, please consult legal advisors within your institution for changes that may have occurred after this article's publication.

■■ Results

"Although the federal government has no web accessibility laws in place for the general public, most states have legislated web accessibility for their respective state agencies." False—Only seventeen states have codified laws ensuring web accessibility for their state websites.16 Four of these seventeen extended coverage to include agencies receiving state funds (with no exceptions).17 Though that number seems disappointingly low, many states addressed web accessibility through other means. Thirty-one states without web accessibility statutes posted some form of standard, policy, or guideline online in its place (see appendix). These standards only apply to state entities, however, and have no legal footing outside of federal law to spur enforcement. At the time of article submission, Alaska and Wyoming were the only two states without an accessibility standard, policy, or guideline available on the web.

"Most state statutes do not mention Section 508 of the Rehabilitation Act or acknowledge World Wide Web Consortium (W3C) standards." True—Interestingly, only seven of the seventeen states with web accessibility statutes reference Section 508 or WCAG 1.0 directly within their statute text (see appendix).18 Minnesota is the only state that references the more current WCAG 2.0 standards.19 These numbers may seem minuscule as well, but all states have supplemented their statutes with more descriptive guidelines and standards that delineate best practices for compliance (see appendix). Within those guidelines and standards, Section 508 and WCAG 1.0 get mentioned with more frequency.

"Most libraries are not included as entities that must comply with state web accessibility statutes." True—From the perspective of a librarian, the above data means that forty-eight states would require web accessibility compliance for their state libraries (see appendix).
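The remediation steps listed above for sensory-accessibility issues (proper heading structure, alternative text for images, and captions for audio and video) are also the kinds of flaws that lend themselves to simple automated spot checks, in the spirit of the semantic-markup contrast described in figure 2. The sketch below uses only the Python standard library and is purely illustrative; it is not a Section 508 or WCAG compliance checker, and the sample markup is invented for the example.

```python
from html.parser import HTMLParser

class QuickAccessibilityScan(HTMLParser):
    """Tally a few first-pass issues: images with missing or empty alt
    attributes, absence of headings, and audio/video without caption tracks."""

    def __init__(self):
        super().__init__()
        self.images_missing_alt = 0
        self.heading_count = 0
        self.media_elements = 0
        self.caption_tracks = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and not attrs.get("alt"):
            self.images_missing_alt += 1
        elif tag in {"h1", "h2", "h3", "h4", "h5", "h6"}:
            self.heading_count += 1
        elif tag in {"audio", "video"}:
            self.media_elements += 1
        elif tag == "track" and attrs.get("kind") in {"captions", "subtitles"}:
            self.caption_tracks += 1

# Invented sample page fragment for the demonstration.
sample = """
<h1>Library Resources</h1>
<ul><li><a href="/databases">Databases</a></li></ul>
<img src="hours.png">
<video src="tour.mp4"></video>
"""

scan = QuickAccessibilityScan()
scan.feed(sample)
print("images missing alt text:", scan.images_missing_alt)
print("headings found:", scan.heading_count)
print("media elements without caption tracks:",
      scan.media_elements - scan.caption_tracks)
```

A real audit still needs human judgment (empty alt text, for example, is appropriate for purely decorative images), which is consistent with the point above that most first-pass fixes are mechanical but not effortless.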
Four of those states (Arkansas, California, Kentucky, and Montana) require all libraries receiv- ing state funds to maintain an accessible website.20 An additional four states (Illinois, Oklahoma, Texas, and Virginia) explicitly hold universities, and therefore their libraries, to the same standards as their state agencies.21 Despite the commendable efforts of eight states pushing for more far-reaching web accessibility, thousands of K–12, public, and academic libraries nationwide escape these laws’ reach. ■■ Discussion and Conclusion Without legal backing for web accessibility issues at all levels, “equitable access to information and library ser- vices” might remain a dream.22 Notably, researchers have witnessed web accessibility improvements in a four-year span; however, as of 2006, even libraries at institutions with ALA-accredited library and information science programs did not average an accessibility validation of 70 percent or higher.23 Additionally, a survey of Carnegie WeB AccessiBilitY, liBrAries, AND tHe lAW | FultON 39 9. 42 U.S.C. §12131. 10. 42 U.S.C. §12182. 11. 42 U.S.C. §12181. 12. Carrie L. Kiedrowski, “The Applicability of the ADA to Private Internet Web Sites,” Cleveland State Law Review 49 (2001): 719–47; Shani Else, “Courts Must Welcome the Reality of the Modern Word: Cyberspace is a Place under Title III of the Americans with Disabilities Act,” Washington & Lee Law Review 65 (Summer 2008): 1121–58. 13. Ibid. 14. Nikki D. Kessling, “Why the Target ‘Nexus Test’ Leaves Disabled Americans Disconnected: A Better Approach to Determine Whether Private Commercial Websites are ‘Places of Public Accommodation,’” Houston Law Review 45 (Summer 2008): 991–1029. 15. State E & IT Accessibility Initiatives Workgroup, “State IT Database,” Georgia Institute of Technology, http://acces sibility.gtri.gatech.edu/sitid/state_prototype.php (accessed Jan. 28, 2010); Nina Golden, “Why Institutions of Higher Education Must Provide Access to the Internet to Students with Disabilities,” Vanderbilt Journal of Entertainment & Technology Law 10 (Winter 2008): 363–411. 16. Arizona Revised Statutes §41-3532 (2010); Arkansas Code of 1987 Annotated §25-26-201–§25-26-206 (2009); California Government Code §11135–§11139 (2010); Colorado Revised Statutes §24-85-101–§24-85-104 (2009); Florida Statutes §282.601– §282.606 (2010); 30 Illinois Complied Statutes Annotated 587 (2010); Burns Indiana Code Annotated §4-13.1-3 (2010); Kentucky Revised Statutes Annotated §61.980–§ 61.988 (2010); Louisiana Revised Statutes §39:302 (2010); Maryland State Finance and Procurement Code Annotated §3A-311 (2010); Minnesota Annotated Statutes §16E.03 Subdivisions 9-10 (2009); Missouri Revised Statutes §191.863 (2009); Montana Code Annotated §18- 5-601 (2009); 62 Oklahoma Statutes §34.16, §34.28–§34.30 (2009); Texas Government Code §2054.451–§2054.463 (2009); Virginia Code Annotated §2.2-3500–§2.2-3504 (2010); West Virginia Code § 18-10N-1–§18-10N-4 (2009). 17. Arkansas Code of 1987 Annotated §25-26-202(7) (2009); California Government Code §11135 (2010); Kentucky Revised Statutes Annotated §61.980(4) (2010); Montana Code Annotated §18-5-602 (2009). 18. Arizona Revised Statutes §41-3532 (2010); California Government Code §11135(d)(2) (2010); Burns Indiana Code Annotated §4-13.1-3-1(a) (2010); Florida Statutes §282.602 (2010); Kentucky Revised Statutes Annotated §61.980(1) (2010); Minnesota Annotated Statutes §16E.03 Subdivision 9(b) (2009); Missouri Revised Statutes §191.863(1) (2009). 19. 
Minnesota Annotated Statutes §16E.03 Subdivision 9(b) (2009). 20. Arkansas Code of 1987 Annotated §25-26-202(7) (2009); California Government Code §11135 (2010); Kentucky Revised Statutes Annotated §61.980(4) (2010); Montana Code Annotated §18-5-602 (2009). 21. 30 Illinois Complied Statutes Annotated 587/10 (2010); 62 Oklahoma Statutes §34.29 (2009); Texas Government Code §2054.451 (2009); Virginia Code Annotated §2.2-3501 (2010). 22. American Library Association, “ALAhead to 2010 Strategic Plan,” http://www.ala.org/ala/aboutala/missionhistory/ plan/2010/index.cfm (accessed Jan. 28, 2010). 23. Comeaux and Schmetzke, “Accessibility Trends.” No one will sue an institution focused on promoting education. We will just continue providing one-on-one assistance when requested. In 2009, a blind student, backed by the NFB, initiated litigation against the Law School Admissions Council (LSAC) because of the inaccessibility of its online tests.31 In 2010, they added four law schools to the defense: University of California Hastings College of the Law, Thomas Jefferson School of Law, Whittier Law School, and Chapman University School of Law.32 These law schools were added because they host their application materials on the LSAC website.33 Assuredly, if instructors and students are encouraged or required to use library webpages for assignments and research, those unable to use them in an equivalent manner as their peers may pursue litigation for forcible change. Ultimately, providing accessible websites for library users should not be perceived as a hassle. Sure, it may entail a new way of thinking, but the benefits of universal access and improved usability far outweigh the frustra- tion that users may feel when they cannot be self-sufficient in their web-based research.34 Regardless of whether the disabled user is in a K–12, college, university, or public library, they are paying for a service that requires more than just a physical accommodation.35 Federal agencies, state entities, and individual institutions are all responsi- ble (and important) in the promotion of accessible website construction. Lack of statutes or federal laws should not exempt libraries from providing equivalent access to all; it should drive libraries toward it. References 1. Individuals with Disabilities Education Act of 2004, 40 U.S.C. §1411–§1419. 2. See David Comeaux and Axel Schmetzke, “Accessibility Trends among Academic Library and Library School Web Sites in the USA and Canada,” Journal of Access Services 6 (Jan.–June 2009): 137–52; Julia Huprich and Ravonne Green, “Assessing the Library Homepages of COPLA Institutions for Section 508 Accessibility Errors: Who’s Accessible, Who’s Not and How the Online WebXACT Assessment Tool Can Help,” Journal of Access Services 4, no. 1 (2007): 59–73; Michael Providenti and Robert Zai III, “Web Accessibility at Kentucky’s Academic Libraries,” Library Hi Tech 25, no. 4 (2007): 478–93. 3. Ibid.; Michael Providenti and Rober Zai III, “Web Accessibility at Academic Libraries: Standards, Legislation, and Enforcement,” Library Hi Tech 24, no. 4 (2007): 494–508. 4. 29 U.S.C. §794(d); 36 Code of Federal Regulations (CFR) §1194.1. 5. 29 U.S.C. § 794. 6. World Wide Web Consortium, “W3C Mission,” http:// www.w3.org/Consortium/mission.html (accessed Jan. 28, 2010). 7. National Federation of the Blind v. Target Corp., 452 F. Supp. 2d 946 (N.D. Cal. 2006). 8. 42 U.S.C. §12132. 40 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 Special Needs, vol. 
5105, Lecture Notes in Computer Science (Linz, Australia: Springer-Verlag, 2008) 454–61; David Kane and Nora Hegarty, “New Site, New Opportunities: Enforcing Standards Compliance within a Content Management System,” Library Hi Tech 25, no. 2 (2007): 276–87. 29. 28 CFR §36.104. 30. Ibid. 31. Sheri Qualters, “Blind Law Student Sues Law School Admissions Council Over Accessibility,” National Law Journal (Feb. 20, 2009), http://www.law.com/jsp/nlj/PubArticleNLJ .jsp?id=1202428419045 (accessed Jan. 28, 2010). Follow the case at the County of Alameda’s Superior Court of California, avail- able online (search for case number RG09436691): http://apps .alameda.courts.ca.gov/domainweb/html/index.html (accessed Sept. 20, 2010). 32. Ibid. 33. Ibid. After finding the case, click on “Register of Actions” in the side navigation menu. These details can be found on page 10 of the action “Joint Case Management Statement Filed,” uploaded June 30, 2010. 34. Jim Blansett, “Digital Discrimination: Ten Years after Section 508, Libraries Still Fall Short of Addressing Disabilities Online,” Library Journal 133 (Aug. 2008): 26–29; Drew Robb, “One Site Fits All: Companies are Working to Make Their Web Sites Comply with Accessibility Guidelines because the Effort Translates into More Customers,” Computerworld (Mar. 28, 2005): 29–32. 35. The United States Department of Justice supports Title III’s application of “public accommodation” to include virtual web spaces. See U.S. Department of Justice, “Settlement Agreement Between the United States of America and City of Missoula County, Montana Under the Americans with Disabilities Act,” DJ# 204-44-45, http://www.justice.gov/crt/foia/mt_1.php and http://www.ada.gov/missoula.htm (accessed Jan. 28, 2010). 24. Ruth Sara Connell, “Survey of Web Developers in Academic Libraries,” Journal of Academic Librarianship 34, no. 2 (2008): 121–29. 25. Patrick M. Egan and Traci A. Guiliano, “Unaccommodating Attitudes: Perceptions of Students as a Function of Academic Accommodation Use and Test Performance” North American Journal of Psychology 11, no. 3 (2009): 487–500; Ramona Paetzold et al., “Perceptions of People with Disabilities: When is Accommodation Fair?” Basic & Applied Social Psychology 30 (2008): 27–35. 26. U.S. Census Bureau, American Community Survey, Puerto Rico Community Survey: 2008 Subject Definitions (Washington, D.C.: Government Printing Office, 2009). Hearing disability pertains to deafness or difficulty in hearing. Visual disability pertains to blindness or difficulty seeing despite prescription glasses. Self-care disability pertains to those whom have “dif- ficulty dressing or bathing.” 27. U.S. Census Bureau, Data Set: 2006–2008 American Community Survey (ACS) Public Use Microdata Sample (PUMS) 3-Year Estimates (Washington, D.C.: Government Printing Office, 2009). For a more interactive table, with statistics drawn directly from the American Community Survey PUMS data files, see the database created and maintained by the Employment and Disability Institute at Cornell University: M. J. Bjelland, W. A. Erickson, and C. G. Lee, Disability Statistics from the American Community Survey (ACS), Cornell University Rehabilitation Research and Training Center on Disability Demographics and Statistics (StatsRRTC), http://www.disabilitystatistics.org (accessed Jan. 28, 2010). 28. 
Sébastien Rainville-Pitt and Jean-Marie D’Amour, “Using a CMS to Create Fully Accessible Web Sites,” Journal of Access Services 6 (2009): 261–64; Laura Burzagli et al., “Using Web Content Management Systems for Accessibility: The Experience of a Research Institute Portal,” in Proceedings of the 11th International Conference on Computers Helping People with Appendix. Library Website Accessibility Requirements, by State State Libraries Included? Code Online State Statutes Online Statements/Policies/ Guidelines Ala. n/a n/a n/a http://isd.alabama.gov/isd/statements .aspx Alas. n/a n/a n/a n/a Ariz.* state and state- funded (with exceptions) Arizona Revised Statutes §41- 3532 http://www.azleg.state.az.us/ ArizonaRevisedStatutes.asp? Title=41 http://az.gov/polices_accessibility.html Ark. state and state-funded Arkansas Code Annotated §25- 26-201 thru §25-26-206 http://www.arkleg.state.ar.us/assembly/ ArkansasCodeLargeFiles/Title%2025%20 State%20Government-Chapter%2026%20 Information%20Technology.htm and http:// www.arkleg.state.ar.us/bureau/Publications/ Arkansas%20Code/Title%2025.pdf http://portal.arkansas.gov/Pages/policy .aspx WeB AccessiBilitY, liBrAries, AND tHe lAW | FultON 41 State Libraries Included? Code Online State Statutes Online Statements/Policies/ Guidelines Calif.* state and state-funded California Government Code §11135 thru §11139 http://www.leginfo.ca.gov/calaw.html http://www.webtools.ca.gov/Accessibility/ State_Standards.asp Colo. state Colorado Revised Statutes §24- 85-101 thru §24-85-104 http://www.state.co.us/gov_dir/leg_dir/ OLLS/colorado_revised_statutes.htm www.colorado.gov/colorado/accessibility .html Conn. n/a n/a n/a http://www.access.state.ct.us/ Del. n/a n/a n/a http://gic.delaware.gov/information/ access_central.shtml Fla.* state Florida Statutes §282.601 thru §282.606 http://www.leg.state.fl.us/STATUTES/ http://www.myflorida.com/myflorida/ accessibility.html Ga. n/a n/a n/a http://www.georgia.gov/00/static/ 0,2085,4802_0_0_Accessibility, 00.html Hawaii n/a n/a n/a http://www.ehawaii.gov/dakine/docs/ada .html Idaho n/a n/a n/a http://idaho.gov/accessibility.html Ill. state and university 30 Illinois Complied Statutes Annotated 587 http://www.ilga.gov/legislation/ilcs/ilcs.asp http://www.dhs.state.il.us/page.aspx? item=32765 Ind.* state and local government Burns Indiana Code Annotated §4-13.1-3 http://www.in.gov/legislative/ic/code/title4/ ar13.1/ch3.html http://www.in.gov/core/accessibility.htm Iowa n/a n/a n/a http://www.iowa.gov/pages/accessibility Kans. n/a n/a n/a http://www.kansas.gov/about/ accessibility_policy.html Ky.* state and state-funded Kentucky Revised Statutes Annotated §61.980 thru §61.988 http://www.lrc.ky.gov/krs/titles.htm http://technology.ky.gov/policies/ webtoolkit.htm La. state Louisiana Revised Statutes §39:302 http://www.legis.state.la.us/ http://www.louisiana.gov/Government/ Policies/#webaccessibility Maine n/a n/a n/a http://www.maine.gov/oit/accessibility/ policy/webpolicy.html Appendix. Library Website Accessibility Requirements, by State (continued) 42 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 State Libraries Included? Code Online State Statutes Online Statements/Policies/ Guidelines Md. state and (possibly) community college Maryland State Finance and Procurement Code Annotated §3A- 311 http://www.michie.com/maryland/ and http://www.dsd.state.md.us/comar/coma r.aspx http://www.maryland.gov/pages/ Accessibility.aspx Mass. 
n/a n/a n/a http://www.mass.gov/accessibility and http://www.mass.gov/?pageID=mg2utiliti es&L=1&sid=massgov2&U=utility_policy_ accessibility Mich. n/a n/a n/a http://www.michigan.gov/som/0,1607,7– 192–26913–2090—, 00.html Minn.** state Minnesota Annotated Statutes §16E. 03 Subdivisions 9–10 https://www.revisor.mn.gov/pubs/ http://www.starprogram.state.mn.us/ Accessibility_Usability.htm Miss. n/a n/a n/a http://www.mississippi.gov/access_policy .jsp Mo.* state Missouri Revised Statutes §191.863 http://www.moga.mo.gov/STATUTES/ STATUTES.htm http://oa.mo.gov/itsd/cio/standards/ ittechnology.htm Mont. state and state-funded Montana Code Annotated §18- 5-601 http://data.opi.mt.gov/bills/mca_toc/index .htm http://mt.gov/discover/disclaimer .asp#accessibility Neb. n/a n/a n/a http://www.webmasters.ne.gov/ accessibilitystandards.html Nev. n/a n/a n/a http://www.nitoc.nv.gov/PSPs/3.02_ Standard_WebStyleGuide.pdf N.H. n/a n/a n/a http://www.nh.gov/wai/ N.J. n/a n/a n/a http://www.state.nj.us/nj/accessibility.html N.M. n/a n/a n/a http://www.newmexico.gov/accessibility .htm N.Y. n/a n/a n/a http://www.cio.ny.gov/Policy/NYS-P08– 005.pdf N.C. n/a n/a n/a http://www.ncsta.gov/docs/Principles%20 Practices%20Standards/Application.pdf N. Dak. n/a n/a n/a http://www.nd.gov/ea/standards/ Ohio n/a n/a n/a http://ohio.gov/policies/accessibility/ Appendix. Library Website Accessibility Requirements, by State (continued) WeB AccessiBilitY, liBrAries, AND tHe lAW | FultON 43 State Libraries Included? Code Online State Statutes Online Statements/Policies/ Guidelines Okla. state and university 62 Oklahoma Statutes §34.16, §34.28 thru §34.30 http://www.lsb.state.ok.us/ http://www.ok.gov/accessibility/ Ore. n/a n/a n/a http://www.oregon.gov/accessibility.shtml Pa. n/a n/a n/a http://www.portal.state.pa.us/portal/ server.pt/community/it_accessibility/10940 R.I. n/a n/a n/a http://www.ri.gov/policies/access.php S.C. n/a n/a n/a http://sc.gov/Policies/Accessibility.htm S. Dak. n/a n/a n/a http://www.sd.gov/accpolicy.aspx Tenn. n/a n/a n/a http://www.tennesseeanytime.org/web -policies/accessibility.html Tex. state and university Texas Government Code §2054.451 thru §2054.463 http://www.statutes.legis.state.tx.us/ http://www.texasonline.com/portal/tol/en/ policies Utah n/a n/a n/a http://www.utah.gov/accessibility.html Va. state, university, and common- wealth Virginia Code Annotated §2.2-3500 thru §2.2-3504 http://leg1.state.va.us/000/src.htm http://www.virginia.gov/cmsportal3/ about_virginia.gov_4096/web_policy.html Vt. n/a n/a n/a http://www.vermont.gov/portal/policies/ accessibility.php Wash. n/a n/a n/a http://isb.wa.gov/webguide/accessibility .aspx W. Va. state West Virginia Code §18- 10N-1 thru §18-10N-4 http://www.legis.state.wv.us/WVCODE/ Code.cfm http://www.wv.gov/policies/Pages/ accessibility.aspx Wis. n/a n/a n/a http://www.wisconsin.gov/state/core/ accessibility.html Wyo. n/a n/a n/a n/a *these states mention Section 508 of the rehabilitation act within statute text **this state mentions WcaG 2.0 within its statute text note: Most states with statutes on web accessibility also have statements, policies, and guidelines that are more detailed than the statute text and may contain references to Section 508 and WcaG 2.0. all webpages were visited between January 1, 2010, and February 12, 2010. Appendix. 
Library Website Accessibility Requirements, by State (continued)

Jennifer Emanuel
Usability of the VuFind Next-Generation Online Catalog

Jennifer Emanuel (emanuelj@illinois.edu) is Digital Services and Reference Librarian, University of Illinois at Urbana-Champaign.

The VuFind open-source, next-generation catalog system was implemented by the Consortium of Academic and Research Libraries in Illinois as an alternative to the WebVoyage OPAC system. The University of Illinois at Urbana-Champaign began offering VuFind alongside WebVoyage in 2009 as an experiment in next-generation catalogs. Using a faceted search discovery interface, it offered numerous improvements to the UIUC catalog and focused on limiting results after searching rather than limiting searches up front. Library users have praised VuFind for its Web 2.0 feel and features. However, there are issues, particularly with catalog data.

VuFind is an open-source, next-generation catalog overlay system developed by Villanova University Library that was released to the public as beta in 2007 and version 1.0 in 2008.1 As of July 2009, four institutions had implemented VuFind as a primary catalog interface, and many more were either beta testing or internally testing it.2 More information about VuFind, including the technical requirements and compatible OPACs, is available on the project website (http://www.vufind.org). In Illinois, the statewide Consortium of Academic and Research Libraries in Illinois (CARLI) released a beta installation of VuFind in 2008 on top of its WebVoyage catalog database. The CARLI installation of VuFind is a base installation with minor customizations to the CARLI catalog environment. Some libraries in Illinois utilize VuFind as an alternative to their online catalog, including the University of Illinois at Urbana-Champaign (UIUC), which currently advertises VuFind as a more user-friendly and faster version of the library catalog. As a part of the evaluation of next-generation catalog systems, UIUC decided to conduct hands-on usability testing during the spring of 2009.

The CARLI catalog environment is very complex and comprises 153 member libraries throughout Illinois, ranging from tiny academic libraries to the very large UIUC library. Currently, 76 libraries use a centrally managed WebVoyage system referred to as I-Share. I-Share is composed of a union catalog containing holdings of all 76 libraries as well as individual institution catalogs. Library users heavily use the union catalog because of a strong culture of sharing materials between member institutions. CARLI's VuFind installation uses the records of the entire union catalog, but has library-specific views. Each of these views is unique to the member library, but each library uses the same interface to view records throughout I-Share.

VuFind incorporates many of the interactive web and social media technologies that the public uses online, including features from online booksellers and commercial search engines. The VuFind search page is simple, containing only a single search box and a dropdown menu that gives users the option to search all fields or to search by title, author, subject, or ISBN/ISSN (see figure 1). To combine searches using Boolean logic or to limit to a particular language or format, the user must use the advanced search feature (see figure 2). The record-results page displays results vertically, with each result containing basic item information, such as title, author, call number, location, item availability, and a graphical icon displaying the material's format. The results page also has a column on the right side displaying "facets," which are links that allow a user to refine their search and browse results using catalog data contained within the result set (see figure 3). VuFind also contains a variety of Web 2.0 features, such as the ability to tag items, create a list of favorite items, leave comments about an item, and cite an item, as well as links to Google Book previews and extensive author biographies data mined from the Internet. Corresponding to the beginning of the VuFind trial at UIUC, the university library purchased reviews, synopses, and cover images from Syndetic Solutions to further enhance both VuFind and the existing WebVoyage catalog. An additional appealing aspect of VuFind was its speed; the CARLI installation of WebVoyage is slow to load and is prone to time out while conducting searches.

Figure 1. VuFind Default Search
Figure 2. VuFind Advanced Search
Figure 3. Facets in VuFind

The UIUC library first provided VuFind (http://www.library.illinois.edu/vufind) at the beginning of the 2008 fall semester and expected it to be trialed through the end of the spring semester 2009. Use statistics show that throughout the fall semester (September through December), there were approximately six thousand unique visitors each month, producing a total of more than thirty-eight thousand visits. Spring statistics show use averaging more than ten thousand visitors a month, an increase most likely from word-of-mouth.

Librarians at both UIUC and CARLI were interested in what users thought about VuFind, especially in relation to the usability of the interface. With this in mind, the library launched several forms of assessment during the spring semester. The first was a quantitative survey based on Yale's VuFind usability testing.3 The second was a more extensive qualitative usability test that had users conducting sample searches in the interface and telling the facilitator their opinions. This article will discuss the hands-on usability portion of this study. Survey responses that support the results presented herein will be reported in a separate venue. While this article only discusses VuFind at a single institution, it does offer a generalized view of next-generation catalogs and how library users use such a catalog compared to a traditional online catalog.

■■ Literature Review

Librarians have complained about the usability of online catalogs since they were first created.4 When Amazon.com became the go-to site for books and book information in the early 2000s, librarians and their users began to harshly criticize both OPAC interfaces and metadata standards.5 Ever since North Carolina State University announced a partnership with the commercial-search corporation Endeca in 2006, librarians have been interested in the next generation of library catalogs and, more broadly, discovery systems designed to help users discover library materials, not simply find them.6 As a result, the past five years have been filled with commercial OPAC providers releasing next-generation library interfaces that overlay existing library catalog information and require an up-front investment by libraries to improve search capabilities. As these systems are inherently commercial and require a significant investment of capital, several open-source, next-generation catalog projects have emerged, such as VuFind, Blacklight, Scriblio, and the eXtensible Catalog Project.7 These interfaces are often developed at one institution with their users in mind and then modified and adapted by other institutions to meet local needs. However, because they can be locally customized, libraries with significant technical expertise can have a unique interface that commercial vendors cannot compete against.

One cannot discuss next-generation catalogs without mentioning the metadata that underlie OPAC systems. Some librarians view the interface as only part of the problem of library catalogs and point to cataloging and metadata practices as the larger underlying problem. Many librarians view traditional cataloging using Machine-Readable Cataloging (MARC), which has been used since the 1960s, as outdated because it was developed with nearly fifty-year-old technology in mind.8 However, because MARC is so common and allows cataloging with a fine degree of granularity, current OPAC systems still utilize it. Librarians have developed additional cataloging standards, such as Dublin Core (DC), the Metadata Object Description Schema (MODS), and Functional Requirements for Bibliographic Records (FRBR), but none of these have achieved widespread adoption for cataloging printed materials. Newly developed catalog projects, such as eXtensible Catalog, are beginning to integrate these new metadata schemas, but currently others continue to use MARC.9

Many librarians also advocate integrating folksonomy, or user tagging, into library catalogs. Folksonomy is used by many popular websites, most notably Flickr, Delicious, and LibraryThing, each of which stores user-submitted content that is tagged with self-selected keywords that allow for easy retrieval and discovery.10 VuFind integrates tagging into individual item records but does not pull tags from other sources; rather, users must tag items individually.

All subjects stated they had some experience searching the library's online catalog and were eager to see changes made to it. The test used was developed from a statewide usability test of different catalog interfaces used in Illinois. The test was adapted using the same sample searches, but was customized to the features and uses of VuFind (see appendix). The VuFind test was similar to the original test to allow a comparison of other catalog interfaces to VuFind for internal evaluation purposes. I designed the test to allow subjects to perform a progressively complicated series of sample searches using the catalog while the moderator pointed out various features of the catalog interface. Subjects were also asked what they thought about the search result sets and their opinions of the interface and navigation; they also were asked to perform specific tasks using VuFind. The tasks were common library-catalog tasks using topics familiar to undergraduate-level students. The tasks ranged from a keyword search for "global warming" to a more complicated search for a specific compact disc by the artist Prince. The tasks also included using the features associated with creating and using an account with VuFind, such as adding tags and creating a favorite items list. By completing the test, subjects got an overview of VuFind and were then asked to draw conclusions about their experience and compare it to other library catalogs they have used. The tests were performed in a small meeting room with one workstation set up with an install of the Morae software, a microphone, and a web camera.
Morae is a very powerful software program developed by TechSmith that records the screen on which the user is interacting with an interface, as well as environmental audio and video. Although the study did not utilize all the features of the Morae software, it was invaluable to the researcher to be able to review the entire testing experience with the same detail as when the test actually occurred in person. The study was carried out with the researcher sitting next to the workstation asking subjects to perform a task from the script while Morae recorded all of their actions. Once all fifteen subjects completed the test, the researcher watched the resulting videos and coded the answers into various themes on the basis of both broad subject catego- ries and individual question answers. The researcher then gathered the codes into categories and used them to fur- ther analyze and gain insight into both the useful features of and problems with the VuFind interface. ■■ Analysis Participants generally liked VuFind and preferred it to the current WebVoyage system. When asked to choose which catalog they would rather use, only one person, a faculty member, stated he would still use WebVoyage. This faculty but does not pull tags from other sources; rather, users must tag items individually. Additionally, next-generation catalogs offer a search mechanism that focuses on discovery rather than simply searching for library materials. Users, accustomed to new ways of searching both on the Internet and through com- mercial library indexing and abstracting databases, now search in a fundamentally different style than they did when OPACs first became a part of library services. The online catalog is now just one of many tools that library users use to locate information and now covers fewer resources than it did ten to fifteen years ago. Library users are now accustomed to using a single search box, such as with Google; they also use nonlibrary online tools to find information about books and no longer view library cata- logs as the primary place to look for books.11 As users are no longer accustomed to using the con- trolled language and particular searching methods of library catalogs because they have moved to discover- ing materials online, libraries must adapt to new way of obtaining information and focus not on teaching users how to locate library materials, but give them the tools to discover on their own.12 VuFind is one option among many in the genre of next-generation or discovery-catalog tools. ■■ Methods The study employed fifteen subjects who participated in individual, hands-on usability test sessions lasting an average of thirty minutes. I recruited volunteers though several methods, including posting to a university faculty and staff e-mail discussion list, an e-mail discussion lists aimed toward graduate students, and flyers in the under- graduate library. All means of recruitment stated that the library sought volunteer subjects to perform a variety of sample searches in a possible new library catalog inter- face. I also informed subjects that there was a gift card as a thank you for their time. All subjects had to sign a human subjects statement of informed consent approved by the University of Illinois Institutional Review Board. I sought a diverse sample, and therefore accepted the first five volunteers from the following pools: faculty and staff, graduate students, and undergraduate students. I felt that these three user groups were distinct enough to war- rant having separate pools. 
The number of five users in each group was chosen because of Jakob Nielsen’s statement that five users will find 85 percent of usability problems and that fifteen users will discover all usability problems.13 Although I did not specifically aim to recruit a diverse sample, the sample showed a large diversity in areas including age, library experience, and academic discipline. All subjects stated they had some experience usABilitY OF tHe VuFiND Next-GeNerAtiON ONliNe cAtAlOG | eMANuel 47 though there were questions as to how results were deemed relevant to the search statement as well as how they were ranked. Participants were then asked to look at the right sidebar of the results page, which contains the facets. Most users did not understand the term “facets,” with faculty and staff understanding the term more than graduate and undergraduate students did. One faculty member who understood the term facet noted that “facets are like a diamond with different sides or ways of viewing something.” However, when asked what term would be better to call the limiting options other than facet, several users suggested either calling the facets “categories” or renaming the column “Refine Search,” “Narrow Search,” or “Sort Your Search.” Participants were then asked to find how to see results for other I-Share libraries. Only two faculty members found I-Share results quickly, and just half of the remain- ing participants were able to find the option at all. When asked what would make that option easier to find, most said they liked the wording, but the option needed to stand out more, perhaps with a different colored link or bolder type. Two users thought having the location integrated as a facet would be the most useful way of seeing it. Participants, however, quickly took to using the facets, as they were asked to use the climate change search results to find an electronic book published in 2008. No user had problems with this task, and several remarked that using facets was a lot easier than limiting to format and year before searching. The next task for participants was to open and exam- ine a single record within their original climate change results (see figures 4 and 5). Participants liked the layout, including the cover image with some brief title infor- mation, and a tabbed bar below showing additional information, such as more detailed description, holdings information, a table of contents, reviews, comments, and a link to request the item. Several users remarked that they liked having information contained under tabs, but VuFind organized each tab as a new webpage that made going back to previous tabs or the results page cumber- some. The only problem users had with the information contained within the tabs was the “staff view,” which contained the MARC record information. Most users looked at the MARC record with confusion, including one graduate student who said, “If the staff view is of no use to the user, why even have it there?” One other useful feature that individual records in VuFind contain is a link to an overlay window containing the full citation infor- mation for the item in both APA and MLA formats. Users were able to find this “Cite This” link and liked having that information available. However, several participants noted that citation information would be much more ben- eficial if it could be easily exported to Refworks or other bibliographic software. 
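The RefWorks request that participants raised amounts to wanting the data behind the "Cite This" overlay in a machine-readable interchange format rather than only as formatted APA or MLA text. The following sketch is not VuFind code; it is a generic illustration of the kind of RIS record that reference managers such as RefWorks can import, and the item details are illustrative.

```python
def to_ris(title, authors, year, publisher=None, ris_type="BOOK"):
    """Build a minimal RIS record (tag, two spaces, hyphen, space, value)."""
    lines = [f"TY  - {ris_type}"]
    lines += [f"AU  - {author}" for author in authors]
    lines.append(f"TI  - {title}")
    lines.append(f"PY  - {year}")
    if publisher:
        lines.append(f"PB  - {publisher}")
    lines.append("ER  - ")
    return "\n".join(lines)

# Item drawn from the test script; publication details are illustrative.
print(to_ris("White Fang", ["London, Jack"], 1906, publisher="Macmillan"))
```

Because the catalog already holds this metadata for every record, an export of this sort is a display concern rather than a data concern, which is why participants saw it as a reasonable expectation.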
The next several searches used progressively higher-level member thought most of his searches were too advanced for the VuFind interface and needed options that VuFind did not have, such as limiting a search to an individual library or call number searching. This user did, however, specify that VuFind would be easier to use for a fast and simple search. Other users all responded very favorably to VuFind, liking it better than any other online catalog they have used, with most stating that they wanted it as a permanent addition to the library. The most common responses to Vufind were that the layout is easier on the eyes and displayed data much better than the WebVoyage catalog; there were no comments about actual search results. Several users stated that it was nice to be able to do a broad search and then have all limiting options presented to them as facets, allowing users to both limit after search- ing and letting them browse through a large number of search results. One user, an undergraduate student, stated she liked VuFind because it “was new” and she always wants to try out new things on the Internet. The first section of the usability test asked users to examine both the basic and advanced search options. Users easily recognized how the interface functioned and liked having a single search box as the basic interface, noting that it looked more like a web search engine. They also recognized all of the dropdown menu options and agreed that the options included what they most often searched. However, four users wanted a keyword search. Even though there is not a keyword search in WebVoyage and there is an “all fields” menu option, participants seemed to think of the one box search universally as a keyword search and wanted that to be the default search option. One participant, an international graduate stu- dent, remarked that keyword is more understood by international students than the “all fields” search because, internationally, a field is not a search field but a scholarly field such as education or engineering. In the advanced search, all users thought the search options were clear and liked having icons to depict the var- ious media formats. However, two users did remark that it would be useful to be able to limit by year on the advanced search page. The advanced search also is where the user can select one of seven languages, all of which are consid- ered western languages, including Latin and Russian. Two users, both international graduate students, stated that more languages would be beneficial, especially Asian and more Slavic languages. The University of Illinois has sepa- rate libraries for Asian and Slavic materials, and these two participants said it would be useful to have search options that include the languages served by the libraries. The first task that participants were asked to do was an “all fields” search for “climate change.” They were instructed to look at the results page and an individual record to give feedback as to how they liked the layout and what they thought of the search results. Upon looking at the results, all participants thought they were relevant, 48 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 to items in which James Joyce is the author, no par- ticipant had any problems, though several pointed out that there were three fac- ets using his name—Joyce, James; Joyce, James Avery; and Joyce, J. A.—because of inconsistencies in cataloging (see figure 6). 
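The three Joyce facets illustrate that facet quality is a data problem as much as an interface problem: a display author or author facet is typically drawn from a record's 1XX main entry, with 7XX added entries as a fallback, so unreconciled name headings surface as separate facet values no matter how the interface presents them. The sketch below uses plain dictionaries instead of a real MARC parser (such as pymarc) and is only an illustration of that behavior, not VuFind's actual indexing logic.

```python
# Toy records: each maps a MARC tag to the list of $a values in that field.
# The headings are the three forms reported in the study, with cataloger
# punctuation added for the demonstration.
records = [
    {"100": ["Joyce, James."]},
    {"700": ["Joyce, James Avery."]},
    {"100": ["Joyce, J. A."]},
]

def author_facet(record):
    """Prefer the 100 main entry; fall back to the first 700 added entry."""
    for tag in ("100", "700"):
        values = record.get(tag)
        if values:
            return values[0].rstrip(" .,")  # naive trailing-punctuation cleanup
    return None

facets = sorted({author_facet(r) for r in records if author_facet(r)})
print(facets)
# Even after trimming punctuation, the unreconciled headings stay distinct:
# ['Joyce, J. A', 'Joyce, James', 'Joyce, James Avery']
```

Getting one facet per person therefore depends on authority control or reconciliation of the headings at index time, not on changes to the results screen.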
Participants were next asked to search for an audio recording by the art- ist Prince using the basic (single) search box. Most participants did an “all fields” search for Prince and attempted to use the facets to limit by a particu- lar format. All but one was confident that they achieved the proper result, but there was confusion about the for- mat. Some participants were confused as to what format an audio recording was because the correspond- ing facet was for a music recording. A couple of users thought “audio recording” could be a spoken-word record- ing. Most participants preferred that the format facets be more concrete toward a single actual physical format, such as a record, cassette, or a compact disc (see figure 7). Physical formats appeared to resonate more with users than the broad cataloging term of “music recording.” A more specific format type (i.e., compact disc) is contained in the call number and should be straightforward to pull out as a facet. It appears VuFind pulls the format informa- tion from MARC field 245 subfield $h for medium rather than the call number (which at Illinois can specify the format) or the 300 physical description field or another field such as a notes field that some institutions may use to specify the exact format. However, when participants were asked to further use facets to find Prince’s first album, 1978’s For You, limitations with VuFind became more apparent. Each par- ticipant used a different method to search for this album, and none actually found the item either locally or in I-Share, though the item has multiple copies available in both locations. Most participants tried initially limiting by date because they were given that information. However, VuFind’s facets focus on eras rather than specific years, which participants stated was frustrating as many items can fall under a broad era. Also, the era facets brought up many more eras than one would consider an audio research skills and showed problems with both VuFind and the catalog record data. The first search asked participants to do an “all fields” search for James Joyce. All were able to complete the search, but there was notable confusion as to which records were written by James Joyce and which were items about him. About half of the first-page results for this search did not list an author on the results page. VuFind appears to pull the author field on the results page from the 100 field in the MARC record, so if the 700 field is used instead for an editor, this information is not displayed on the results page. Individual records do substitute the 700 field if the 100 field is not present, but this should also be the case on the initial results screen as well. Several users thought it was strange that the results page often did not list the author, but an author was listed in the individual record. Additionally, when asked to use the facets to limit Figure 4. Results Set Figure 5. Record Display Figure 6. Author Facet Figure 7. Format Facet usABilitY OF tHe VuFiND Next-GeNerAtiON ONliNe cAtAlOG | eMANuel 49 about both the reviews and comments that could be seen in the various records participants were asked to examine. Many of the participants wanted more information as to where the reviews came from because this information was not clear. They also wanted to know whether the reviews or comments from catalog users had any type of moderation by a librarian. For the most part, participants liked having reviews inside the catalog records, but they liked having a summary even more. 
Several users, all graduate students, expressed concern about the objective- ness of having reviews in the catalog, especially because it was not clear who did the review and feared that reviews may interject some bias that had no place in a library cata- log record. One of these participants stated, “If I wanted reviews, I would just go to Amazon. I don’t expect reviews, which can be subjective, to be in a library catalog—that is too commercial.” Several undergraduate participants stated that reviews helped them decide whether the book was something that would be useful to them. The final task of the usability test asked participants to create an account with VuFind because it is not connected to our user database. Most users had no problems finish- ing this task, though they found some problems with the interface. First, it was not clear that users had to create an account and could not log in with their library number as they did in the library’s OPAC. Second, the default field asks users for their barcode, which is not a term used at UIUC (users are assigned a library number). Once logged in, participants were satisfied with the menu options and how their account information was displayed. Finally, participants were asked, while logged in, to search for a favorite book and add it to their favorites list. All users liked the favorites-list feature, and many already knew of ways they could use it, but several wished they could create multiple lists and have the ability to arrange lists in folders. ■■ Discussion Participants thought favorably of the VuFind interface and would use it again. They liked the layout of informa- tion much more than the current WebVoyage interface and thought it was much easier to look at. They also had many comments that the color scheme (yellow and grey) was easier than the blues of the primary library OPAC. VuFind also had more visual elements, such as cover images and icons representing format types that partici- pants also commented on favorably. When asked to compare VuFind to both the WebVoyage catalog and Amazon, only one participant indicated a preference for Amazon, while the rest preferred VuFind. The user who specified Amazon, a faculty member, stated that that was where he always started searching for books; he would then search for specific titles in the recording, such as the 15th century. Granted, the 15th century probably brings up music that originated in that era, not recorded then, but participants wanted the date to correspond to when an item was initially published or released. It appears that VuFind pulls the era facet information from the subject headings and ignores the copyright or issue year. To users, the era facets are not useful for most of their search needs; users would rather limit by copyright or the original date of issue. Another search that further highlighted problems searching for multimedia in VuFind is the title search participants did for Gone with the Wind. Everyone thought this search brought up relevant results, but when asked to determine whether the UIUC library had a copy of the DVD, many users expressed confusion. Once again, the confusion was based on the inability to limit to a specific format. Participants could use the facets to limit to a film or video, but not to a specific format. Several participants stated that they needed specific formats because when they are doing a comparable search, they only want to find DVDs. 
However, because all film formats are linked together under “Film/Video,” they must to go into indi- vidual records and examine the call number to determine the exact format. Most participants stated clearly that “DVD” needed to be it’s own format facet and that enter- ing a record to find the format required too much effort. Participants also expressed frustration that the call num- ber was the only place to determine specific format and believed that this information should be contained in the brief item information and not buried in the tabbed areas. The frustrations with the lack of specific formats also were evident when participants were asked to do an advanced search for a DVD on public speaking. All users initially thought the advanced search limiter for film/video was sufficient when they first looked at the advanced search options. However, when presented with an actual search (“public speaking”), they found that there should be more options and specific format choices up-front within the advanced search. Another search that participants conducted was an author search for Jack London. They then used the fac- ets to find the book White Fang. This search was chosen because the resulting records are mostly for older mate- rials that often do not contain a lot of the additional information that newer records contain. Participants looked at a specific record and then were asked what they thought of the information that was displayed. Most answered that they would like as much informa- tion as you can give them, but were accepting of missing information. Several participants stated that most people already know this book and thus did not need additional information. However, when pressed as to what informa- tion they would like added to the record, several users stated a summary would be the most useful. Additionally, several users asked for more information 50 iNFOrMAtiON tecHNOlOGY AND liBrAries | MArcH 2011 the simplicity of the favorites listing feature, the difficulty of linking to other I-Share library holdings, and the dif- ficulties in using the facet categories. ■■ Implications I intend to continue to perform similar usability tests on next-generation catalogs on a trial basis to examine one aspect regarding the future of online catalogs at UIUC. UIUC is looking at various catalog interfaces, of which VuFind is one option, to see which best meets the needs of our users. Users stated multiple times during testing that they find the current WebVoyage interface to be very frustrating and will accept nearly anything that is an improvement, even if the new interface has some usabil- ity issues. VuFind is not perfect for all searches, as shown by a lack of a call number search and the limitations in searching for multimedia options, but it does provide a more intuitive interface for most patrons. The future of VuFind at UIUC is still open. Development is currently stalled because of a lack of developer updates and internal staffing constraints both at UIUC and CARLI. However, because VuFind is open–source, and the only ongoing cost is that of server maintenance, both CARLI and the library are continuing to display it as an option for searching the catalog. Both CARLI and UIUC are closely examining other options for catalog interfaces that would provide patrons with a better search experience, but they have taken no further action to permanently adapt either VuFind or to demo other options. 
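Participants' insistence that "DVD" be its own facet, together with the observation that the specific carrier is often visible only in the call number, suggests a small indexing refinement rather than an interface change. The sketch below is a hypothetical post-processing step, not part of VuFind; the carrier tokens and call numbers are invented for illustration, following the article's note that call numbers at Illinois can spell out the format.

```python
# Hypothetical carrier terms that locally assigned call numbers might contain.
CARRIER_TOKENS = ("DVD", "BLU-RAY", "VIDEOCASSETTE", "CD", "LP")

def specific_format(broad_format, call_number):
    """Refine a broad facet such as 'Film/Video' or 'Music Recording' when
    the call number spells out the physical carrier; otherwise keep the
    broad value."""
    tokens = (call_number or "").upper().split()
    for token in CARRIER_TOKENS:
        if token in tokens:
            return token
    return broad_format

# Invented call numbers, for illustration only.
print(specific_format("Film/Video", "PN1997 .G66 2004 DVD"))      # DVD
print(specific_format("Music Recording", "M1630.18 .P75 F6 CD"))  # CD
print(specific_format("Film/Video", "PN1995.9 .A3 W47 1998"))     # Film/Video
```

A production system would more likely rely on coded fields such as the 300 physical description when they are reliably populated, but the principle is the same: derive the facet from whichever field consistently records the carrier.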
Despite its limitations, VuFind is still a viable option for libraries with substantial technology expertise that are interested in a next-generation catalog interface at a low price. Although it does have limitations, it has a bet- ter out-of-the-box interface than traditional OPACs and should be considered alongside commercial options for any library thinking of adapting a catalog interface overlay. This usability test focused on one institution’s installation of VuFind, which may or may not apply to other installa- tions and other institutional needs. It would be interesting to study an installation of VuFind at a smaller, nonresearch institution, where users have different searching needs and expectations related to a library’s OPAC. References 1. John Houser, “The VuFind Implementation at Villanova University,” Library Hi Tech 27, no. 1 (2009): 96–105. 2. VuFind, “VuFind: About,” http://www.vufind.org/about .php (accessed Sept. 10 2009). 3. Kathleen Bauer, “Yale University VuFind Test— Undergraduates,” http://www.library.yale.edu/libepub/ usability/studies/summary_undergraduate.doc (accessed Mar. 20, 2010). library catalog to check availability. Other participants who made comments about Amazon stated that it was commercial and more about marketing materials, while the library catalog just provided the basic information needed to evaluate materials without attempting to sell them to you. Several participants also stated they checked Amazon for book information, but generally did not like it because of its commercial nature; because VuFind pro- vides much of the same information as Amazon, they will use VuFind first in the future. Participants also thought Amazon was for a popular and not scholarly audience, making it not useful for academic purposes. Most users did not have much to say about the WebVoyage OPAC, except it was overwhelming, had too many words on the result screen, and was not pleasantly visual. Participants were also asked to look at VuFind, Amazon, and WebVoyage from a visual preference. Again, participants believed that VuFind had the best layout. They liked that VuFind had a very clean and uncluttered interface and that the colors were few and easy on the eye. They also commented about the visuals contained (cover art and icons) in the records and the vertical orientation of VuFind (WebVoyage has a horizontal orientation) to display records. They also liked how the facets were dis- played, though two users thought they would be better situated on the left side of the results because they scan websites from the left to the right. The one thing that was mentioned several times was VuFind’s lack of the star rating system that Amazon uses to quickly rate an item. Participants thought such a system might be better than reviews because it allows users to quickly scan through the item and not have to read through multiple reviews. When asked to rate the ease of use for VuFind, with 1 being easy and 5 being difficult, participants rated it an average of 1.92. Faculty rated the ease at 1.6, graduate stu- dents at 1.75, and undergraduates at 2.8. Undergraduates were more likely to get frustrated at media searching and thought that some of the facets related to media items were confusing, which they used to explain their lower scores. However, when asked if they would rather use VuFind over the current library catalog (WebVoyage), all but one participant enthusiastically stated they would use VuFind. 
Most users stated that although VuFind was not perfect, it was still much better than the other library catalog because of the better layout, visuals, and ability to limit results. The only user that specified they would still rather use the WebVoyage catalog believed it had more options for advanced search, such as call number search- ing, which VuFind lacked. There are, however, several changes that could make VuFind more useful to our users that came out of usabil- ity testing. Some of these are easy to implement on a local level, and others would improve the base build of VuFind. A number of issues arose from usability testing, but the largest issues are the lack of Refworks integration, usABilitY OF tHe VuFiND Next-GeNerAtiON ONliNe cAtAlOG | eMANuel 51 9. Jennifer Bowen, “Metadata to Support Next-Generation Library Resource Discovery: Lessons from the eXtensible Catalog, Phase 1,” Information Technology & Libraries 27, no. 2 (2008): 6–19. 10. Tom Steele, “The New Cooperative Cataloging,” Library Hi Tech 27, no. 1 (2009): 68–77. 11. Ian Rowlands and David Nicholas, “Understanding Information Behaviour: How do Students and Faculty Find Books?” Journal of Academic Librarianship 34, no. 1 (2008): 3–15. 12. Ja Mi and Cathy Weng, “Revitalizing the Library OPAC: Interface, Searching, and Display Challengers,” Information Technology & Libraries 27, no. 1 (2008): 5–22. 13. Jakob Nielsen, “Why You Only Need to Test with 5 Users,” http://www.useit.com/alertbox/20000319.html (accessed Mar. 20, 2010). 4. Christine Borgman, “Why are Online Catalogs Still Hard to Use?” Journal of the American Society for Information Science 47, no. 7 (1996): 493–503. 5. Georgia Briscoe, Karne Selden, and Cheryl Rae Nyberg, “The Catalog versus the Home Page: Best Practices for Connecting to Online Resources,” Law Library Journal 95, no. 2 (2003): 151–74. 6. Kristin Antelman, Emily Lynema, and Andrew K. Pace, “Toward a Twenty-First Century Library Catalog,” Information Technology & Libraries 25, no. 3 (2006): 128–39. 7. Marshall Breeding, “Library Technology Guides: Discovery Layer Interfaces,” http://www.librarytechnology. org/discovery.pl?SID=20100322930450439 (accessed Mar. 2010). 8. Karen M. Spicher, “The Development of the MARC Format,” Cataloging & Classification Quaterly 21, no 3/4 (1996): 75–90. Appendix. VuFind Usability Study Logging Sheets I. The Look and Feel of VuFind A. Basic Screen (the VuFind main page) 1) Is it obvious what to do? Yes _____ No _____; What were you trying to do? 2) Open the drop down box, examine the options. Do you recognize theseoptions? Yes _____ No _____ Some _____ (If some, find out what the patron was expecting and get suggestions for improvement). Comments: B. Click on the Advanced Search option—take a minute to allow the participants to look around the screen 1) Examine each of the Advanced Search options a) Are the advanced search options clear? Yes_____ No_____ b) Are the advance search options helpful? Yes_____No_____ 2) Examine the Limits fields, open the drop-down menu boxes a) Are the limits clearly identified? Yes _____ No _____ b) Are the pictures helpful? Yes _____ No _____ c) Are the drop-down menu box options clear? Yes _____ No _____ Comments: II. (Back to the) Basic Search Field A. Enter the phrase—climate change (search all fields)—examine the search results 1) Do the records retrieved appear to be relevant to your search statement? Yes _____No _____Don’t Know _____ 2) What information would you like to see in the record? How should it be displayed? 
3) Examine the right sidebar. Are the "facets" clear? Yes _____ No _____ Some, not all _____
4) If you want to view items from other libraries in your search results, can you find the option? Yes _____ No _____
5) Can you find an electronic book published in 2008? Yes _____ No _____ Don't Know _____
Comments:

B. Click on the first book record in the original climate change search results
1) Is information about the book clearly represented? Yes _____ No _____
2) Is it clear where to find the item? Yes _____ No _____
3) Look at the Tags. Do you understand what this feature is? Yes _____ No _____
Comments:

C. Look at the brief item information provided on the screen
1) Is the information displayed useful in determining the scope and content of the item? Yes _____ No _____
2) Are the topics in the record useful for finding additional information on the topic? Yes _____ No _____
Comments:

D. Click on each button below the brief record information
1) Is this information useful? Yes _____ No _____
2) Are the names for the tabs accurate? What should they be named?

E. Can you easily determine where the item is located and how to request it? Yes _____ No _____
Comments:

F. Go back to the basic search box and enter the author James Joyce (all fields) as a new search
1) Is it easy to distinguish items by James Joyce from items about James Joyce? Yes _____ No _____
2) Using the facets, can you find only titles with James Joyce as author? Yes _____ No _____
3) Can you find out how to cite an item? Yes _____ No _____
Comments:

G. Now try to find an audio recording by the artist Prince using Basic Search
Were you successful? Yes _____ No _____

H. Find the earliest Prince recording ("For You"; 1978). Is it in the local collection? Yes _____ No _____ If not, can you get a copy?
Comments:

III. In the Advanced Search Screen:

A. Use the title drop-down to find the item: Gone with the Wind
1) Were you successful? Yes _____ No _____ Not Sure _____
2) Can you locate a DVD of the same title? Yes _____ No _____
3) Are copies of the DVD available in the University of Illinois Library? Yes _____ No _____
Comments:

B. Use the author drop-down in the Advanced Search to locate titles by: Jack London
Using the facets, find and open the record for the Jack London novel White Fang. Explore each of the Description, Holdings, and Comments tabs:
1) Is this information useful? Yes _____ No _____
2) Would you change the names of the tabs or the information on them?
3) Other than your local library copy of White Fang, can you find copies at other libraries? Yes _____ No _____
Comments:

C. Using the Advanced Search, find a DVD on public speaking (Hint: use the limit box to select the film/video format)
Are there instructional videos in the University of Illinois library? Yes _____ No _____
1) Identify the author that's responsible for one of the DVDs
2) Can you easily find other works by this author? Yes _____ No _____
Comments:

IV. Exploring the Account features:

A. Click on Login in the upper right corner of the page. On the next page, create an account. Is it clear how to create an account? Yes _____ No _____
B. Once you have your account and are logged in to VuFind, look at the menu on the right-hand side. Is it clear what each of the menu items are? Yes _____ No _____
C. While still logged in, do a search for your favorite book and add it to your favorites list. Is this tool useful? Would you consider using it? Yes _____ No _____
Comments:

V. Comparing VuFind to other resources:
A. Open three browser windows or tabs (this is easiest in Firefox by entering Ctrl-T for each) with:
1) Your Library Catalog
2) VuFind
3) Amazon.com

B. Enter global warming in the basic search window of each website. Based on your initial reactions, which service appears the best for most of your uses? Library Catalog _____ VuFind _____ Amazon _____
Comments:

C. Do you have a preference in the display formats? Library Catalog _____ VuFind _____ Amazon _____
Comments:

Debriefing

Now that you have used VuFind, how would you rate it on a scale from 1–5, from easy to confusing to use? Comments?
How does it compare to other library catalogs you've used?
If VuFind and your home library catalog were available side by side, which would you use first? Why?
Are you familiar with any of these other products: Aquabrowser _____ GoogleBooks _____ Microsoft Live Search _____ LibraryThing _____ Amazon.com _____ Other preferred service _____

That's it! Thank you for participating in our usability test. You will be receiving one other survey through email; we appreciate your opinions on the VuFind product.

Searchable Signatures: Context and the Struggle for Recognition
Gina Schlesselman-Tarango

Gina Schlesselman-Tarango (gina.schlesselman@du.edu) holds a Master of Social Sciences from the University of Colorado Denver and is currently an MLIS candidate at the University of Colorado.

ABSTRACT

Social networking sites made possible through Web 2.0 allow for unique user-generated tags called "searchable signatures." These tags move beyond the descriptive and act as means for users to assert online individual and group identities. This paper presents a study of searchable signatures on the Instagram application, demonstrating that these types of tags are valuable not only because they allow both individuals and groups to engage in what social theorist Axel Honneth calls the "struggle for recognition," but also because they provide contextual use data and sociohistorical information so important to the understanding of digital objects. Methods for the gathering and display of searchable signatures in digital library environments are also explored.

INTRODUCTION

A comparison of user-generated tags with metadata traditionally assigned to digital objects suggests that social network platforms provide an intersubjective space for what social theorist Axel Honneth has termed the "struggle for recognition."1 Social network users, through the creation of identity-based tags—or what can be understood as "searchable signatures"—are able to assert and perform online selves and are thus able to demand, or struggle for, recognition within a larger social framework. Baroncelli and Freitas cogently argue that Web 2.0, or the interactive online social arena, in fact functions as a "recognition market in which contemporary individuals . . . trade personal worth through displays and exchanges of . . . self-presentations."2

A comparison of a metadata schema used in Yale University's Digital Images Database with user-generated tags accompanying shared photographs on the social networking platform Instagram demonstrates that searchable signatures are unique to social networking sites. As phenomena that allow for public presentations of disembodied selves, searchable signatures thus provide specific information about the context of the digital images with which they are associated.
Capturing context remains a challenge for those working with digital collections, but searchable signatures allow viewers to derive valuable use data and sociohistorical information to better understand the world in which digital images originated and exist.

LITERATURE REVIEW

Web 2.0 Identities and Recognition Theory

While Web 2.0 can be imagined as a highly collaborative space where social actors are able to communicate new identities to the world, some warn that this communication is somehow engineered and performed. Van Dijck, in an analysis of social media, argues that it is indeed "publicity strategies [that] mediate the norms for sociality and connectivity," and Baroncelli and Freitas note that Web 2.0 allows people to make themselves visible through modes of spectacularization.3 Though his focus is on the spectacle in fin de siècle France, Clark provides some insight into the effects of spectacularization on the individual.4 Working within a historical materialist framework, Clark points out that with the growth of capitalism, the individual has become colonized.5 Clark further describes this colonization as "massive internal extension of the capitalist market—the invasion and restructuring of whole areas of free time, private life, leisure, and personal expression . . . the making-into-commodities of whole areas of social practice which had once been referred to casually as everyday life."6 Here, Web 2.0 is not a liberatory tool but instead a space where users are colonized to the extent that they create selves exchanged through social networking sites owned by capitalist enterprises. Web 2.0, then, has created a situation in which personal time and identification can be successfully commodified. Baroncelli and Freitas conclude, "From that formula, personal life becomes a capital to be shared with other people—preferably, with a large audience."7 The problem, then, is that one's existence is defined simply "by being seen by others" and can no longer be understood as authentic.8

Despite the sophistication of the argument detailed above, there are some who view the online self created through Web 2.0 as a legitimate and authentic identity. In an account of the online self, Hongladarom summarizes this position, noting that both offline and virtual identities are constructed in social environments.9 For Hongladarom, these identities are not different in essence because "what it is to be a person . . . is constituted by external factors."10 The online world as an external factor has the ability to affirm one's existence, regardless of whether that existence is physical or virtual. In sum, it is the social other and not a material existence that is the authenticating factor in identity formation.

There are others who validate the role that spectacle—or what also can be understood as performance—plays in identity formation. Pearson calls on the work of Goffman to argue, "identity-as-performance is seen as part of the flow of social interaction as individuals construct identity performances fitting their milieu."11 For Pearson, the identity is always performed, be it through Web 2.0 or otherwise.
There is nothing particularly worrisome, then, about the effects of Web 2.0 on the self, nor does Web 2.0 threaten the authenticity of the self. Identity is always performed and is in a sense a spectacle—this does not mean, however, that identity in itself is spurious. It is with this perspective of the online self as a performed albeit authentic identity that this paper further develops. Before a thorough analysis of the searchable signature as an online self can be conducted, a deeper understanding of Honneth’s theory of recognition is in order. INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 7 In his 1995 work The Struggle for Recognition: The Moral Grammar of Social Conflicts, Honneth sets out to develop a social theory based on what he calls “morally motivated struggle.” 12 Based on the Habermasian concept of communicative action, Honneth contends that it is through mutual recognition that “one can develop a practical relation-to-self [and can] view oneself from the normative perspective of one’s partners in interaction, as their social addressee.” 13 Relation-to- self is key for Honneth, and he argues that a healthy relation-to-self, or what can be thought of as self-esteem, is developed when one is seen as valuable by others. Beyond self-esteem, Honneth points that the success of social life itself depends on “symmetrical esteem between individualized (and autonomous) subjects.” 14 For Honneth, this “symmetrical esteem” can lead to solidarity between individuals. “Relationships of this sort,” he explains, “can be said to be cases of ‘solidarity’ because they inspire not just passive tolerance but felt concern for what is individual and particular about the other person.” 15 That is to say that felt concern for another allows one to see the specific traits of the other as valuable in working towards common goals, and Honneth imagines that in situations of “symmetrical esteem . . . every subject is free from being collectively denigrated, so that one is given the chance to experience oneself to be recognized, in light of one’s own accomplishments and abilities, as valuable for society.” 16 Until this ideal is realized, however, individuals must find sites in which to struggle to be recognized as valuable social assets. According to Baroncelli and Freitas, it is in fact Web 2.0 that provides the arena where “the contemporary demand for the visibility of the self” is able to flourish. 17 They position this argument within Honneth’s framework, asserting that the visibility of self is “directed towards a quest for recognition,” and they thus conclude that Web 2.0 can be understood as a “recognition market.” 18 Context and Its Importance Capturing and integrating markers of context into records, according to Chowdhury, still present a challenge for many.19 “There is now a general consensus that the major challenge facing a digital library as well as a digital preservation program is that it must describe its content as well as the context sufficiently well to allow its correct interpretation by the current and future generations of users,” he contends.20 Context in itself is difficult to define, let alone its myriad facets that might or might not facilitate better understanding of digital objects. 
Dervin, in her exploration of the meaning of context, points that it is often conceptualized as the “container in which the phenomenon resides.” 21 She points that the list of factors that constitute the container and might be considered contextual is in fact “inexhaustible”—items on this list, for example, might include the gender, race, and ethnicity of those involved in a phenomenon. 22 In an indexing or digital collection environment, the goal is to determine which of these many factors ought be included in a record to best allow for discovery and use. SEARCHABLE SIGNATURES: CONTEXT AND THE STRUGGLE FOR RECOGNITION | SCHLESSELMAN-TARANGO 8 Others imagine context as a fluid, ever-changing process rather than as a static container of data. “In this framework,” Dervin writes, “reality is in a continuous and always incomplete process of becoming.” 23 This understanding of context as changing is helpful for those working with objects that live in digital environments, especially Web 2.0. Certainly the interactive nature of the web has created room for a variety of users to create, share, appropriate, comment on, tag, reject, celebrate, and ultimately understand images in a multitude of contexts that might be different from one moment to the next. There are many reasons to include contextual information in records of digital objects. Lee argues that by providing context, or what he describes as the “social and documentary” world “in which [a digital object] is embedded,” future users will be able to better understand the “details of our current lives.” 24 Further, Lee contends that context is helpful in that is illustrates the ways in which a digital object is related to other materials: Relationships to other digital objects can dramatically affect the ways in which digital objects have been perceived and experienced. In order for a future user to make sense of a digital object, it could be useful for that user to know precisely what set of . . . representations—e.g. titles, tags, captions, annotations, image thumbnails, video keyframes—were associated with a digital object at a given point in time. 25 The user-generated tag, then, is a valuable representation that provides contextual information surrounding the perception and experience of the image with which it is directly related. DISCUSSION User-Generated Tags and Traditional Metadata User-generated tags have been hailed as an important stage in the evolution of image description and are said to have the potential to shape controlled vocabularies used in traditional metadata schemas. For example, in a comparison of Flickr tags and index terms from the University of St. Andrews Library Photographic Archive, Rorissa stresses the importance of exploring similarities and differences between indexers’ and users’ language, noting that “social tagging could serve as a platform on which to build future indexing systems.” 26 Like others, Rorissa hopes that continued research into user-generated social tags will be able to “bridge the semantic gap between indexer- assigned terms and users’ search language.” 27 In fact, some are currently utilizing social tags in an effort to describe and facilitate access to collections. One such organization is Steve: The Museum Social Tagging Project, “a place where you can help museums describe their collections by applying keywords, or tags, to objects.” 28 The organization allows users to not only view traditional metadata associated with cultural objects, but also tags generated by others. 
In an effort to better understand the similarities and differences between user-generated tags and the language used in traditional metadata schemas, one must compare the two systems. INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 9 Yale University’s Digital Images Database provides a glimpse at the ways in which traditional metadata schemas are typically used to describe images in digital library settings. Most of the images included in the database are accompanied by descriptive, structural, and administrative metadata. For example, an item entitled “Boy sitting on a stoop holding a pole” (see figure 1) from the university’s collection of 1957–90 Andrews St. George papers provides a digital copy of the image, the image number, name of the creator, date of creation, type of original material, dimensions, copyright information, manuscript group name and number, box and folder numbers, and a credit line.29 The image is further described by the following: “Man in the shed is making homemade bombs. The boy and man are also in image 45350.” 30 Figure 1. “Boy sitting on a stoop holding a pole” from Yale University’s Digital Images Database collection of 1957–90 Andrews St. George papers, November 2012. Certainly, such information is useful in library environments and provides users with helpful and formatted data to best guide the information discovery process. The finding aid for the Andrews St. George collection is additionally helpful in that it includes information about provenance, access, processing, associated materials, and the creator; it also contains descriptive information about the collection by box and folder number. 31 However, if additional use data and sociohistorical SEARCHABLE SIGNATURES: CONTEXT AND THE STRUGGLE FOR RECOGNITION | SCHLESSELMAN-TARANGO 10 information specific to this individual item were available, it would be most helpful in assisting users in determining the image’s greater context. A study of modes of participation on social networking sites suggests that it is now possible to supply such contextual information for digital objects that live in interactive online environments. A useful site for exploring user-generated tags associated with images is Instagram, a social application designed for iPhone and Android.32 Instagram users are able to upload and edit photos, and other users can then view, like, and comment on the shared photos. Instagram users are able to follow other users and search for photos by the creator’s username or by accompanying tags. Instagram, owned by Facebook, is interoperable with other social networking sites, and users have the ability to share their photos on Facebook, Flickr, Tumblr, and Twitter. As of July 2012, it was reported that Instagram had 80 million users, and in September 2012, the New York Times reported that 5 billion photos were shared through the application.33 Users are limited to 30 tags per photo, and Instagram suggests that users be as specific as possible when describing an image with a tag so that communities of users with similar interests can form.34 Many tags, like the information included in traditional metadata schemas, aim to best describe an image by explaining its content; for example, one user assigned the tags #kids, #nieces, #nephews, and #family to a photograph of a group of smiling children (see figure 2). 
Like the information accompanying the photograph in the Yale University Digital Images Database, such tags provide users and viewers with tools to better determine the “aboutness” of the image at hand. INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 11 Figure 2. Photo shared on Instagram assigning both descriptive tags and the searchable signature #proudaunt, November 2012. However, Instagram users are repurposing the tagging function in a way that is unique to social networking sites. In addition to the descriptive tags assigned to the image of the children described above, the user also tagged the photo with the term #proudaunt (see figure 2). There is, however, no aunt (what can be assumed to be an adult female) in the photograph. This tag, then, functions to further identify the user who created or shared the photograph and does not describe the content of the image at hand. A search of the same tag, #proudaunt, demonstrates that this user is not alone in identifying as such: in November 2012, this search returned 40,202 images with the same tag and more than 58,000 images with tags derived from the same phrase (#proudaunty, #proudauntie, #proudaunties, #proudauntiemoment, and #proudaunti) (see figure 3). Figure 3. List of results from #proudaunt hashtag search on Instagram, November 2012. This type of user-generated tag—one that identifies the creator or sharer of the photograph yet is not necessarily meant to describe the content of the image—can be understood as a searchable signature. Such identity-based tags are not found within Yale University’s Digital Images Database; the closest relative of the searchable signature is the creator’s name. While searchable, this name is not alternative, or secondary, and it was not created and does not exist in a social environment. SEARCHABLE SIGNATURES: CONTEXT AND THE STRUGGLE FOR RECOGNITION | SCHLESSELMAN-TARANGO 12 Currently, born-digital objects are often created and shared in a technological milieu that allows for the assignment of user-generated tags. Consequently, the integration of the searchable signature into the presentation of digital objects has become part of accepted social practice and offers unique opportunities for digital library curators and users alike. Until quite recently, most materials—be they photographs, manuscripts, or government documents—were not born in digital environments. However, digitization projects have been undertaken to ensure that such historical materials are more widely and eternally available. These reborn digital objects, then, have been and can be integrated into dynamic social environments. Steve: The Museum Social Tagging Project, mentioned earlier in this paper, is one example of an organization that has capitalized on the social practice of user-generated tagging and is using descriptive tags along with traditional metadata to better describe reborn digital objects. It is important, then, to explore what (if any) implications the application of the searchable signature, a unique type of user-generated tag, has for historical objects that are later integrated into digital environments. Searchable signatures associated with born digital images on social networking sites contain valuable information about their creators, users, and the images’ context. One cannot ignore that users will, if given the chance, also likely apply signatures to reborn digital objects in similar ways that they do to objects that have always existed in social environments. 
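To make the distinction concrete, the sketch below shows one possible way a repository record could carry traditional descriptive metadata alongside harvested searchable signatures, each paired with the username, platform, and date of sharing. This is a minimal illustration only: it is not drawn from Yale's schema or from Instagram's data model, and all field names and values are hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class SearchableSignature:
    """An identity-based tag harvested from a social platform (hypothetical structure)."""
    tag: str           # e.g., "#proudaunt"
    username: str      # account that applied or shared the tag
    platform: str      # social networking site where it was applied
    date_applied: str  # ISO 8601 date string

@dataclass
class DigitalObjectRecord:
    """A digital-object record pairing descriptive metadata with searchable signatures."""
    identifier: str
    title: str
    creator: str
    descriptive_tags: List[str] = field(default_factory=list)
    searchable_signatures: List[SearchableSignature] = field(default_factory=list)

# Hypothetical example modeled loosely on the #proudaunt photograph discussed above.
record = DigitalObjectRecord(
    identifier="example-0001",
    title="Photograph of a group of smiling children",
    creator="unknown",
    descriptive_tags=["#kids", "#nieces", "#nephews", "#family"],
    searchable_signatures=[
        SearchableSignature(
            tag="#proudaunt",
            username="example_user",
            platform="Instagram",
            date_applied="2012-11-15",
        )
    ],
)

# Serialize the combined record; a repository could store or index this JSON alongside
# (not in place of) its existing descriptive, structural, and administrative metadata.
print(json.dumps(asdict(record), indent=2))
```

Keeping the signatures in a separate, repeatable element rather than folding them into descriptive subject terms preserves the distinction drawn here between describing an image and identifying its creators and sharers.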
Since the searchable signature is used to identify not only digital image creators, but also sharers, and if these signatures do in fact provide important insight into the sharers and their motivations, then these signatures are not to be ignored. Rather than focusing on the creating, the lens through which to understand the searchable signature for reborn digital objects can be shifted to the social act of sharing: by whom, when, in which social environments, and for what purposes. A deeper analysis of the presentation of self through the searchable signature and the role that the signature plays in providing valuable contextual information for both born- and reborn-digital objects is developed below. Searchable Signatures and the Struggle for Recognition If Web 2.0 indeed functions as a recognition market, then social media and social networking sites might appear to be tables at such a market. Placing oneself behind a table—be it Facebook, Twitter, or Instagram—the user is able to perform his or her online identity to passersby and effectively struggle to be recognized as a unique individual or as a member of a social group. These performances, which could be deemed narcissistic in nature, can alternatively be read as healthy attempts to self-actualize and connect to larger society.35 One such “table” in the recognition market is Instagram. Beyond Instagram’s social nature that allows participants to interact with and follow one another, the specific role of the searchable signature is of interest to those who are concerned with struggles for recognition. Rather than describing shared images, searchable signatures reflect performative yet authentic user identities. INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 13 McCune, in a case study of consumer production on Instagram, acknowledges the potential of the tag to not only facilitate image exchange but to communicate users’ positions as members of social groups.36 Through a simple search of tags, users who identify as, for example, “cat ladies,” are able to validate their identities when they see that there are many others who use the same or similar language in demonstrations of the self (see figure 4). Other signatures such as #proudaunt, while not necessarily playful, still function to provide viewers with additional information about the Instagram user that cannot be determined through the photo itself. The ability to find images based on these searchable signatures allows users to find others who identify in a like manner and to imagine themselves as part of a larger social group. In effect, searchable signatures allow users to be recognized as social addressees of like-minded others. Positioning oneself within a group must be understood as a struggle for recognition, for to imagine oneself as part of the social fabric is also to see oneself as valuable. Figure 4. List of results from #catlady hashtag search on Instagram, November 2012. Enabled by Web 2.0, searchable signatures contain potential for marginalized peoples or groups to assert online selves to be seen and ultimately heard in a truly intersubjective landscape. It is not too much of a leap to imagine that searchable signatures might make possible the organization of individuals and groups for political purposes. 
In fact, in a discussion of social groups, Honneth notes that "the more successful social movements are at drawing the public sphere's attention to the neglected significance of the traits and abilities they collectively represent, the better their chances of raising the social worth, or indeed, the standing of their members."37 Here, searchable signatures might provide such movements with a venue to capture the public's attention and to effectively struggle for and gain recognition.

Searchable Signatures and Context

As markers of individual and group identities, searchable signatures are unique in that they provide a snapshot of the multitude of social, historical, political, individual, and interpersonal relationships that ontologize the images with which they are paired. It is this very contextual information that is at times lacking in traditional indexing environments. By examining searchable signatures, experts and users are able to understand which individuals and groups create, use, and identify with certain images. Thus, as markers of self, searchable signatures provide use data for scholars to better investigate which images are important to online individual or group identities. If the searchable signature is used in a political fashion, historians and sociologists might be able to study which types of images, for example, marginalized groups rally around, identify with, and use in their struggles for recognition.

Such use data also illuminates how and by whom certain digital images have been appropriated over time. For example, if a picture of a cat is first created or shared via Instagram by an animal rights activist, the image might be accompanied by the searchable signature #humanforcats. This same image, shared by another user months later, might be accompanied by the #catlady signature. Those interested will be able to examine how the same image has been historically used for different purposes and will be better able to grasp the evolving nature of its digital context.

In addition to use data, the searchable signature provides insight into the sociohistorical context surrounding digital images. For those who perceive "reality . . . as accessible only (and always incompletely) in context, in specific historicized moments in time space," the searchable signature clarifies and makes more accessible that reality surrounding the digital image.38 In a traditional library setting, a photo of a cat might be indexed with descriptive subject headings such as "Cat," "Persian cat," or "Kitten—Behavior." However, the searchable signature #catladyforlife provides additional information on how the cat has become, for a certain social group in a specific moment in time, a trope of sorts for those who are proud not only of their relationships with their domestic pets, but also of their shared values and lifestyles.
If a historian were to dig deeper, he or she also might see that "cat lady" has historically been used in a derogatory manner to mark single, unattractive women thought to be crazy and unable to care for the great number of cats they own and that, by (re)claiming this title, women might be engaging in a struggle for recognition that extends beyond mere admiration for felines.39

Chowdhury, in a continued discussion of challenges facing the digital world, asks whether it is "possible to capture the changing context along with the content of each information resource, because as we know the use and importance . . . changes significantly with time."40 Additionally, he asks, "Will it be possible to re-interpret the stored digital content in the light of the changing context and user community, and thereby re-inventing the importance and use of the stored objects?"41 It is here that the searchable signature offers use data and sociohistorical information to illuminate the (changing) value digital images have for individuals, communities, and society.

CONCLUSION

Clark argues that representation must be understood within the confines of what he calls "social practice."42 Social practice, among other things, can be understood as "the overlap and interference of representations; it is their rearrangement in use."43 Representation of self also must be understood within current social practice, and an important facet of today's practice is Web 2.0. As a social space, Web 2.0 allows for the creation of disembodied self-representations. One type of such representation, the searchable signature, is a phenomenon unique to social networking sites.

While many acknowledge the potential of descriptive, user-generated tags to inform or even to be used in conjunction with metadata schemas or controlled vocabularies, Instagram users have created an additional, alternative use for the tag. Rather than simply using tags to describe shared images, they have successfully created a route to online identity formation and recognition. Searchable signatures demonstrate the power of the online self, as they allow users to struggle to be recognized as unique individuals or as parts of larger social groups. These signatures, too, might act as platforms on which social groups can assert their value and thus demand recognition.

Additionally, searchable signatures provide contextual information that reflects the social practice in which digital images live. While the capture and integration of such information remains a challenge for those engaged in traditional indexing, Web 2.0 allows for this unique type of user-generated tag and thus provides better understanding of the context surrounding digital images.

As to the question of whether searchable signatures can be integrated into existing metadata schemas or be used to inform controlled vocabularies in library environments, it is not unreasonable to suggest that digital objects be accompanied by their supplemental yet valuable representations (e.g., searchable signatures and the like). Many methods exist through which these signatures might be both gathered and displayed. Certainly, a full exploration of such practices is the stuff of future research; however, some initial ideas are detailed below. One method of gathering identity-based tags would involve the active hunting down of searchable signatures.
Locating objects on social networking sites that are also in one’s digital collection, the indexer would identify and track associated user-generated searchable signatures. This method would require extreme diligence, high levels of comfort navigating and using Web 2.0, a clear idea of which social networking sites yield the most valuable searchable signatures, and likely one or more full-time staff members devoted to such activities. Even if feed systems were employed for individual digital objects, this method demands much of indexers and would likely not be sustainable over time. SEARCHABLE SIGNATURES: CONTEXT AND THE STRUGGLE FOR RECOGNITION | SCHLESSELMAN-TARANGO 16 A more passive yet efficient way of gathering searchable signatures would simply be to build on methods that have shown to be successful. By creating interactive digital environments that encourage users to assign not only descriptive but also identity-based tags, indexers are freed of the time-consuming task of hunting for searchable signatures on the web. Since searchable signatures have come to be part of online social practice, the assigning of them would likely be familiar to users—initially, libraries might need to prompt users to share signatures or provide them with examples. This gathering tactic could be used to harvest signatures for items that are already part of the library’s digital collection (telling us about signatures used by potential sharers) or as a means to incorporate new digital objects into the collection (telling us about signatures used by both creators and sharers). In both gathering scenarios, indexers might choose to display only the most occurring or what they deem to be the most relevant searchable signatures, or they might choose to display all such tags; decisions such as these will ultimately depend on each institution’s mission and resources. Of course, if a library integrates a born-digital image into its collection and can identify the searchable signatures originally assigned to it via social networking sites or otherwise, this information should also be recorded. Here, users will be able to get a glimpse of the image in its pre-library life. Providing associated usernames, dates posted, and the name of the social networking sites too will assist in providing a more complete picture of the individuals or groups linked to the image. This information can provide valuable data about the information creators and sharers who use specific social platforms. The aim of this paper is to lay the theoretical groundwork to better understand the role of searchable signatures in today’s digital environment as well as the signature’s unique ability to provide context for digital images. Surely, further research into the phenomenon of the searchable signature would demonstrate how it is currently used outside of Instagram or as a political tool. Others might consider examining the username as another arena in which individuals or groups construct and perform online identities and thus engage in struggles for recognition. Usernames also might provide contextual use data and sociohistorical information that inevitably support greater understanding of digital objects. Finally, further research is needed to identify how libraries could utilize the searchable signature in promotional activities and to build and cater to user communities. REFERENCES 1. Axel Honneth, The Struggle for Recognition: The Moral Grammar of Social Conflicts (Cambridge: MIT Press, 1995). 2. 
Lauane Baroncelli and Andre Freitas, “The Visibility of the Self on the Web: A Struggle for Recognition,” In Proceedings of 3rd ACM International Conference on Web Science, 2011, accessed August 12, 2013, www.websci11.org/fileadmin/websci/Posters/191_paper.pdf. INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 17 3. Jose van Dijck, “Facebook as a Tool for Producing Sociality and Connectivity,” Television & New Media 13, no. 2 (2012): 160–76; Baroncelli and Freitas, “The Visibility of the Self.” 4. T. J. Clark, introduction to The Painting of Modern Life: Paris in the Art of Manet and His Followers (Princeton, NJ: Princeton University Press, 1984), 1–22. 5. Ibid. 6. Ibid., 9. 7. Baroncelli and Freitas, “The Visibility of the Self.” 8. Ibid. 9. Soraj Hongladarom, “Personal Identity and the Self in the Online and Offline World,” Minds & Machines 21 (2011): 533–48. 10. Ibid., 541. 11. Erika Pearson, “All the World Wide Web’s a Stage: The Performance of Identity in Online Social Networks,” First Monday 14 (2009), accessed November 9, 2012, www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm; Erving Goffman, The Presentation of Self in Everyday Life (Garden City, NY: Doubleday, 1959). 12. Honneth, The Struggle for Recognition, 1. 13. Jurgen Habermas, The Theory of Communicative Action (Boston: Beacon, 1984); Honneth, The Struggle for Recognition, 92. 14. Honneth, The Struggle for Recognition, 129. 15. Ibid. 16. Ibid., 130. 17. Baroncelli and Freitas, “The Visibility of the Self.” 18. Ibid. 19. Gobinda Chowdhury, “From Digital Libraries to Digital Preservation Research: The Importance of Users and Context,” Journal of Documentation 66, no. 2 (2010): 207–23, doi: 10.1108/00220411011023625. 20. Ibid., 217. 21. Brenda Dervin, “Given a Context by Any Other Name: Methodological Tools For Taming the Unruly Beast,” in Information Seeking in Context, ed. Pertti Vakkari et al. (London: Taylor Graham, 1997), 13–38. SEARCHABLE SIGNATURES: CONTEXT AND THE STRUGGLE FOR RECOGNITION | SCHLESSELMAN-TARANGO 18 22. Ibid., 15. 23. Ibid., 18. 24. Christopher A. (Cal) Lee, “A Framework for Contextual Information in Digital Collections,” Journal of Documentation 67 (2011): 95–143. 25. Ibid., 100. 26. Abebe Rorissa, “A Comparative Study of Flickr Tags and Index Terms in a General Image Collection,” Journal of the American Society for Information Science and Technology 61, no. 11 (2010): 2230–42. 27. Ibid., 2239. 28. “Steve Central: Social Tagging for Cultural Collections,” Steve: The Museum Social Tagging Project, accessed December 16, 2012, http://tagger.steve.museum. 29. “Yale University Library Manuscripts & Archives Department,” Yale University Manuscripts & Archives Digital Images Database, last modified April 19, 2012, accessed December 3, 2012, http://images.library.yale.edu/madid. 30. Ibid. 31. “Andrew St. George Papers (MS 1912),” Manuscripts and Archives, Yale University Library, accessed April 30, 2013, http://drs.library.yale.edu:8083/fedoragsearch/rest. 32. “FAQ,” Instagram, accessed November 10, 2012, http://instagram.com/about/faq. 33. Emil Protalinksi, “Instagram Passes 80 Million Users,” CNET, July 6, 2012, accessed November 13, 2012, http://news.cnet.com/8301-1023_3-57480931-93/instagram-passes-80-million- users; Jenna Wortham, “It’s Official: Facebook Closes Its Acquisition of Instagram,” New York Times, September 6, 2012, accessed November 13, 2012, http://bits.blogs.nytimes.com/2012/09/06/its-official-facebook-closes-its-acquisition-of- instagram. 34. 
“Tagging Your Photos Using #hashtags,” Instagram, accessed November 10, 2012, http://help.instagram.com/customer/portal/articles/95731-tagging-your-photos-using- hashtags; “Instagram Tips: Using Hashtags,” Instagram, accessed November 10, 2012, http://blog.instagram.com/post/17674993957/instagram-tips-using-hashtags. 35. Andrew L. Mendelson and Zizi Papacharissi, “Look at Us: Collective Narcissism in College Student Facebook Photo Galleries,” in A Networked Self: Identity, Community and Culture on Social Network Sites, ed. Zizi Papacharissi (New York: Routledge, 2010), 251–73. 36. Zachary McCune, “Consumer Production in Social Media Networks: A Case Study of the http://tagger.steve.museum/ http://images.library.yale.edu/madid/ http://drs.library.yale.edu:8083/fedoragsearch/rest/ http://instagram.com/about/faq/ http://news.cnet.com/8301-1023_3-57480931-93/instagram-passes-80-million-users/ http://news.cnet.com/8301-1023_3-57480931-93/instagram-passes-80-million-users/ http://bits.blogs.nytimes.com/2012/09/06/its-official-facebook-closes-its-acquisition-of-instagram/ http://bits.blogs.nytimes.com/2012/09/06/its-official-facebook-closes-its-acquisition-of-instagram/ http://help.instagram.com/customer/portal/articles/95731-tagging-your-photos-using-hashtags http://help.instagram.com/customer/portal/articles/95731-tagging-your-photos-using-hashtags http://blog.instagram.com/post/17674993957/instagram-tips-using-hashtags INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 19 ‘Instagram’ iPhone App” (master’s dissertation, University of Cambridge, 2011), accessed December 20, 2012, http://thames2thayer.com/portfolio/a-study-of-instagram. 37. Honneth, The Struggle for Recognition, 127. 38. Dervin, “Given a Context by Any Other Name,” 17. 39. Kiri Blakeley, “Crazy Cat Ladies,” Forbes, October 15, 2009, accessed December 4, 2012, www.forbes.com/2009/10/14/crazy-cat-lady-pets-stereotype-forbes-woman-time- felines.html; Crazy Cat Ladies Society & Gentlemen's Auxiliary homepage, accessed December 4, 2012, www.crazycatladies.org. 40. Chowdhury, “From Digital Libraries to Digital Preservation,” 219. 41. Ibid. 42. Clark, introduction to The Painting of Modern Life, 6. 43. Ibid. ACKNOWLEDGMENTS Many thanks to Erin Meyer and Dr. Krystyna Matusiak at the University of Denver for their feedback and guidance. http://thames2thayer.com/portfolio/a-study-of-instagram/ 3123 ---- First Aid Training for Those on the Front Lines: Digital Preservation Needs Survey Results 2012 Jody DeRidder INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2013 18 “The dilemma for the cultural heritage preservation community derives from the lag between immediate need and the long-term transformation of digital preservation expertise.” 1 INTRODUCTION Every day history is being made and recorded in digital form. Every day, more and more digitally captured history disappears completely or becomes inaccessible due to obsolescence of hardware, software, and formats.2 Although it has long been the focus of libraries and archives to retain, organize, and preserve information, these communities face a critical skills gap. 3 Further, the typical library cannot support a true, trusted digital repository compliant with the Open Archival Information System (OAIS) framework.4 Until we have in place the infrastructure, expertise, and resources to distill critical information from the digital deluge and preserve it appropriately, what steps can those in the field take to help mitigate the loss of our cultural heritage? 
The very “scale of the digital landscape makes it clear that preservation is a process of triage.” 5 While educational systems across the country are scrambling to develop training programs to address the problem, it will be years, if ever, before every cultural heritage institution has at least one of these formally trained employees on staff. Librarians and archivists already in place are wondering what they can do in the meantime. Those on the front lines of this battlefront to save our cultural history need training. Surrounded by content under digitization, digital content coming into special collections and archives, assisting content creators in their research and scholarship, these archivists and librarians need to know what they can do to prevent more critical loss. Even if developing a preservation program is limited to ensuring the digital content survives long enough to be collected by some better-funded agency, capturing records in open standard interoperable technology neutral formats would help to ease later ingest of such content into a trusted digital repository.6 As Molinaro has pointed out, those in the field need “the knowledge and skills to ensure that their projects and programs are well conceived, feasible, and have a solid sustainability plan.” 7 For those on the front lines, digital preservation education needs to be accessible, practical, and targeted to an audience that may have little technical expertise. Since “resources for preservation are meager in small and medium-sized heritage organizations,” 8 such training needs to be free or as low-cost as possible. Jody L. DeRidder (jlderidder@ua.edu) is Head of Digital Services at the University of Alabama Libraries, Tuscaloosa. mailto:jlderidder@ua.edu FIRST AID TRAINING FOR THOSE ON THE FRONT LINES | DERIDDER 19 In an effort to address these needs, the Library of Congress established the Digital Preservation Outreach & Education (DPOE) train-the-trainer network.9 In six one-hour modules,10 this training provides a basic overview of the framework necessary to begin to develop a digital preservation program. The modules formed the basis for three well-attended ASERL webinars in February 2012.11 Attendee feedback after the webinars indicated a deep need for practical, detailed instruction for those in the field. This article reports on the results of a follow-up survey to identify the topics and types of materials most important to webinar attendees and their institutions for digital preservation, in the fall of 2012. APPROACH The survey was open from October 2 until December 15, 2012. Invitations to participate were sent to the following discussion lists: Society of American Archivists (SAA) Archives & Archivists (A&A), SAA Preservation Section Discussion List, SAA Metadata and Digital Object Round Table Discussion List, digital-curation (Google group), Digital Library Federation (DLF-announce), and the Library of Congress Digital Preservation and Outreach (DPOE) general listserv. Each invitation clarified that respondents need not be Association of South Eastern Research Libraries (ASERL) members in order to attend the free webinars or to participate in the survey. The survey consisted of three questions, the first to determine the sources of digital content most important for respondents’ institutions to preserve, and the second to identify the topics of greatest concern to respondents themselves. 
For these two questions, respondents were asked to rate the options as:

• Extremely important
• Somewhat important
• Maybe of value
• Not important at all

The first two questions are as follows:

Please rate the following sources of digital content in terms of importance for preservation at your institution:

• Born-digital institutional records
• Born-digital special collections materials
• Digitized collections
• Digital scholarly content (institutional repository or grey literature)
• Digital research data
• Web content
• Other

Please rate the following topics in terms of importance to YOU, for inclusion in future training webinars:

• How to inventory content to be managed for preservation
• Developing selection criteria, and setting the scope for what your institution commits to preserving
• Selecting storage options and number of copies
• Determining what metadata to capture and store
• Methods of preservation metadata extraction, creation, and storage
• Legal issues surrounding access, use, migration, and storage
• Selecting file formats for archiving
• Validating files and capturing checksums
• Monitoring status of files and media
• File conversion and migration issues
• Business continuity planning
• Security and disaster planning at multiple levels of scope
• Self-assessment and external audits of your preservation implementation
• Developing your institution's preservation policy and planning team
• Planning for provision of access over time
• Other

After each of these questions, respondents were provided a free text field in which to add additional entries related to the "Other" entry. The last question on the survey asked respondents whether they are members of an ASERL institution, since ASERL is supporting this series of webinars.

RESULTS

Of the 182 respondents, 37 (20.7 percent) self-identified as ASERL members, 142 (79.3 percent) as non-ASERL members, and three skipped the question. All respondents answered the first two queries.

Sources of Digital Content

For the complete set of respondents, the top three types of material considered extremely important for preservation were born-digital special collections materials (65 percent, 117 respondents), born-digital institutional records (62.7 percent, 111 respondents), and digitized collections (61.2 percent, 109 respondents). Digital scholarly content, digital research data, and web content trailed in importance, rated extremely important by only 37 percent (64 respondents), 33.9 percent (59 respondents), and 30.6 percent (52 respondents), respectively.
In clarification, one respondent listed “born-digital correspondence (e-mail),” another listed “state government digital archival records,” a third asked for instructions for use of “Kodak’s new Asset Protection Film for preservation of moving and still images,” and one specified that by “special collections” she meant “audiovisual.” FIRST AID TRAINING FOR THOSE ON THE FRONT LINES | DERIDDER 21 The concern for A/V materials was echoed by some of the 8 respondents suggesting other content as extremely important: “born-digital moving image preservation” (an ASERL respondent), “best practices for preservation of different audio and video formats” (also an ASERL respondent), “born digital photographs and video of college events,” and a request for an “audio digitization workshop.” Additional “other” entries were copyright pitfalls, data security, and “very practical steps that very small institutions can take to preserve their digital materials (e.g. how to check digital integrity, and how often, selection of storage media, and creation of a ‘dark archive’).” One ASERL respondent indicated that she did not rate “born digital” institutional and special collections materials as extremely important for preservation only because her institution does not yet have a system set up for these, nor do they yet collect many born-digital special collections. She clarified that she does think this is extremely important despite the seeming lack of interest on the part of her institution. Figure 1. Results for all survey respondents indicating sources of digital content of importance for preservation at their institution. INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2013 22 In comparing the responses to the first question by whether the respondents self-identified as members of an ASERL institution (37 respondents as opposed to 142), those who did considered born-digital special collections materials far more important (73 percent, 27 respondents) than non-ASERL respondents (62.9 percent, 88 respondents), but this still was rated most important by both groups. Second for ASERL respondents was digitized collections (69.4 percent, 25 respondents) whereas born-digital institutional records held second place for non-ASERL respondents (62 percent, 85 respondents). Third and fourth-ranked material sources for ASERL respondents were born-digital institutional records (64.9 percent, 24 respondents) and digital scholarly content (63.9 percent, 23 respondents); digital research data only rated 52.8 percent (19 respondents). Non-ASERL respondents considered digitized collections the third most important source of digital content for preservation (59.7 percent, 83 respondents), and this group of respondents was far less concerned with digital scholarly content (29.9 percent, 40 respondents) or digital research data (29.6 percent, 40 respondents) than the ASERL respondents. Web content ranked lowest for both groups: 29.4 percent (10) ASERL respondents and 30.6 percent (41) non- ASERL respondents considered this content extremely important. Figure 2. Results for ASERL survey respondents indicating sources of digital content of importance for preservation at their institution. FIRST AID TRAINING FOR THOSE ON THE FRONT LINES | DERIDDER 23 Figure 3. Results for non-ASERL survey respondents indicating sources of digital content of importance for preservation at their institution. 
Perhaps most surprising was that 20 non-ASERL respondents (14.8 percent) rated digital research data as "not important at all" for preservation at their institutions, but this may be reflective of their type of institution. Museums and historical societies, non-research institutions, and government agencies likely are not concerned with research data; this theory seems to be supported by the 12.7 percent (17) of non-ASERL respondents who rated digital scholarly content as "not important at all." In comparison, only one ASERL respondent (2.8 percent) indicated that research data had no importance to his institution for preservation (0 for digital scholarly content). This may simply reflect a lack of awareness of current issues on the part of the respondent.

Topics of Interest

Both groups of respondents agreed on the three most important topics for future training webinars. "Methods of preservation metadata extraction, creation and storage" led the way, with 77.3 percent (140 respondents: 70.3 percent or 26 ASERL and 79.4 percent or 112 non-ASERL) listing this as extremely important. Next was "Determining what metadata to capture and store" (68 percent, 96 respondents: 62.2 percent or 23 ASERL and 66.7 percent or 120 non-ASERL). The third most important topic is "Planning for provision of access over time" at 65.4 percent (117 respondents: 61.1 percent or 22 ASERL and 65.7 percent or 92 non-ASERL).

Figure 4. Results for all survey respondents indicating topics of importance to them, for future training webinars.

Fourth in importance overall was "file conversion and migration issues" (58.8 percent, 107 respondents: 54.1 percent or 20 ASERL and 60.6 percent or 86 non-ASERL), though the ASERL respondents thought this topic was slightly less critical than "developing selection criteria, and setting the scope for what your institution commits to preserving" (56.8 percent, 21 respondents, as opposed to 49.6 percent or 70 non-ASERL respondents; overall percentage 51.9 percent, 94 respondents). Close in relative importance were "validating files and capturing checksums" (53.9 percent, 97 respondents), "monitoring status of files and media" (52.8 percent, 95 respondents), and "developing your institution's preservation policy and planning team" (51.1 percent, 92 respondents). Interestingly, however, "validating files and capturing checksums" is far more important to non-ASERL respondents (53.6 percent, 75 respondents) than to those from ASERL institutions (only 37.8 percent, 14 respondents). "Legal issues surrounding access, use, migration and storage" is a more important topic for ASERL respondents (51.4 percent, 19 respondents) than non-ASERL (42.8 percent, 77 respondents), and ASERL respondents were more concerned (37.8 percent, 14 respondents) than non-ASERL (33.1 percent, 46 respondents) with "Self-assessment and external audits." Additionally, "Selecting file formats for archiving" and "Selecting storage options and number of copies" are more important for non-ASERL (47.5 percent, 67 respondents and 47.9 percent, 67 respondents) than ASERL respondents (35.1 percent, 13 respondents and 32.4 percent, 12 respondents, respectively).

Figure 5. Results for ASERL survey respondents indicating topics of importance to them, for future training webinars.
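Several of the topics respondents ranked highly, capturing checksums, validating files, and monitoring the status of files and media, lend themselves to the kind of small, concrete illustration the free-text comments requested. The following is a minimal sketch, not taken from the DPOE curriculum or the ASERL webinars, of how a small institution might record basic preservation metadata (size, modification date, SHA-256 checksum) for a directory of files and later re-verify it using only the Python standard library; the directory and manifest names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

MANIFEST = "fixity-manifest.json"  # hypothetical manifest file name

def sha256_of(path: Path) -> str:
    """Compute a SHA-256 checksum by reading the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(directory: str) -> dict:
    """Record size, last-modified date, and checksum for every file under a directory."""
    entries = {}
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file() and path.name != MANIFEST:
            stat = path.stat()
            entries[str(path.relative_to(directory))] = {
                "size_bytes": stat.st_size,
                "last_modified": datetime.fromtimestamp(stat.st_mtime, timezone.utc).isoformat(),
                "sha256": sha256_of(path),
            }
    return {"generated": datetime.now(timezone.utc).isoformat(), "files": entries}

def verify_manifest(directory: str) -> list:
    """Re-check stored checksums and report any files that are missing or have changed."""
    with open(Path(directory) / MANIFEST) as handle:
        manifest = json.load(handle)
    problems = []
    for name, recorded in manifest["files"].items():
        path = Path(directory) / name
        if not path.is_file():
            problems.append(f"MISSING: {name}")
        elif sha256_of(path) != recorded["sha256"]:
            problems.append(f"CHANGED: {name}")
    return problems

if __name__ == "__main__":
    collection = "digital_collection"  # hypothetical directory of files to monitor
    manifest_path = Path(collection) / MANIFEST
    if not manifest_path.exists():
        manifest_path.write_text(json.dumps(build_manifest(collection), indent=2))
        print(f"Wrote {manifest_path}")
    else:
        issues = verify_manifest(collection)
        print("\n".join(issues) if issues else "All files verified.")
```

Running a script of this kind on a regular schedule, and keeping a copy of the manifest alongside each storage copy, is one inexpensive way an institution might begin monitoring the status of its files while waiting for more formal training or infrastructure.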
“Security and disaster planning” was ranked extremely important by only 32.6 percent (45 respondents) overall, followed by “Business continuity planning” at only 29.2 percent (40 respondents). The latter may reflect a lack of widespread awareness of just how critical the loss of a single key employee can be, especially in smaller institutions. It also seems clear that there is a level of complacency, or a sense of security about our ephemeral digital content, that may be misplaced. Then again, it is quite possible that the respondents are not administrators and feel they do not have the power in their organizations to address such issues.

Figure 6. Results for non-ASERL survey respondents indicating topics of importance to them, for future training webinars.

Additional topics that respondents considered extremely important, listed in the free-text area, are as follows (the last four were contributed by ASERL members):

• "Clean" work station setup—hardware & software for ingest, virus scan, checksum, disk image, metadata, conversion, etc.
• Integrating tools into your workflow. There is a need to address the nuts and bolts for those of us that are further along in determining the metadata required to capture, selection criteria, and asset audit and preservation policy.
• Methods for providing researchers access to born digital content (not necessarily online, could be just in-house).
• Strategies for locating digital assets on physical media in large collections that have been using MPLP [“More Product, Less Process”] for decades.
• Format determination and successful migration or emulation.
• Staff diversity and training.
• How to validate files, migrate files, and which born-digital institutional files our special collections needs to be preserving.
• Creating and maintaining effective organizational models for digital preservation (i.e. collaboration with Central IT and/or external vendors, etc.).
• Case studies of digital preservation, establishing workflow of digital preservation.
• Web archiving (best practices, alternatives to Archive-It, methods of selection, etc.).
• One (non-ASERL) respondent said it was “somewhat important” to include the topic of “trends for field, future outlook.”

CONCLUSIONS

The results from this survey are clear: free or low-cost training needs to focus immediately on preservation of born-digital special collections materials, born-digital institutional records, and digitized collections. The topics of prime importance to respondents were “Methods of preservation metadata extraction, creation and storage,” “Determining what metadata to capture and store,” and “Planning for provision of access over time.” The variations in ratings between respondents who self-identified as ASERL members and those who did not indicate that the needs of those in research libraries differ somewhat from those of cultural heritage institutions in the field dealing with “the long tail” of digital content.12 Future training may need to target these differing audiences appropriately to ensure these needs are met. Additionally, administrators need to be addressed as a distinct audience in order to focus on the requirements of “Security and disaster planning” and “Business continuity planning,” as these critical areas need to be developed by those in management positions.
Future surveys of this nature should include a component to determine the level of technical expertise and support the respondents have, as well as a measure of their position or power in the administrative hierarchy. Continued surveys would be extremely helpful in ensuring that available educational options meet the needs of librarians and archivists in the field. As Molinaro has pointed out, “Getting the right information in the right hands at the right time is a problem that has plagued the library community for decades.”13 Now is the time to develop free, openly available, practical digital preservation training for those on the front lines, if we are to retain critical cultural heritage materials which are only available in digital form. For them to effectively perform necessary triage on incoming digital content, they must be trained in “first aid.” Our history is at stake.

REFERENCES

1. Paul Conway, “Preservation in the Age of Google: Digitization, Digital Preservation, and Dilemmas,” Library Quarterly 80, no. 1 (January 2010): 73–74, doi:10.1086/648463.
2. Clifford Lynch, “Challenges and Opportunities for Digital Stewardship in the Era of Hope and Crisis” (keynote speech, IS&T Archiving 2009 Conference, Arlington, Virginia, May 2009).
3. Karen F. Gracy and Miriam B. Kahn, “Preservation in the Digital Age,” American Library Association, Library Resources and Technical Services 56, no. 1 (2012): 30.
4. Marshall Breeding, “From Disaster Recovery to Digital Preservation,” Computers in Libraries 32, no. 4 (2012): 25.
5. Mike Kastellec, “Practical Limits to the Scope of Digital Preservation,” Information Technology & Libraries 31, no. 2 (2012): 70, doi:10.6017/ital.v31i2.2167.
6. Charles Dollar and Lori Ashley, “Digital Preservation Capability Maturity Model,” Ver. 2.4 (November 2012), https://docs.google.com/file/d/0BwbqtwrvKHokRXNVNmhXTmo2SUU/edit?pli=1 (accessed December 24, 2012).
7. Mary Molinaro, “How Do You Know What You Don’t Know? Digital Preservation Education,” Information Standards Quarterly 22, no. 2 (2010): 45.
8. Conway, “Preservation in the Age of Google,” 70.
9. Library of Congress, “Digital Preservation Outreach & Education: DPOE Background,” accessed December 31, 2012, www.digitalpreservation.gov/education/background.html.
10. Library of Congress, “Digital Preservation Outreach & Education: DPOE Curriculum,” accessed December 31, 2012, www.digitalpreservation.gov/education/curriculum.html.
11. Jody L. DeRidder, “Introduction to Digital Preservation—A Three-Part Series Based on the Digital Preservation, Outreach and Education (DPOE) Model,” Association of Southeastern Research Libraries, 2012, [archived webinars], accessed December 31, 2012, www.aserl.org/archive.
12. Jody L. DeRidder, “Benign Neglect: Developing Life Rafts for Digital Content,” Information Technology & Libraries 30, no. 2 (June 2011): 71–74.
13. Molinaro, “How Do You Know What You Don’t Know?” 47.
3388 ----

A Comparative Analysis of the Effect of the Integrated Library System on Staffing Models in Academic Libraries

Ping Fu and Moira Fitzgerald

INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013

ABSTRACT

This analysis compares how the traditional integrated library system (ILS) and the next-generation ILS may impact system and technical services staffing models at academic libraries. The method used in this analysis is to select two categories of ILSs—two well-established traditional ILSs and three leading next-generation ILSs—and compare them by focusing on two aspects: (1) software architecture and (2) workflows and functionality. The results of the analysis suggest that the next-generation ILS could have substantial implications for library systems and technical staffing models in particular, suggesting that library staffing models could be redesigned and key librarian and staff positions redefined to meet the opportunities and challenges brought on by the next-generation ILS.

INTRODUCTION

Today, many academic libraries are using well-established traditional integrated library systems (ILSs) built on the client-server computing model. The client-server model aims to distribute applications that partition tasks or workloads between the central server of a library automation system and all the personal computers throughout the library that access the system. The client applications are installed on the personal computers and provide a user-friendly interface to library staff. However, this model may not significantly reduce workload for the central servers and may increase overall operating costs because of the need to maintain and update the client software across a large number of personal computers throughout the library.1 Since the global financial crisis, libraries have been facing severe budget cuts, while hardware maintenance, software maintenance, and software licensing costs continue to rise. The technology adopted by the traditional ILS was developed more than ten years ago and is evidently outdated. The traditional ILS does not have sufficient capacity to provide efficient processing for meeting the changing needs and challenges of today’s libraries, such as managing a wide variety of licensed electronic resources and collaborating, cooperating, and sharing resources with different libraries.2

Ping Fu (pingfu@cwu.edu), a LITA member, is Associate Professor and Head of Technology Services in the Brooks Library, Central Washington University, Ellensburg, WA. Moira Fitzgerald (moira.fitzgerald@yale.edu), a LITA member, is Access Librarian and Assistant Head of Access Services in the Beinecke Rare Book and Manuscript Library, Yale University, New Haven, CT.

Today’s libraries manage a wide range of licensed electronic resource subscriptions and purchases. The traditional ILS is able to maintain the subscription records and payment histories but is unable to manage details about trial subscriptions, license negotiations, license terms, and use restrictions.
Some vendors have developed electronic resources management system (ERMS) products as standalone products or as fully integrated components of an ILS. However, it would be more efficient to manage print and electronic resources using a single, unified workflow and interface. To reduce costs, today’s libraries not only band together in consortia for cooperative resource purchasing and sharing, but often also want to operate one “shared ILS” for managing, building, and sharing the combined collections of members.3 Such consortia are seeking a new ILS that exceeds traditional ILS capabilities and uses new methods to deliver improved services. The new ILS should be more cost effective, should provide prospects for cooperative collection development, and should facilitate collaborative approaches to technical services and resource sharing. One example of a consortium seeking a new ILS is the Orbis Cascade Alliance, which includes thirty-seven universities, colleges, and community colleges in Oregon, Washington, and Idaho. As a response to this need, many vendors have started to reintegrate or reinvent their ILSs. Library communities have expressed interest in the new characteristics of these next-generation ILSs; their ability to manage print materials, electronic resources, and digital materials within a unified system and a cloud-computing environment is particularly welcome.4 However, one big question remains for libraries and librarians, and that is what implications the next-generation ILS will have on libraries’ staffing models. Little on this topic has been presented in the library literature. This comparative analysis intends to answer this question by comparing the next- generation ILS with the traditional ILS from two perspectives: (1) software architecture, and (2) workflows and functionality, including the capacity to facilitate collaboration between libraries and engage users. SCOPE AND PURPOSE The purpose of the analysis is to determine what potential effect the next-generation ILS will have on library systems and technical services staffing models in general. Two categories of ILSs were chosen and compared. The first category consists of two major traditional ILSs: Ex Libris’s Voyager and Innovative Interfaces’ Millennium. The second category includes three next- generation ILSs: Ex Libris’s Alma, OCLC’s WorldShare Management Services (WMS), and Innovative Interfaces’ Sierra. Voyager and Millennium were chosen because they hold a large portion of current market shares and because the authors have experience with these systems. Yale University Library is currently using Voyager, while Central Washington University Library is using Millennium. Alma, WMS, and Sierra were chosen because these three next-generation ILSs are produced by market leaders in the library automation industry. The authors have learned about these new products by reading and analyzing literature and vendors’ proposals, as well as INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 49 attending vendors’ webinars and product demonstrations. In the long run, Yale University Library must look for a new library service platform to replace Voyager, Verde, MetaLib, SFX, and other add-ons. Central Washington University Library is affiliated with the Orbis Cascade Alliance mentioned above. The Alliance is implementing a new library management service to be shared by all thirty-seven members of the consortium. Ex Libris, Innovative Interfaces, OCLC, and Serials Solutions all bid for the Alliance’s shared ILS. 
After an extensive RFP process, in July 2012 the Orbis Cascade Alliance decided to choose Ex Libris’s Alma and Primo as their shared library services platform. The system will be implemented in four cohorts of approximately nine member libraries each over a two-year period, beginning in January 2013. The Central Washington University Library is in the fourth migration cohort, and its new system will be live in December 2014.

It is important to emphasize that the next-generation ILS has no local Online Public Access Catalog (OPAC) interface. Vendors use additional discovery products as the discovery-layer interfaces for their next-generation ILSs. Specifically, Ex Libris uses Primo as the OPAC for Alma, while OCLC’s WorldCat Local provides the front-end interface for WMS. Innovative Interfaces offers Encore as the discovery layer for Sierra. As front-end systems, these discovery platforms provide library users with one-stop access to their library resources, including print materials, electronic resources, and digital materials. While these discovery platforms will also impact library organization and librarianship, they will have more impact on the way that end-users, rather than library staff, discover and interact with library collections. In this analysis, we focus on the effects that back-end systems such as Alma, WMS, and Sierra will have on library organizational structure and staffing, rather than the end-user experience.

As our sample only includes five ILSs, the scope of the analysis is limited, and the findings cannot be universal or extended to all academic libraries. However, readers will gain some insight into what challenges any library may face when migrating to a next-generation ILS.

LITERATURE REVIEW

A few studies have been published on library staffing models. Patricia Ingersoll and John Culshaw’s 2004 book about systems librarianship describes vital roles that systems librarians play, with responsibilities in the areas of planning, staffing, communication, development, service and support, training, physical space, and daily operations.5 Systems librarians are the experts who understand both library and information technology and can put the two fields together in context. They point out that systems librarians are the key players who ensure that a library stays current with new information technology. The daily and periodic operations for systems librarians include ILS administration, server management, workstation maintenance, software and applications maintenance and upgrades, configuration, patch management, data backup, printing issues, security, and inventory. All of these duties together constitute the workloads of systems librarians. Ingersoll and Culshaw also emphasize that systems librarians must be proactive in facing constant changes and keep abreast of emerging library technologies.

Edward Iglesias et al., based on their own experiences and observations at their respective institutions, studied the impact of information technology on systems staff.6 Their book covers concepts such as the client-server computing model, Web 2.0, electronic resource management, open-source, and emerging information technologies.
Their 2004 studies show that, though there are many challenges inherent in the position, there are also many ways for systems staff to improve their knowledge, skills, and abilities to adapt to the changing information technologies.

Janet Guinea has also studied the roles of systems librarians at an academic library.7 Her 2003 study shows that systems librarians act as bridge-builders between the library and other university units in the development of library-initiated projects and in the promotion of information technology-based applications across campus.

Another relevant study was conducted by Marshall Breeding at Vanderbilt University in an investigation of the library automation market. His 2012 study compares the well-established, traditional ILSs that dominate the current market (and are based on client-server computing architecture developed more than a decade ago) to the next-generation ILSs deployed through multitenant Software-as-a-Service (SaaS) models, which are based on service-oriented architecture (SOA).8 Through this comparison, Breeding indicates that next-generation ILSs will differ substantially from existing traditional ILSs and will eliminate many hardware and maintenance investments for libraries. The next-generation ILS will bring traditional ILS functions, ERMS, digital asset management, link resolvers, discovery layers, and other add-on products together into one unified service platform, he argues.9 He gave the next-generation ILS a new term, library services platform.10 This term signifies that a conceptual and technical shift is happening: the next-generation ILS is designed to realign traditional library functions and simplify library operations through a more inclusive platform designed to handle different forms of content within a unified single interface. Breeding’s findings conclude that the next-generation ILS provides significant innovations, including management of print and electronic library materials, reliance on global knowledge bases instead of localized databases, deployment through multitenant SaaS based on a service-oriented architecture, and the provision of a suite of application programming interfaces (APIs) that enable greater interoperability and extensibility.11 He also predicts that the next-generation ILS will trigger a new round of ILS migration.12

METHOD

Our method narrowed the analysis of the implications of ILSs for library systems and technical services staffing models to two major aspects: (1) software architecture, and (2) workflows and functionality, including facilitation of collaborations between libraries and user engagement. First, we analyzed two traditional ILSs, Voyager and Millennium, which are built on a client-server computing model, deliver modular workflow functionality, and are implemented in our institutions. Through the analysis, we determined how these two aspects affect library organizational structure and librarian positions designed for managing these modular tasks. Then, based on information we collected and grouped from vendors’ documents, RFP responses, product demonstrations, and webinars, we examined the next-generation ILSs Alma, WMS, and Sierra—which are based on SOA and intended to realign traditional library functions and simplify library operations—to evaluate how these two factors will impact staffing models.
To provide a more in-depth analysis, particularly for systems staffing models, we also gathered and analyzed online systems librarian job postings, particularly for managing the Voyager or Millennium system, for the past five years. The purpose of this compilation is to cull a list of typical responsibilities of systems librarians and then determine what changes may occur when they must manage a next-generation ILS such as Alma, WMS, or Sierra. Data on job postings were gathered from online job banks that keep an archive of past listings, including code4lib jobs, ALA JobLIST, and various university job listing sites. Duplicates and reposts were removed. The responsibilities and duties described in the job descriptions were examined for similarities to determine a typical list. The data from all sources were gathered together in a single database to facilitate its organization and manipulation. Specific responsibilities, such as administering an ILS, were listed individually, while more general responsibilities for which descriptions may vary from one posting to another were grouped under an appropriate heading. To ensure complete coverage, all postings were examined a second time after all categories had been determined. We also used our own institutions as examples to support the analysis. The Implications of ILS Software Architecture on Staffing Models Voyager and Millennium are built on client-server architecture. Libraries that use these ILSs also use add-ons, such as ERMS and link resolvers, to manage their print materials and licensed electronic resources. The installation, configuration, and updates of the client software require a significant amount of work for library IT staff. Many libraries must allocate substantial staff effort and resources to coordinating the installation of the new software on all computers throughout the library that access the system. Those libraries that allow staff to work remotely have experienced additional costs and IT challenges. In addition, server maintenance, backups, upgrades, and disaster recovery also require excessive time and effort of library IT staff. Administering ILSs, ERMS, and other library hardware, software, and applications is one of the primary responsibilities for a library systems department. Positions such as systems librarian, electronic resource librarian, and library IT specialist were created to handle this complicated work. At a very large library, such as Yale University Library, the systems group of library IT is only responsible for Voyager’s configuration, operation, maintenance, and troubleshooting. Two other IT support groups—a library server support group and a workstation support group—are responsible for installation, maintenance, and upgrade of the servers and workstations. Specifically, the library server support group deals with the maintenance and upgrade of ILS servers and the software and relational database running on the servers, while the workstation support group takes care of the installation and upgrade of the client software on hundreds of A COMPARATIVE ANALYSIS OF THE EFFECT OF THE INTEGRATED LIBRARY SYSTEM ON STAFFING MODELS IN ACADEMIC LIBRARIES | FU AND FITZGERALD 52 workstations throughout twenty physical libraries. At a smaller library, such as Central Washington University Library, on the other hand, one systems librarian is responsible for the administration of Millennium, including configuration, maintenance, backup, and upgrade on the server. 
Another library IT staff member helps install and upgrade the Millennium client on about forty-five staff computers throughout its main library and two center campus libraries. Comparatively, the next-generation ILSs Alma, WMS, and Sierra have a SaaS model designed by SOA principles and deployed through a cloud-based infrastructure. OCLC defines this model as “Web-scale Management Services.”13 Using this innovation, service providers are able to deliver services to their participating member institutions on a single, highly scalable platform, where all updates and enhancements can be done automatically through the Internet. The different participating member institutions using the service can configure and customize their views of the application with their own brandings, color themes, and navigational controls. The participating member institutions are able to set functional preferences and policies according to their local needs. Web-scale services reduce the total cost of ownership by spreading infrastructure costs across all the participating member institutions. The service providers have complete control over hardware and software for all participating member institutions, dramatically eliminating capital investments on local hardware, software, and other peripheral services. Service providers can centrally implement applications and upgrades, integration across services, and system-wide infrastructure requirements such as performance reliability, security, privacy, and redundancy. Thus participating member institutions are relieved from this burdensome responsibility that has traditionally been undertaken by their IT staff.14 From this perspective, the next-generation ILS will have a huge impact on library organizational structure, staffing, and librarianship. Since the next-generation ILS is implemented through the cloud-computing model, there is no requirement for local staff to perform the functions traditionally defined as “systems” staff activities, such as server and storage administration, backup and recovery administration, and server-side network administration. For example, the entire interfaces of Alma and WMS are served via web browser; there is no need for local staff to install and maintain clients on local workstations. Therefore, if an institution decided to migrate to a next-generation ILS, the responsibilities and roles of systems staff within the institution would need to be readdressed or redefined. We have learned from attending OCLC’s webinars and product demonstrations that library systems staff would be required to prepare and extract data from their local systems during new systems implementation. They also would be required to configure their own settings such as circulation policies. However, after the migration, a systems staff member would likely serve as a liaison with the vendor. This would require, according to OCLC’s proposal, only 10 percent of the systems staff’s time on an ongoing basis. Through attending Ex Libris’s webinars and product demonstrations, we have learned that a local system administrator may be required to take on basic management processes, such as record-loading or integrating data from other campus systems. Similarly, we have learned from Innovative Interfaces’ webinars and product demonstrations that Sierra would still need local systems INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 53 expertise to perform the installations of the client software on staff workstations. 
Sierra would require library IT staff to perform administrative tasks like user account administration and to support Sierra in interfacing with local institution-specific resources.

In general, as shown in table 1, local systems staff could be freed from the burdensome responsibility of administering the traditional ILS because of the software architecture of the next-generation ILS.

Systems Librarian Responsibilities | Workload Percentage | Traditional ILS | Next-gen ILS
Managing ILS applications, including modules and the OPAC | 10 | X |
Managing associated products such as discovery systems, ERMS, link resolvers, etc. | 10 | X |
Day-to-day operations including management, maintenance, troubleshooting, and user support | 10 | X | X
Server maintenance, database maintenance and backup | 10 | X |
Customizations and integrations | 5 | X | X
Configurations | 5 | X | X
Upgrades and enhancements | 5 | X |
Patches or other fixes | 5 | X |
Design and coordination of statistical and managerial reports | 5 | X | X
Overall staff training | 5 | X | X
Primary representative and contact to the designated library system vendors | 5 | X | X
Keeping abreast of developments in library technologies to maintain current awareness of information tools | 5 | X | X
Engaging in scholarly pursuit and other professional activities | 10 | X | X
Serving on various teams and committees | 5 | X | X
Reference and instruction | 5 | X | X
Total | 100 | 100% | 60%

Table 1. Systems librarian responsibilities comparison for traditional ILS and next-generation ILS.

Note: The systems librarian responsibilities and the approximate percentage of time devoted to each function are slightly readjusted based on the compiled descriptions of the systems librarian job postings we collected and analyzed from the Internet and from vendors’ claims. A total of 47 position descriptions were gathered. The workload percentage is adopted from the job description of the systems librarian position at one of our institutions.

Our analysis shows that systems staff might reduce their workload by approximately 40 percent. Therefore library systems staff could use their time to focus on local applications development and other library priority projects. However, it is important to emphasize that library systems staff should reengineer themselves by learning how to use APIs provided by the next-generation ILS so that they will be able to support the customization of their institutions’ discovery interfaces and the integration of the ILS with other local enterprise systems, such as financial management systems, learning management systems, and other local applications.
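To make the API point above concrete, the sketch below shows the general shape of the glue code a systems librarian might write against a hosted next-generation ILS. It is a minimal sketch under stated assumptions: the base URL, endpoint path, parameter names, and key are hypothetical placeholders rather than any vendor's actual interface, so the API documentation published by Ex Libris, OCLC, or Innovative Interfaces would govern a real integration.

```python
"""Minimal sketch of integration code against a hosted next-generation ILS.
The base URL, endpoint path, parameter names, and API key are hypothetical
placeholders; consult the vendor's own API documentation for the real interface."""

import requests  # widely used third-party HTTP client

API_BASE = "https://ils.example.edu/api/v1"   # hypothetical API root for the hosted ILS
API_KEY = "REPLACE_WITH_INSTITUTIONAL_KEY"    # issued by the vendor or consortium


def fetch_bib_record(record_id: str) -> dict:
    """Retrieve one bibliographic record as JSON from the hosted ILS."""
    response = requests.get(
        f"{API_BASE}/bibs/{record_id}",
        params={"apikey": API_KEY, "format": "json"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    # Example: pull a record and hand selected fields to a local campus system,
    # such as a learning management system or a financial system.
    record = fetch_bib_record("99123456")
    print(record.get("title"), record.get("isbn"))
```

The point of this design is that hosting, upgrades, and backups stay with the vendor, while local effort shifts toward this kind of lightweight, API-driven customization and integration.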
The Implications of ILS Workflows and Functionality on Staffing Models

The typical workflow and functionality of both Voyager and Millennium are built on a modular structure. Major function modules, called client modules, include Systems Administration, Cataloging, Acquisitions, Serials, Circulation, and Statistics and Reports. Additionally, the traditional ILS provides an OPAC interface for library patrons to access library materials and manage their accounts. Millennium has an ERMS module built in as a component of their ILS while Ex Libris has developed an independent ERMS as an add-on to Voyager. The Systems Administration module is used to add system users and to set up locations, patron types, material types, and other library policies. The Cataloging module supports the functions of cataloging resources, managing the authority files, tagging and categorizing content, and importing and exporting bibliographic records. The sophistication of the Cataloging module depends primarily on the ILS. The Acquisitions module helps in the tracking of purchases and acquisition of materials for a library by facilitating ordering, invoicing, and data exchange with serial, book, and media vendors through electronic data interchange (EDI). The Circulation module is used to set up rules for circulating materials and for tracking those materials, allowing the library to add patrons, issue borrowing cards, and form loan rules. It also automates the placing of holds, interlibrary loan (ILL), and course reserves. Self-checkout functionality can be integrated as well. The Serials module is essentially a cataloging module for serials. Libraries are often dependent on the Serials module to help them track and check in serials. The Statistics and Reports module is used to generate reports such as circulation statistics, age of collection, collection development, and other customized statistical reports.

A typical traditional ILS comprises a relational database, software to interact with that database, and two graphical user interfaces—one for patrons and one for staff. It usually separates software functions into discrete modules, each of them integrated with a unified interface. The traditional ILS’s modular design was a perfect fit for a traditional library organizational structure. The staff at Central Washington University Library, for example, under the library administration, are organized into the following three major groups: public services, including the Reference and Circulation Departments; technical and technology services, including the Cataloging, Collection Development, Serials & Electronic Resource, and Systems Departments; and
Vendor, local, consortium, and global library data share the same workflows. WMS automatically creates holdings for both physical and electronic resources. The WorldCat knowledge base simplifies electronic resource management and delivery. Order data from external systems can be automatically uploaded. For consortium users, WMS’s unified workflow and interface fosters efficient resource-sharing between different institutions whose holdings share a common format. Similarly, Ex Libris’s Alma has an integrated Central Knowledge Base (CKB) that describes available electronic resources and packages, so there is no need to load additional descriptive records when acquiring electronic resources based on the CKB. The purchasing workflow manages orders for both print and electronic resources in a very similar way and handles some aspects unique to electronic resources, such as license management and the identification of an access provider. Staff users can start the ordering process by searching the CKB directly and ordering from there. This search is integrated into the repository search, allowing a staff user to perform searches both in his or her institution as well as in the Community Zone, which holds the CKB.

The next-generation ILS provides unified data services and workflows, and a single interface to manage all physical, electronic, and digital materials. This will require libraries to rethink their acquisitions staffing models. For example, small libraries could merge the acquisitions librarian and electronic resource librarian positions or reorganize the two departments.

Another functionality enhancement of the next-generation ILS provides the authoritative ability for consortia users to manage local holdings and collections as well as shared resources. For example, WMS’s single shared knowledge base eliminates the need for each library to maintain a copy of a knowledge base locally, because all consortia members can easily see what is licensed by other members of the consortia. Cataloging records are shared at the consortium and global levels
Very large institutions, for example, might manage some records in the local catalog and most records in a shared bibliographic database, while smaller institutions might manage all of their records in the shared bibliographic database. All these approaches require more collaboration and cooperation between consortia members. According to vendors’ claims on their proposals to the Orbis Cascade Alliance, small institutions might not need to have a professional cataloger, since the cataloging process is simplified and it is therefore easier for paraprofessional staff to operate and copy bibliographic records from the knowledgebases of these ILSs. In addition, the next-generation ILS also allows library users to actively engage with ILS software development. For example, by adding OpenSocial containers to the product, WMS allows library developers to use API to build social applications called gadgets and add these gadgets to WMS. One example highlighted by OCLC is a gadget in the Acquisitions area of WMS that will show the latest New York Times Best Sellers and how many copies the library has available for each of those titles. Similarly, Sierra’s Open Developer Community will allow library developers to share ideas, reference code samples, and build a wide range of applications using Sierra’s web services. Also, Sierra will provide a centralized online resource called Sierra Developer Sandbox to offer a comprehensive library of documented APIs for library-developed applications. All these enhancements provide library staff with new opportunities to redefine their roles in a library. CONCLUSIONS AND ARGUMENTS In summary, compared to the client-server architecture and modular design of the traditional ILS, the next-generation ILS has an open architecture and is more flexible and unified in its workflow and interface, which will have a huge impact on library staffing models. The traditional ILS specifies clear boundaries between staff modules and workflows while the next-generation ILS has blurred these boundaries. The integration and enhancement of the functionality of the next- generation ILS will help libraries streamline and automate workflows and processes for managing both print and electronic resources. It will increase libraries’ operational efficiency, reduce the INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 57 total cost of ownership, and improve services for users. Particularly, it will free approximately 40 percent of library systems staff time from managing servers, software upgrades, client application upgrades, and data backups. Moreover, the next-generation ILS provides a new way for consortial libraries to collaborate, cooperate, and share resources. In addition, the web-scale services provided by the next-generation ILS allow libraries to access an infrastructure and platforms that enable them to reach a broad, geographically diverse community while simultaneously focusing their services on meeting the specific needs of their end-users. Thus the more integrated workflows and functionality allow library staff to work with more modules, play multiple roles, and back up each other, which will bring changes to traditional staffing models. However, the next-generation ILS also brings libraries new challenges along with its clear advantages. Librarians and library staff might have concerns pertaining to their job security and can be fearful of new technologies. 
They may feel anxious about how to reengineer their business processes, how to get training, how to improve their technological skills, and how to prepare for a transition. We argue here that library directors might think about these staff frustrations and find ways to address their concerns. Libraries should provide staff more opportunities and training to help them to improve their knowledge and skills. Redefining job descriptions and reorganizing library organizational structures might be necessary to better adapt to the changes brought about by the next-generation ILS. Systems staff might invest more time in local application developments, other digital initiatives, website maintenance, and other library priority projects. Technical staff might reconsider their workflows and cross-train themselves to expand their knowledge and improve their work efficiency. They might spend more time on data quality control and special collection development or interact more with faculty on book and e-resource selections. We hope this analysis will provide some useful information and insights for those libraries planning to move to the next-generation ILS. The shift will require academic libraries to reconsider their organizational structures and rethink their manpower distribution and staffing optimization to better focus on library priorities, projects, and services critical to their users. REFERENCES 1. Marshall Breeding, “A Cloudy Forecast for Libraries,” Computers in Libraries 31, no. 7 (2011): 32–34. 2. Marshall Breeding, “Current and Future Trends in Information Technologies for Information Units,” El profesional de la información 21, no. 1 (2012): 11. 3. Jason Vaughan and Kristen Costello, “Management and Support of Shared Integrated Library Systems,” Information Technology & Libraries 30, no. 2 (2011): 62–70. 4. Marshall Breeding, “Agents of Change,” Library Journal 137, no. 6 (2012): 30–36. A COMPARATIVE ANALYSIS OF THE EFFECT OF THE INTEGRATED LIBRARY SYSTEM ON STAFFING MODELS IN ACADEMIC LIBRARIES | FU AND FITZGERALD 58 5. Patricia Ingersoll and John Culshaw, Managing Information Technology: A Handbook for Systems Librarians (Westport, CT: Libraries Unlimited, 2004). 6. Edward G. Iglesias, An Overview of the Changing Role of the Systems Librarian: Systemic Shifts (Oxford, UK: Chandos, 2010). 7. Janet Guinea, “Building Bridges: The Role of the Systems Librarian in a University Library,” Library Hi Tech 21, no. 3 (2003): 325–32. 8. Breeding, “Agents of Change,” 30. 9. Ibid. 10. Ibid., 33. 11. Ibid., 33. 12. Ibid., 30. 13. Sally Bryant and Grace Ye, “Implementing OCLC’s WMS (Web-Scale Management Services) Circulation at Pepperdine University,” Journal of Access Services 9, no. 1 (2012): 1. 14. Gary Garrison et al., “Success Factors for Deploying Cloud Computing,” Communications of the ACM 55, no. 9 (2012): 62–68. 3420 ---- Assessing the Treatment of Patron Privacy in Library 2.0 Literature Michael Zimmer INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2013 29 ABSTRACT As libraries begin to embrace Web 2.0 technologies to serve patrons, ushering in the era of Library 2.0, unique dilemmas arise regarding protection of patron privacy. The norms of Web 2.0 promote the open sharing of information—often personal information—and the design of many Library 2.0 services capitalize on access to patron information and might require additional tracking, collection, and aggregation of patron activities. 
Thus embracing Library 2.0 potentially threatens the traditional ethics of librarianship, where protecting patron privacy and intellectual freedom has been held paramount. As a step towards informing the decisions to implement Library 2.0 to adequately protect patron privacy, we must first understand how such concerns are being articulated within the professional discourse surrounding these next generation library tools and services. The study presented in this paper aims to determine whether and how issues of patron privacy are introduced, discussed, and settled, if at all, within trade publications utilized by librarians and related information professionals INTRODUCTION In today’s information ecosystem, libraries are at a crossroads: several of the services traditionally provided within their walls are increasingly made available online, often by non-traditional sources, both commercial and amateur, thereby threatening the historical role of the library in collecting, filtering, and delivering information. For example, web search engines provide easy access to millions of pages of information, online databases provide convenient gateways to news, images, videos, as well as scholarship, and large- scale book digitization projects appear poised to make roaming the stacks seem an antiquated notion. Further, the traditional authority and expertise enjoyed by librarians has been challenged by the emergence of automated information filtering and ranking systems, such as Google’s algorithms or Amazon’s recommendation system, as well as amateur, collaborative, and peer- produced knowledge projects, such as Wikipedia, Yahoo! Answers, and Delicious. Meanwhile, the professional, educational, and social spheres of our lives are increasingly intermingled through online social networking spaces such as Facebook, LinkedIn, and Twitter, providing new interfaces for interacting with friends, collaborating with colleagues, and sharing information. Michael Zimmer, PhD, (zimmerm@uwm.edu), a LITA member, is Assistant Professor, School of Information Studies, and Director, Center for Information Policy Research, University of Wisconsin-Milwaukee. mailto:zimmerm@uwm.edu INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2013 30 Libraries face a key question in this new information environment: what is the role of the library in providing access to knowledge in today’s digitally networked world? One answer has been to actively incorporate features of the online world into library services, thereby creating “Library 2.0.” Conceptually, Library 2.0 is rooted in the global Web 2.0 discussion, and the professional literature often links the two concepts. According to O’Reilly, Web 2.0 marks the World Wide Web’s shift from a collection of individual websites to a computing platform that provides applications for end users and can be viewed as a tool for harnessing the collective intelligence of all web users.1 Web 2.0 represents a blurring of the boundaries between web users and producers, consumption and participation, authority and amateurism, play and work, data and the network, reality and virtuality.2 Its rhetoric suggests that everyone can and should use new Internet technologies to organize and share information, to interact within communities, and to express oneself. In short, Web 2.0 promises to empower creativity, to democratize media production, and to celebrate the individual while also relishing the power of collaboration and social networks. 
Library 2.0 attempts to bring the ideology of Web 2.0 into the sphere of the library. The term is generally attributed to Casey,3 and while over sixty-two distinct viewpoints and seven different definitions of Library 2.0 have been advanced,4 there is general agreement that implementing Library 2.0 technologies and services means bringing interactive, collaborative, and user-centered web-based technologies to library services and collections.5 Examples include • providing synchronous messaging (through instant message platforms, Skype, etc.) to allow patrons to chat with library staff for real-time assistance; • using blogs, wikis, and related user-centered platforms to encourage communication and interaction between library staff and patrons; • allowing users to create personalized subject headings for library materials through social tagging platforms like Delicious or Goodreads; • providing patrons the ability to evaluate and comment on particular items in a library’s collection through rating systems, discussion forums, or comment threads; • using social networking platforms like Facebook or LinkedIn to create online connections to patrons, enabling communication and service delivery online; and • creating dynamic and personalized recommendation systems (“other patrons who checked out this book also borrowed these items”), similar to Amazon and related online services. Launching such Library 2.0 features, however, poses a unique dilemma in the realm of information ethics, especially patron privacy. Traditionally, the context of the library brings with it specific norms of information flow regarding patron activity, including a professional commitment to patron privacy (see, for example, American Library Association’s Privacy Policy, 6 Foerstel,7 Gorman,8 and Morgan 9). In the library, users’ intellectual activities are protected by decades of established norms and practices intended to preserve patron privacy and confidentiality, most ASSESSING THE TREATMENT OF PATRON PRIVACY IN LIBRARY 2.0 LITERATURE | ZIMMER 31 stemming from the ALA’s Library Bill of Rights and related interpretations.10 As a matter of professional ethics, most libraries protect patron privacy by engaging in limited tracking of user activities, having short-term data retention policies (many libraries actually delete the record that a patron ever borrowed a book once it is returned), and generally enable the anonymous browsing of materials (you can walk into a public library, read all day, and walk out, and there is no systematic method of tracking who you are or what you’ve read). These are the existing privacy norms within the library context. Library 2.0 threatens to disrupt these norms. In order to take full advantage of Web 2.0 platforms and technologies to deliver Library 2.0 services, libraries will need to capture and retain personal information from their patrons. Revisiting the examples provided above, each relies on some combination of robust user accounts, personal profiles, and access to flows of patrons’ personal information: • Providing synchronous messaging might necessitate the logging of a patron's name (or chat username), date and time of the request, e-mail or other contact information, and the content of the exchange with the librarian staff member. • Library-hosted blogs or wikis will require patrons to create user accounts, potentially tying posts and comments to patron IP addresses, library accounts, or identities. 
• Implementing social tagging platforms would similarly require unique user accounts, possibly revealing the tags particular patrons use to label items in the collection and who tagged them. • Comment and rating systems potentially link patrons’ particular interests, likes, and dislikes to a username and account. • Using social networking platforms to communicate and provide services to patrons might result in the library gaining unwanted access to personal information of patrons, including political ideology, sexual orientation, or related sensitive information. • Creating dynamic and personalized recommendation systems requires the wholesale tracking, collecting, aggregating, and processing of patron borrowing histories and related activities. Across these examples, to participate and benefit from Library 2.0 services, library patrons could potentially be required to create user accounts, engage in activities that divulge personal interests and intellectual activities, be subject to tracking and logging of library activities, and risk having various activities and personal details linked to their library patron account. While such Library 2.0 tools and services can greatly improve the delivery of library services and enhance patron activities, the increased need for the tracking, collecting, and retaining of data about patron activities presents a challenge to the traditional librarian ethic regarding patron privacy.11 Despite these concerns, many librarians recognize the need to pursue Library 2.0 initiatives as the best way to serve the changing needs of their patrons and to ensure the library’s continued role in INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2013 32 providing professionally guided access to knowledge. Longitudinal studies of library adoption of Web 2.0 technologies reveal a marked increase in the use of blogs, sharing plugins, and social media between 2008 and 2010.12 In this short amount of time, Library 2.0 has taken hold in hundreds of libraries, and the question before us is not whether libraries will move towards Library 2.0 services, but how they will do it, and, from an ethical perspective, whether the successful implementation of Library 2.0 can take place without threatening the longstanding professional concerns for, and protections of, patron privacy. RESEARCH QUESTIONS Recognizing that Library 2.0 has been implemented, in varying degrees, in hundreds of libraries,13 and is almost certainly being considered at countless more, it is vital to ensure that potential impacts on patron privacy are properly understood and considered. As a step towards informing the decisions to implement Library 2.0 to adequately protect patron privacy, we must first understand how such concerns are being articulated within the professional discourse surrounding these next generation library tools and services. The study presented in this paper aims to determine whether and how issues of patron privacy are introduced, discussed, and settled—if at all—within trade publications utilized by librarians and related information professionals. Specifically, this study asks the following primary research questions: RQ1. Are issues of patron privacy recognized and addressed in literature discussing the implementation of Library 2.0 services? RQ2. When patron privacy is recognized and addressed, how is it articulated? For example, is privacy viewed as a critical concern, as something that we will need to simply “get over,” or as a non-issue? RQ3. 
What kind of mitigation strategies, if any, are presented to address the privacy issues related to Library 2.0?

DATA ANALYSIS

The study combines content and textual analyses of articles published in professional publications (not peer-reviewed academic journals) between 2005 and 2011 discussing Library 2.0 or related web-based services, retrieved through the Library, Information Science, and Technology Abstracts (LISTA) and Library Literature & Information Science Full Text Databases. The discovered texts were collected in winter 2011 and coded to reflect the source, author, publication metadata, audience, and other general descriptive data. In total, there were 677 articles identified discussing Library 2.0 and related web-based library services, appearing in over 150 different publications. Of the articles identified, 50 percent appeared in 18 different publications, which are listed in table 1.

Table 1. Top Publications with Library 2.0 Articles (2005–2011)

Publication | Count
Computers in Libraries | 51
Library Journal | 51
Information Today | 21
Library and Information Update | 21
incite | 20
Scandinavian Public Library Quarterly | 18
American Libraries | 16
Electronic Library | 15
ONLINE | 14
School Library Journal | 14
Information Outlook | 13
Mississippi Libraries | 13
College & Research Library News | 12
Library Hi Tech News | 12
Library Media Connection | 12
CSLA Journal (California School Library Association) | 10
Knowledge Quest | 10
Multimedia Information and Technology | 8

Each of the 677 source texts was then analyzed to determine if a discussion of privacy was present. Full-text searches were performed on word fragments to ensure the identification of variations in terminology. For example, each text was searched for the fragment “priv” to include hits on both the terms “privacy” and “private.” Additional searches were performed for word fragments related to “intellectual freedom” and “confidentiality” in order to capture more general considerations related to patron privacy. Of the 677 articles discussing Library 2.0 and related web-based services, there were a total of 203 mentions of privacy or related concepts in 71 articles. These 71 articles were further refined to ensure the appearance of the word “privacy” and related terms were indeed relevant to the ethical issues at hand (eliminating false positives for mentions of “private university,” for example, or mention of a publication’s “privacy policy” that happened to be provided in the PDF searched). The final analysis yielded a total of 39 articles with relevant mention of patron privacy as it relates to Library 2.0, amounting to only 5.8 percent of all articles discussing Library 2.0 (see table 2). A full listing of the articles is in appendix A.

Table 2. Article Summary

Category | Count | %
Total articles discussing Library 2.0 | 677 |
Articles with hit in “priv” and related text searches | 71 | 10.5
Articles with relevant discussion of privacy | 39 | 5.8

The majority of these articles were authored by practicing librarians in both public and academic settings and present arguments for the increased use of Web 2.0 by libraries or highlight successful deployment of Library 2.0 services. Of the 39 articles, only 4 focus primarily on challenges faced by libraries hoping to implement Library 2.0 solutions.14
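The fragment-based screening described above, in which each full text is searched for “priv” and related word fragments before manual review weeds out false positives, can be approximated with a short script. The following is a minimal sketch only, assuming the retrieved article texts have been exported locally as plain-text files; the folder name and fragment list are illustrative assumptions, not the study's actual tooling.

```python
"""Minimal sketch of the word-fragment screening described above, assuming the
retrieved articles have been exported as plain-text files in a local folder.
The folder name and fragment list are illustrative assumptions."""

from pathlib import Path

# "priv" catches both "privacy" and "private"; the other fragments capture
# related concepts such as intellectual freedom and confidentiality.
FRAGMENTS = ["priv", "intellectual freedom", "confidential"]


def screen_articles(folder: str = "library20_articles") -> dict:
    """Return a mapping of filename -> list of fragments found in that text."""
    hits = {}
    for path in Path(folder).glob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        found = [fragment for fragment in FRAGMENTS if fragment in text]
        if found:
            hits[path.name] = found
    return hits


if __name__ == "__main__":
    flagged = screen_articles()
    print(f"{len(flagged)} articles contain at least one fragment")
    # Flagged articles would then be reviewed by hand to remove false positives
    # such as "private university" or a publication's own "privacy policy."
```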
In this textual analysis, two primary variables were evaluated: the length of discussion and the level of concern. Length of discussion was measured qualitatively as high (concern over privacy is explicit or implicit in over 50 percent of the article's text), moderate (privacy is discussed in a substantive section of the article), or minimal (privacy is mentioned but not given significant attention). The level of concern was measured qualitatively as high (indicated privacy as a critical variable for implementing Library 2.0), moderate (recognized privacy as one of a set of important concerns), or minimal (mentioned privacy largely in passing, giving it no particular importance). Results of these analyses are reported in table 3.
Table 3. Length of Discussion and Level of Concern
Rating | Length of Discussion | Level of Concern
High | 3 | 9
Moderate | 8 | 13
Minimal | 28 | 16
Of the 39 relevant articles, only three had lengthy discussions of privacy-related issues. As early as 2007, Coombs recognized that the potential for personalization of library services would force libraries to confront existing policies regarding patron privacy.15 Anderson and Rethlefsen similarly engage in lengthy discussions of the challenges faced by libraries wishing to balance patron privacy with new Web 2.0 tools and services.16 These three articles represent less than 1 percent of the 677 total articles identified that discussed Library 2.0.
While only three articles dedicate lengthy discussions to issues of privacy, over half the articles that mention privacy (21 of 39) indicate a high or moderate level of concern. For example, Cvetkovic warns that while "privacy is a central, core value of libraries…the features of Web 2.0 applications that make them so useful and fun all depend on users sharing private information with the site owners."17 And Casey and Savastinuk's early discussion of Library 2.0 puts these concerns in context for librarians, warning that "libraries should remain as vigilant with protecting customer privacy with technology-based services as they are with traditional, physical library services."18
While 21 articles indicated a high or moderate level of concern over patron privacy, less than half of these provided any kind of solution or strategy for mitigating the privacy concerns related to implementing Library 2.0 technologies. Overall, 14 of the 39 relevant articles provided privacy solutions of one kind or another. Breeding, for example, argues that librarians must "absolutely respect patron privacy,"19 and suggests that any Library 2.0 tools that rely on user data should be implemented only if users explicitly "opt in" to having their information collected, a solution also offered by Wisniewski in relation to protecting patron privacy with location-based tools.20 Rethlefsen goes a step further, proposing that libraries take steps to increase the literacy of patrons regarding their privacy and the use of Library 2.0 tools, including the use of classes and tutorials to help educate patrons and staff alike.21 Conversely, Cvetkovic argues that "the place of privacy in our culture is changing," and that while "in many ways our privacy is diminishing, but many people…seem not too concerned about it."22 As a result, while she argues for only voluntary participation in Library 2.0 services, Cvetkovic takes the position that information sharing is becoming the new norm, weakening any absolute position on protecting patron privacy above all.
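The figures in tables 2 and 3 are simple tallies over the coded article list. A minimal sketch of that bookkeeping is shown below; the CSV file name and its "length" and "concern" columns are hypothetical stand-ins for however the codes are actually recorded, and this is not the authors' own tooling.

```python
# Minimal sketch of the tallies behind tables 2 and 3, assuming a CSV of
# coded articles with illustrative columns "length" and "concern"
# (values: high / moderate / minimal). Not the authors' actual workflow.
import csv
from collections import Counter

def summarize(path="coded_articles.csv", total_l2_articles=677):
    lengths, concerns = Counter(), Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            lengths[row["length"]] += 1
            concerns[row["concern"]] += 1
    relevant = sum(lengths.values())
    print(f"Relevant articles: {relevant} "
          f"({100 * relevant / total_l2_articles:.1f}% of {total_l2_articles})")
    for label in ("high", "moderate", "minimal"):
        print(f"{label:>8}: length={lengths[label]:>3}  concern={concerns[label]:>3}")

if __name__ == "__main__":
    summarize()
```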
DISCUSSION
RQ1 asks if issues of patron privacy are recognized and addressed within literature discussing Library 2.0 and related web-based library services. Of the 677 articles published for professional audiences that discuss Library 2.0, only 39 contained a relevant discussion of the privacy issues that stem from this new family of data-intensive technologies, and only 11 of these discussed the issue beyond a passing mention.
RQ2 asks how the privacy concerns, when present, are articulated. Of the 39 articles with relevant discussions of privacy, only 11 make more than a minimal mention of privacy concerns. However, the discussion in 22 of the articles reveals a high or moderate level of concern. This suggests that while privacy might not be a primary focus of discussion, when it is mentioned, even minimally, its importance is recognized.
Finally, RQ3 seeks to understand if any solutions or mitigation strategies related to the privacy concerns are articulated. With only 14 of the 39 articles providing a means for practitioners to address privacy issues, readers of Library 2.0 publications are more often than not left with no real solutions or roadmaps for dealing with these vital ethical issues.
Taken together, the results of this study reveal minimal mention of privacy alongside discussions of Library 2.0. Less than 6 percent of all 677 articles on Library 2.0 include mention of privacy; of these, only 11 make more than a passing mention of privacy, representing less than 2 percent of all articles. Of the 39 relevant articles, 22 express more than a minimal concern, but of these, only 9 provide any mitigation strategy. These results suggest that while popular publications targeted at information professionals are giving significant attention to the potential for Library 2.0 to be a powerful new option for delivering library content and services, there is minimal discussion of how the widespread adoption and implementation of these new tools might impact patron privacy, and even less discussion of how to address these concerns. Consequently, as interest in, and adoption of, Library 2.0 services increases, librarians and related information practitioners seeking information about these new technologies in professional publications are unlikely to be confronted with the possible privacy concerns, or to learn of any strategies to deal with them.
This absence of clear guidance for addressing patron privacy in the Library 2.0 era resembles what computer ethicist Jim Moor would describe as a "policy vacuum":
A typical problem in Computer Ethics arises because there is a policy vacuum about how computer technology should be used. Computers provide us with new capabilities and these in turn give us new choices for action. Often, either no policies for conduct in these situations exist or existing policies seem inadequate. A central task of Computer Ethics is to determine what we should do in such cases, that is, formulate policies to guide our actions.23
Given the potential for the data-intensive nature of Library 2.0 technologies to threaten the longstanding commitment to patron privacy, these results show that work must be done to help fill this vacuum.
Education and outreach must be increased to ensure librarians and information professionals are aware of the privacy issues that typically accompany attempts to implement Library 2.0, and additional scholarship must take place to help understand the true nature of any privacy threats and to come up with real and useful solutions to help find the proper balance between enhanced delivery of library services through Web 2.0-based tools and the traditional protection of patron privacy.
ACKNOWLEDGEMENTS
This research was supported by a Ronald E. McNair Postbaccalaureate Achievement Program summer student research grant and a UW-Milwaukee School of Information Studies Internal Research Grant. The author thanks Kenneth Blacks, Jeremy Mauger, and Adriana McCleer for their valuable research assistance.
REFERENCES
1. Tim O'Reilly, "What Is Web 2.0? Design Patterns and Business Models for the Next Generation of Software," 2005, www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html.
2. Michael Zimmer, "Preface: Critical Perspectives on Web 2.0," First Monday 13, no. 3 (March 2008), http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2137/1943.
3. Michael Casey, "Working Towards a Definition of Library 2.0," LibraryCrunch (October 21, 2005), www.librarycrunch.com/2005/10/working_towards_a_definition_o.html.
4. Walt Crawford, "Library 2.0 and 'Library 2.0,'" Cites & Insights 6, no. 2 (Midwinter 2006): 1–32, http://citesandinsights.info/l2a.htm.
5. Michael Casey and Laura Savastinuk, "Library 2.0: Service for the Next-Generation Library," Library Journal 131, no. 14 (September 1, 2006): 40–42; Michael Casey and Laura Savastinuk, Library 2.0: A Guide to Participatory Library Service (Medford, NJ: Information Today, 2007); Nancy Courtney, Library 2.0 and Beyond: Innovative Technologies and Tomorrow's User (Westport, CT: Libraries Unlimited, 2007).
6. American Library Association, "Policy on Confidentiality of Library Records," www.ala.org/offices/oif/statementspols/otherpolicies/policyconfidentiality.
7. Herbert N. Foerstel, Surveillance in the Stacks: The FBI's Library Awareness Program (New York: Greenwood, 1991).
8. Michael Gorman, Our Enduring Values: Librarianship in the 21st Century (Chicago: American Library Association, 2000).
9. Candace D. Morgan, "Intellectual Freedom: An Enduring and All-Embracing Concept," in Intellectual Freedom Manual (Chicago: American Library Association, 2006).
10. American Library Association, "Library Bill of Rights," www.ala.org/advocacy/intfreedom/librarybill; American Library Association, "Privacy: An Interpretation of the Library Bill of Rights," www.ala.org/Template.cfm?Section=interpretations&Template=/ContentManagement/ContentDisplay.cfm&ContentID=132904.
11. Rory Litwin, "The Central Problem of Library 2.0: Privacy," Library Juice (May 22, 2006), http://libraryjuicepress.com/blog/?p=68.
12. Zeth Lietzau and Jamie Helgren, U.S. Public Libraries and the Use of Web Technologies, 2010 (Denver: Library Research Service, 2011), www.lrs.org/documents/web20/WebTech2010_CloserLookReport_Final.pdf.
13. Ibid.
14. Sue Anderson, "Libraries Struggle to Balance Privacy and Patron Access," Alki 24, no. 2 (July 2008): 18–28; Karen Coombs, "Privacy Vs. Personalization," netConnect (April 15, 2007): 28; Milica Cvetkovic, "Making Web 2.0 Work—From 'Librarian Habilis' to 'Librarian Sapiens,'" Computers in Libraries 29, no. 9 (October 2009): 14–17, www.infotoday.com/cilmag/oct09/Cvetkovic.shtml; Melissa L. Rethlefsen, "Tools at Work: Facebook's March on Privacy," Library Journal 135, no. 12 (June 2010): 34–35.
15. Coombs, "Privacy Vs. Personalization."
16. Anderson, "Libraries Struggle to Balance Privacy and Patron Access"; Melissa L. Rethlefsen, "Facebook's March on Privacy," Library Journal 135, no. 12 (2010): 34–35.
17. Cvetkovic, "Making Web 2.0 Work."
18. Casey and Savastinuk, "Library 2.0: Service for the Next-Generation Library."
19. Marshall Breeding, "Taking the Social Web to the Next Level," Computers in Libraries 30, no. 7 (September 2010): 34–37, www.librarytechnology.org/ltg-displaytext.pl?RC=15053.
20. Jeff Wisniewski, "Location, Location, Location," Online 33, no. 6 (2009): 54–57.
21. Rethlefsen, "Tools at Work: Facebook's March on Privacy."
22. Cvetkovic, "Making Web 2.0 Work," 17.
23. James Moor, "What Is Computer Ethics?" Metaphilosophy 16, no. 4 (October 1985): 266–75.
Appendix A: Articles with relevant mention of patron privacy as it relates to Library 2.0
Anderson, Sue. "Libraries Struggle to Balance Privacy and Patron Access." Alki 24, no. 2 (July 2008): 18–28.
Balnaves, Edmund. "The Emerging World of Open Source, Library 2.0, and Digital Libraries." Incite 30, no. 8 (August 2009): 13.
Baumbach, Donna J. "Web 2.0 and You." Knowledge Quest 37, no. 4 (2009): 12–19.
Breeding, Marshall. "Taking the Social Web to the Next Level." Computers in Libraries 30, no. 7 (September 2010): 34–37.
Casey, Michael E., and Laura Savastinuk. "Library 2.0: Service for the Next-Generation Library." Library Journal 131, no. 14 (September 1, 2006): 40–42.
Cohen, Sarah F. "Taking 2.0 to the Faculty: Why, Who, and How." College & Research Libraries News 69, no. 8 (September 2008): 472–75.
Coombs, Karen. "Privacy Vs. Personalization." netConnect (April 15, 2007): 28.
Coyne, Paul. "Library Services for the Mobile and Social World." Managing Information 18, no. 1 (2011): 56–58.
Cromity, Jamal. "Web 2.0 Tools for Social and Professional Use." Online 32, no. 5 (October 2008): 30–33.
Cvetkovic, Milica. "Making Web 2.0 Work—From 'Librarian Habilis' to 'Librarian Sapiens.'" Computers in Libraries 29, no. 9 (October 2009): 14–17.
Eisenberg, Mike. "The Parallel Information Universe." Library Journal 133, no. 8 (May 1, 2008): 22–25.
Gosling, Maryanne, Glenn Harper, and Michelle McLean. "Public Library 2.0: Some Australian Experiences." Electronic Library 27, no. 5 (2009): 846–55.
Han, Zhiping, and Yan Quan Liu. "Web 2.0 Applications in Top Chinese University Libraries." Library Hi Tech 28, no. 1 (2010): 41–62.
Harlan, Mary Ann. "Poetry Slams Go Digital." CSLA Journal 31, no. 2 (Spring 2008): 20–21.
Hedreen, Rebecca C., Jennifer L. Johnson, Mack A. Lundy, Peg Burnette, Carol Perryman, Guus Van Den Brekel, J. J. Jacobson, Matt Gullett, and Kelly Czarnecki. "Exploring Virtual Librarianship: Second Life Library 2.0." Internet Reference Services Quarterly 13, no. 2–3 (2008): 167–95.
Horn, Anne, and Sue Owen. "Leveraging Leverage: How Strategies Can Really Work for You." In Proceedings of the 29th Annual International Association of Technological University Libraries (IATUL) Conference, Auckland, NZ (2008): 1–10, http://dro.deakin.edu.au/eserv/DU:30016672/horn-leveragingleveragepaper-2008.pdf.
Huwe, Terence. "Library 2.0, Meet the 'Web Squared' World." Computers in Libraries 31, no. 3 (April 2011): 24–26.
"Idea Generator." Library Journal 134, no. 5 (1976): 44.
Jayasuriya, H. Kumar Percy, and Frances M. Brillantine. "Student Services in the 21st Century: Evolution and Innovation in Discovering Student Needs, Teaching Information Literacy, and Designing Library 2.0-Based Student Services." Legal Reference Services Quarterly 26, no. 1–2 (2007): 135–70.
Jenda, Claudine A., and Martin Kesselman. "Innovative Library 2.0 Information Technology Applications in Agriculture Libraries." Agricultural Information Worldwide 1, no. 2 (2008): 52–60.
Johnson, Doug. "Library Media Specialists 2.0." Library Media Connection 24, no. 7 (2006): 98.
Kent, Philip G. "Enticing the Google Generation: Web 2.0, Social Networking and University Students." In Proceedings of the 29th Annual International Association of Technological University Libraries (IATUL) Conference, Auckland, NZ (2008), http://eprints.vu.edu.au/800/1/Kent_P_080201_FINAL.pdf.
Krishnan, Yvonne. "Libraries and the Mobile Revolution." Computers in Libraries 31, no. 3 (April 2011): 5–9.
Li, Yiu-On, Irene S. M. Wong, and Loletta P. Y. Chan. "MyLibrary Calendar: A Web 2.0 Communication Platform." Electronic Library 28, no. 3 (2010): 374–85.
Liu, Shu. "Engaging Users: The Future of Academic Library Web Sites." College & Research Libraries 69, no. 1 (January 2008): 6–27.
McLean, Michelle. "Virtual Services on the Edge: Innovative Use of Web Tools in Public Libraries." Australian Library Journal 57, no. 4 (November 2008): 431–51.
Oxford, Sarah. "Being Creative with Web 2.0 in Academic Liaison." Library & Information Update 5 (May 2009): 40–41.
Rethlefsen, Melissa. "Facebook's March on Privacy." Library Journal 135, no. 12 (2010): 34–35.
Schachter, Debbie. "Adjusting to Changes in User and Client Expectations." Information Outlook 13, no. 4 (2009): 55.
Shippert, Linda Crook. "Thinking About Technology and Change, or, 'What Do You Mean It's Already Over?'" PNLA Quarterly 73, no. 2 (2008): 4, 26.
Stephens, Michael. "The Ongoing Web Revolution." Library Technology Reports 43, no. 5 (2007): 10–14.
Thornton, Lori. "Facebook for Libraries." Christian Librarian 52, no. 3 (2009): 112.
Trott, Barry, and Kate Mediatore. "Stalking the Wild Appeal Factor." Reference & User Services Quarterly 48, no. 3 (2009): 243–46.
Valenza, Joyce Kasman. "A Few New Things." LMC: Library Media Connection 26, no. 7 (2008): 10–13.
Widdows, Katharine. "Web 2.0 Moves 2.0 Quickly 2.0 Wait: Setting up a Library Facebook Presence at the University of Warwick." SCONUL Focus 46 (2009): 54–59.
Wisniewski, Jeff. "Location, Location, Location." Online 33, no. 6 (2009): 54–57.
Woolley, Rebecca. "Book Review: Information Literacy Meets Library 2.0: Peter Godwin and Jo Parker (eds.)." SCONUL Focus 47 (2009): 55–56.
Wyatt, Neal. "2.0 for Readers." Library Journal 132, no. 18 (2007): 30–33.

Detection of Information Requirements of Researchers Using Bibliometric Analyses to Identify Target Journals
Vadim Nikolaevich Gureyev, Nikolai Alekseevich Mazov
ABSTRACT
Bibliometric analyses were used to identify journals that are representative of the authors' research institutes. Methods to semiautomatically collect data on an institute's publications and the journals they cite are described. Citation analyses of lists of articles and their citations can help librarians quickly identify the preferred journals in terms of the number of publications, as well as the most frequently cited journals. Librarians can use these data to generate a list of journals to which an institute should subscribe.
BACKGROUND
Recent developments in information technology have had a significant impact on the research activities of scientific libraries. Such tools have provided new insights into the workload and duties of librarians in research libraries. In the present study, we performed bibliometric analyses to determine the information needs of researchers and to determine whether they are satisfied with the journal subscriptions available at their institutes. Such analyses are important because of limited funding for subscriptions, increases in the cost of electronic resources, and the publication of new journals, especially open-access journals.
Bibliometric analyses are more accessible and less labor-intensive when using specially designed web services and software. Several databases of citation data are accessible online. The leading publishers of these databases, including Thomson Reuters and Elsevier, promote their products, such as the Web of Science (WoS) and Scopus, with travelling and online seminars to increase the number of skilled users. Of note, the number of articles devoted to bibliometric analysis has increased about 4-fold since 2000 (see Figure 1).
Vadim Nikolaevich Gureyev (gureyev@vector.nsc.ru) is Leading Bibliographer, Information Analysis Department, State Research Center of Virology and Biotechnology Vector, Novosibirsk, Russia. Nikolai Alekseevich Mazov (mazovNA@ipgg.sbras.ru) is Head of Information and Library Center, Trofimuk Institute of Petroleum Geology and Geophysics SB RAS, Novosibirsk, Russia.
Figure 1. Growth of publications devoted to informetric analysis. Data were generated from the WoS using the following request: «Topic=((bibliometric* or informetric* or webometic* or scientometric*) and (stud* or analys*))».
Bibliometric analysis appears to be the most objective method available to librarians; it shows high objectivity even when compared with peer review.1 Citation analysis can be used to select target journals because it accurately reflects the needs of researchers and can reveal current scientific trends. It also allows librarians to evaluate the effectiveness of each journal, the significance of each journal to the institute, and the minimum archival depth.2 Citation analysis is particularly useful when generating a list of journals for subscription and when deciding whether to subscribe to specific journals.3
In the present study, we performed citation analyses of scientific papers published by researchers at SRC VB "Vector" (biology and medicine) and IPGG SB RAS (geosciences). We analyzed the groups of journals that published articles from these two institutes and compared the characteristics of the cited and citing journals. Many journals publish articles covering the fields associated with the two institutes (biology and medicine, and geosciences), and journals in these fields tend to have the highest impact factors of all fields. Therefore, the methods applied in this study and the results may be generalized to other research libraries.
STUDY DESIGN
Sources. We analyzed articles published in journals or books by researchers at SRC VB "Vector" and IPGG, together with the references cited in these articles. We limited the articles to those published in 2006–2010 (IPGG) or 2007–2011 (SRC VB "Vector"). We did not analyze monographs, theses, or conference proceedings (including those that were published in journals), because our aim was to optimize the list of subscribed journals. To collect comprehensive data regarding these publications, we used four overlapping sources:
(1) The Russian Science Citation Index (SCI) was used to retrieve articles based on the profile of each researcher. The "Bound and Unbound Publications in One List" option was switched off.
(2) Thomson Reuters SCI Expanded was used to examine the profile of each researcher. The "Conference Proceedings Citation Index" option was switched off.
(3) Scopus was used to retrieve the publications for each researcher.
(4) Each head of department provided us with the articles published by each member of the research group within the last 5 years.
Along with publications in which the affiliation was clearly stated, we also analyzed articles in which the authors' affiliation was not stated, in which the authors reported a superior organization such as a governmental ministry, or in which authors from either institute attributed the work to another affiliation (if they worked at two or more organizations). The translated and original versions of the same article were treated as a single article, and the English version was used in our analyses. For journals that published the original Russian article and an English translation, we analyzed the latter.
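Combining the four overlapping sources into one publication list is essentially a keyed merge with de-duplication. The sketch below is an illustration under assumed field names, not the authors' actual workflow; in particular, pairing a Russian original with its English translation (which carry different titles) is not captured by this simple key and still requires the kind of record-by-record checking described above.

```python
# Illustrative sketch of merging publication records pulled from several
# sources (Russian SCI, Web of Science, Scopus, departmental lists) and
# collapsing exact duplicates. Field names and the matching rule are
# assumptions; translated/original pairs were matched separately.
def normalize(title):
    """Crude normalization: lowercase and keep alphanumerics only."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

def merge_records(*sources):
    """Each source is a list of dicts with 'title', 'year', and 'journal'."""
    merged = {}
    for source in sources:
        for rec in source:
            key = (normalize(rec["title"]), rec["year"])
            # Keep the first record seen; later duplicates are dropped.
            merged.setdefault(key, rec)
    return list(merged.values())

if __name__ == "__main__":
    wos = [{"title": "Example Article", "year": 2010, "journal": "Russian Geology and Geophysics"}]
    scopus = [{"title": "Example article", "year": 2010, "journal": "Russian Geology and Geophysics"}]
    print(len(merge_records(wos, scopus)))  # -> 1
```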
Citations. Citations from the published articles were analyzed to identify the most frequently cited journals. We ignored references that lacked sufficient information and references included in footnotes. Cited monographs, theses, and conference proceedings (including those that were published in journals) were also ignored. For citations published in Russian with an English translation, we analyzed the translated version, even if the authors originally cited the Russian version. We preferred to include the translated versions because they are included in the WoS database and we can treat them automatically. For example, the WoS indexes articles from Russian Geology and Geophysics (print ISSN 1068-7971) but not the Russian-language version Geologiya i Geofizika (print ISSN 0016-7886). Journals that had been renamed were treated as one journal, and the current/most recent name was used in the analysis. However, journals that had split into multiple titles were analyzed separately, and the journal's name at the time the cited article was published was used in the analysis. For this study, we first retrieved the journal name and the year the cited article was published. We then expanded on this information by recording the journal publisher, journal accessibility (i.e., subscription method, paper or electronic), open/pay-per-view access, embargo status, and journal length. We ignored the accessibility of individual articles that had been deposited on the author's personal website or in an institutional repository.
RESULTS AND DISCUSSION
Table 1 summarizes the publication activities for both institutes.
A. Biomedicine
Year | Number of articles | Included in Russian SCI* (%) | Included in WoS (%) | Included in Scopus* (%) | Nowhere indexed (%) | Number of journals**
2007 | 118 | 94.9 | 28.8 | 54.2 | 5 | 66
2008 | 84 | 96.4 | 41.6 | 51.1 | 3.5 | 57
2009 | 82 | 97.5 | 39 | 52.4 | 2.4 | 58
2010 | 100 | 94.0 | 41 | 61 | 6 | 60
2011 | 105 | 91.4 | 25.7 | 55.2 | 8.5 | 50
B. Geoscience
Year | Number of articles | Included in Russian SCI* (%) | Included in WoS (%) | Included in Scopus* (%) | Nowhere indexed (%) | Number of journals**
2007 | 188 | 79.8 | 43.1 | 43.1 | 21 | 82
2008 | 218 | 96.8 | 39.4 | 41.7 | 3 | 88
2009 | 259 | 93.0 | 39.0 | 37.8 | 7 | 87
2010 | 250 | 84.4 | 31.2 | 29.6 | 5 | 102
2011 | 267 | 70.4 | 30.4 | 30.0 | 29 | 97
*The Russian SCI and Scopus indexed some articles twice, particularly those published in Russian with an English translation. Therefore, some articles have different timelines and citations. These duplications were analyzed as one article.
**Number of journals in this field, excluding translated journals.
Table 1. Publication activity and article presence in the main bibliographic databases in the fields of biomedicine (A; 2007–2011) and geoscience (B; 2007–2011).
Table 1 shows that the two institutes have a stable publication history relative to other Russian scientific institutes in terms of publication activity, publishing approximately 150 articles per year. Therefore, our results can be generalized to other institutes in these fields. Collecting this information may seem to be a daunting task, especially for librarians who have not conducted such analyses before. We used three databases and contacted the heads of department directly.
However, our data indicate that it is sufficient to use the free-of-charge Russian SCI, an extensive index of Russian scientific articles that includes almost all of the articles published by Russian researchers in Russian and international journals. Nevertheless, it is essential to review the profile of each author. When searching for articles by affiliation, the share of articles retrieved ranged from 28% to 51%, and the number of publications retrieved tended to decrease over time. This phenomenon may be caused by a deficient system used to identify affiliations, owing to differences in the spelling of the affiliation name (in our case, more than 70 variants have been used), attribution of the research to a superior organization, and the fact that two or more affiliations may have the same name.4 Furthermore, recent studies1,5 confirmed that information about authors should be collated by their affiliations rather than by performing searches in bibliographic databases. It seems paradoxical that the WoS and Scopus databases index Russian articles more quickly than the Russian SCI. By subscribing to the same print and electronic journals, we noted that print editions are published before electronic ones. Nevertheless, this seems reasonable based on this 2-year retrospective analysis. Therefore, routine analysis of Russian articles can be partly automated by efficient searches of the Russian SCI. Table 2 presents the citation details.
Citing year | Number of references | Number of cited journals | Average number of references per article
2007 | 1830 | 492 | 15.5
2008 | 1354 | 472 | 16.1
2009 | 1536 | 558 | 18.7
2010 | 1591 | 471 | 15.9
2011 | 1613 | 484 | 15.4
Table 2. Number of cited articles, cited journals, and mean number of references per article in the biomedical field.
References from articles not indexed in the WoS were manually extracted, which takes time and effort. References from articles indexed in the WoS, including Russian articles translated into English, were analyzed semiautomatically. For this purpose, we used EndNote software developed by Thomson Reuters. EndNote Web is a free alternative that could also be used for this purpose. The references cited in each article were exported into EndNote. Next, the references were arranged according to the chosen parameters to simplify our analyses. Of note, the roughly 35% of the Russian articles indexed in the WoS accounted for 80% of all the references cited in the articles. Two possible reasons for this are (1) the greater number of articles published in translated and international journals and (2) the adoption by Russian researchers of the Western citing culture.6 This finding suggests that it is possible to avoid labor-intensive routine work and to use automated services developed by Thomson Reuters to collate and analyze up to 80% of all references.
The authors of articles in the geosciences field cited 1000 journals, including 750 Western journals and 250 Russian journals. For the biomedical articles, the index included 1339 journals, of which 1168 were Western journals and 171 were Russian journals. We analyzed about 8000 references cited by authors from each institute over 5 years. The references were divided into three equal groups. The most frequently cited Russian journals and book series are listed in Table 3.
Biological and medical sciences
Journal/Book series | Percent of references | Cumulative total (%)
Problems of Virology | 16.94 | 16.94
Molecular Biology | 6.44 | 23.38
Biotechnology in Russia | 6.07 | 29.45
Doklady Biological Sciences | 5.09 | 34.54
Atmospheric and Oceanic Optics | 4.42 | 38.96
Annals of the Russian Academy of Medical Sciences | 4.04 | 43
Journal of Microbiology, Epidemiology and Immunobiology | 3.82 | 46.82
Molecular Genetics, Microbiology and Virology | 2.92 | 49.74
Bulletin of Experimental Biology and Medicine | 2.77 | 52.51
Russian Journal of Bioorganic Chemistry | 2.69 | 55.2
Problems of Tuberculosis | 2.62 | 57.82
Biochemistry (Moscow) | 1.79 | 59.61
Pharmaceutical Chemistry Journal | 1.72 | 61.33
Infectious Diseases | 1.57 | 62.9
Bulletin Siberian Branch of Russian Academy of Medical Sciences | 1.2 | 64.1
Russian Journal of Genetics | 1.12 | 65.22
Geosciences
Journal/Book series | Percent of references | Cumulative total (%)
Russian Geology and Geophysics | 34.99 | 34.99
Doklady Earth Sciences | 18.18 | 53.17
Geochemistry International | 6.49 | 59.66
Petrology | 2.72 | 62.38
Geotectonics | 2.55 | 64.93
Geology of Ore Deposits | 2.23 | 67.16
National Geology | 1.98 | 69.14
Stratigraphy and Geological Correlation | 1.96 | 71.1
Izvestiya. Physics of the Solid Earth | 1.55 | 72.65
Proceedings of All-Union Mineralogic Society | 1.45 | 74.1
Bulletin of the Russian Academy of Sciences: Geology | 1.42 | 75.52
Lithology and Mineral Resources | 1.42 | 76.94
Oil and Gas Geology | 1.26 | 78.2
Russian Journal of Pacific Geology | 0.82 | 79.02
Chemistry for Sustainable Development | 0.69 | 79.71
Physics of the Solid State | 0.68 | 80.39
Table 3. Characteristics of the 16 most frequently cited Russian journals and journals in the second group, listed in order of number of citations. (In the original table, shading marked the journals that together account for one-third of all citations.) The translated titles of each journal and the official translated titles of journals without translated variants are given.
Table 3 shows that two-thirds of all citations were published in only 9% (16/171) of the cited Russian biomedical journals. This statistic is even more pronounced in the field of geosciences, as 6% (16/250) of Russian journals published 80% of the cited articles. Comparing the data between the two institutes, it is notable that the results are consistent. The only difference evident to us is that the geoscience researchers tended to cite more Russian journals, whereas biomedical researchers preferred to cite international literature. The greater concentration of citations in select journals in the geosciences field can be explained by the smaller number of citations. In the biomedical field, we observed a strong trend towards abundant citations resulting in a wider distribution of citations in each article; the journals with the highest impact factors in biology and medicine confirmed our observation. Figures 2 and 3 show the correlations between citations and publication activity in Russian journals.
Figure 2. Correlation between publication activity (red) and citations (blue) in the biomedical field (in %) for the data shown in Table 3. Timescale: 2007–2011.
Figure 3. Correlation between publication activity (red) and citations (blue) in the geosciences field (in %) for the data shown in Table 3. Timescale: 2006–2010.
The citing and cited journals are often the same journals, and publication activity is highly correlated with citation activity.
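The percentage and cumulative-total columns in Tables 3 and 4 follow directly from per-journal citation counts. The sketch below illustrates that computation; the input list of cited-journal names is hypothetical (for example, parsed from a reference-manager export), and the snippet is not the authors' actual script.

```python
# Minimal sketch of the Table 3/Table 4 computation: rank cited journals,
# convert counts to percentages, and accumulate shares until a coverage
# threshold (e.g. two-thirds of all citations) is reached. The input list
# of cited-journal names is hypothetical.
from collections import Counter

def journal_shares(cited_journals, threshold=2 / 3):
    counts = Counter(cited_journals)
    total = sum(counts.values())
    cumulative = 0.0
    core = []
    for journal, n in counts.most_common():
        share = n / total
        cumulative += share
        core.append((journal, round(100 * share, 2), round(100 * cumulative, 2)))
        if cumulative >= threshold:
            break
    return core  # journals that together cover `threshold` of all citations

if __name__ == "__main__":
    sample = ["Problems of Virology"] * 5 + ["Molecular Biology"] * 3 + ["Vaccine"] * 2
    for row in journal_shares(sample):
        print(row)
```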
This correlation is more apparent in the geosciences field, where Russian Geology and Geophysics is the most frequently cited journal, as it published about two-thirds of all cited articles. This is unsurprising because it is published by our institute and is the main multidisciplinary Russian journal in the field of geosciences. The most frequently cited international journals are listed in Table 4.
Biological and medical sciences
Journal/Book series | Percent of references | Cumulative total (%)
Journal of Virology | 6.03 | 6.03
Proceedings of the National Academy of Sciences of the United States of America | 3.36 | 9.39
Virology | 3.15 | 12.54
Vaccine | 2.77 | 15.31
Journal of Biological Chemistry | 2.4 | 17.71
Journal of General Virology | 2.4 | 20.11
Nature | 2.04 | 22.15
Science | 1.94 | 24.09
Journal of Clinical Microbiology | 1.94 | 26.03
Emerging Infectious Diseases | 1.89 | 27.92
Nucleic Acids Research | 1.59 | 29.51
Journal of Infectious Diseases | 1.38 | 30.89
Journal of Molecular Biology | 1.35 | 32.24
Journal of Immunology | 1.24 | 33.48
Journal of Medical Virology | 1.19 | 34.67
Virus Research | 0.86 | 35.53
New England Journal of Medicine | 0.86 | 36.39
Archives of Virology | 0.83 | 37.22
Antiviral Research | 0.75 | 37.97
Lancet | 0.73 | 38.7
Cell | 0.65 | 39.35
Applied and Environmental Microbiology | 0.6 | 39.95
Biochemistry | 0.59 | 40.54
Journal of Experimental Medicine | 0.59 | 41.13
FEBS Letters | 0.56 | 41.69
Geosciences
Journal/Book series | Percent of references | Cumulative total (%)
Earth Planetary Science Letters | 6.46 | 6.46
Geochimica et Cosmochimica Acta | 6.28 | 12.74
Contributions to Mineralogy and Petrology | 5.67 | 18.41
Journal of Geophysical Research | 4.9 | 23.31
Nature | 3.67 | 26.98
American Mineralogist | 3.53 | 30.51
Journal of Petrology | 3.22 | 33.73
Lithos | 2.58 | 36.31
Chemical Geology | 2.29 | 38.6
Geology | 2.01 | 40.61
Tectonophysics | 1.94 | 42.55
Economic Geology | 1.93 | 44.48
Science | 1.87 | 46.35
Journal of Crystal Growth | 1.56 | 47.91
Canadian Mineralogist | 1.48 | 49.39
Russian Geology and Geophysics | 1.35 | 50.74
European Journal of Mineralogy | 1.32 | 52.06
Geophysics | 1.02 | 53.08
Geophysical Research Letters | 1.02 | 54.1
Journal of Metamorphic Geology | 0.98 | 55.08
Journal of Geology | 0.93 | 56.01
International Geology Review | 0.91 | 56.92
Physical Review. Ser. B | 0.9 | 57.82
Precambrian Research | 0.9 | 58.72
Mineralogical Magazine | 0.88 | 59.6
Table 4. Characteristics of the 25 most frequently cited international journals and journals within the second group, listed in terms of number of citations. (In the original table, shading marked the journals that together account for one-third of all citations.)
The distribution of citations to international journals was similar to that observed for Russian journals, with a greater citation density in journals in the geosciences field. Notably, two-thirds of all citations were to articles published in just 25 journals. In terms of biomedical journals, two-thirds of all citations were to articles published in 100 journals. Only 1.3% (15/1,168) of the cited journals contained one-third of the cited articles in the biomedical field. The corresponding value for journals in the geosciences field was 0.9% (7/750). The correlations between citation activity and publication activity are shown in Figures 4 and 5.
Fig. 4. Correlation between publication activity (red) and citations (blue) for biomedical journals (in %) for the data shown in Table 4. Timescale: 2007–2011.
Fig. 5. Correlation between publication activity (red) and citations (blue) for journals in the geosciences field (in %) for the data shown in Table 4. Timescale: 2006–2010.
As illustrated in Figures 4 and 5, the distribution of citations to international journals was broader than for Russian journals, where there are only 1–4 frequently cited journals. This is probably due to the smaller number of Russian journals compared with international journals. Figures 4 and 5 also reveal a difference between the two disciplines: geoscience researchers published their articles in the most frequently cited international journals, whereas biomedical researchers rarely published their research in highly cited journals. This may be due to the greater number of biomedical journals or to the lower rate of publication, because relatively few articles were published in the major multidisciplinary journals, such as Nature or Science, or in specialized journals, such as the Journal of Virology.
CONCLUSION
Citation analysis enabled rapid identification of the most frequently cited journals that are essential to academic researchers. In the biomedical field, we found that 16 Russian and 100 international journals published two-thirds of all cited articles in the last 5 years. In the field of geosciences, we identified 4 Russian and 25 international journals that were essential to researchers in this field. Interestingly, there were four times as many such Russian and international journals in the biomedical field as in the geosciences field. The journals that published the researchers' articles were partially correlated with the cited journals in the geosciences field, but this correlation was less obvious for biomedical journals.
It is important to note that all aspects of this study were performed by librarians who used tools that were available in both institutes. We did not require any additional facilities or the assistance of any researchers. We believe our method is one of the most objective and accessible approaches available to scientific libraries for selecting target journals. We used our results to optimize our subscribed periodicals.
In addition to journal acquisition, our methods and results may be applied to other tasks performed by research libraries. For example, it is possible to study the citing and cited half-lives of journals and compare the results with those reported in the Journal Citation Reports. This allows researchers in specific institutes to determine whether they are citing cutting-edge or obsolete literature in their studies. The results can also be used to determine whether the subjects of the cited articles are relevant to the institute's field of research. Finally, the results can be used to compare the list of the most frequently cited international journals within a particular field with the list of journals that are most frequently cited by a research institute.
PERSPECTIVES
In this study, we revealed some differences in the correlation between citing and cited journals in two distinct fields, namely geosciences and biomedical science. Notably, this correlation was greater for journals in the geosciences field. To determine the factors underlying this phenomenon, it will be interesting to extend our study to a greater number of disciplines. It will also be interesting to compare data for cited journals with their usage statistics.
REFERENCES
1. A.F.J. van Raan, "The use of bibliometric analysis in research performance assessment and monitoring of interdisciplinary scientific developments," Technikfolgenabschätzung – Theorie und Praxis, vol. 1, no. 12 (2003): 20–29.
2. N.A. Slashcheva, Yu.V. Mokhnacheva, and T.N. Kharybina, "Study of information requirement of scientists from Pushchino Scientific Center RAS in Central Center library" (2008), http://dspace.nbuv.gov.ua:8080/dspace/bitstream/handle/123456789/31392/20-Slascheva.pdf?sequence=1 (accessed January 21, 2013).
3. Nikolay A. Mazov, "Estimation of a flow of scientific publications in research institute on the basis of bibliometric citation analysis," Information Technologies in Social Researches, no. 16 (2011): 25–30.
4. Leo Egghe and Ronald Rousseau, "Citation analysis," in Introduction to Informetrics: Quantitative Methods in Library, Documentation and Information Science (Amsterdam: Elsevier Science Publishers, 1990), 217–18.
5. "Bibliometrics: Publication Analysis as a Tool for Science Mapping and Research Assessment" (2008), http://ki.se/content/1/c6/01/79/31/introduction_to_bibliometrics_v1.3.pdf (accessed January 21, 2013).
6. A.E. Warshawsky and V.A. Markusova, "Estimation of efficiency of Russian scientists should be corrected" (2009), http://strf.ru/organization.aspx?CatalogId=221&d_no=17296 (accessed January 21, 2013).

An Evaluation of Finding Aid Accessibility for Screen Readers
Kristina L. Southwell and Jacquelyn Slater
ABSTRACT
Since the passage of the Americans with Disabilities Act in 1990 and the coincident growth of the Internet, academic libraries have worked to provide electronic resources and services that are accessible to all patrons. Special collections are increasingly being added to these web-based library resources, and they must meet the same accessibility standards. The recent popularity surge of Web 2.0 technology, social media sites, and mobile devices has brought greater awareness of the challenges faced by those who use assistive technology for visual disabilities. This study examines the screen-reader accessibility of online special collections finding aids at 68 public US colleges and universities in the Association of Research Libraries.
INTRODUCTION
University students and faculty today expect some degree of online access to most library resources. Special collections libraries are no exception, and researchers now have access to troves of digitized finding aids and original materials at university library websites nationwide. As part of the websites of higher education institutions, these resources must be accessible to patrons with disabilities. Section 504 of the Rehabilitation Act of 1973 first prohibited exclusion of the disabled from programs and activities of public entities, and the 1990 Americans with Disabilities Act (ADA) mandated accessibility of public services and facilities. Section 508 of the Rehabilitation Act, as amended by the Workforce Investment Act of 1998, also requires accessibility of federally funded services.
Since the passage of these laws, libraries at US colleges and universities have made progress in physical and electronic accessibility for the disabled. According to the Employment and Disability Institute at Cornell University, 2.1 percent of noninstitutionalized persons in the United States in 2010 had a visual disability.1 The US Census Bureau counted nearly 8.1 million people (3.3 percent) who reported difficulty seeing and 2 million who are blind or unable to see.2 These numbers indicate that there are students, faculty, and patrons outside the academic community with visual impairments who are potential consumers of online special collections materials. As ADA improvements increasingly pave the way for greater enrollment numbers of students with visual impairments, libraries must anticipate these students’ need for fully accessible information resources. Kristina Southwell (klsouthwell@ou.edu) is Associate Professor of Bibliography and Assistant Curator at the Western History Collections, University of Oklahoma, Norman, OK. Jacquelyn Slater (jdslater@ou.edu) is Assistant Professor of Bibliography and Librarian at the Western History Collections, University of Oklahoma, Norman, OK. mailto:klsouthwell@ou.edu mailto:jdslater@ou.edu AN EVALUATION OF FINDING AID ACCESSIBILITY FOR SCREEN READERS | SOUTHWELL AND SLATER 35 Library websites and the constantly changing resources they offer must be regularly evaluated for compatibility with screen readers and other accessibility technologies to ensure access. Perhaps because special collections materials are relatively late arrivals to the Internet, their accessibility has not received as much attention as more traditional library offerings like published books and periodicals. The goal of this study is to determine whether a sampling of special collections finding aids available on public US college and university library websites are accessible to patrons using screen readers. INTERNET ACCESS AND SCREEN READERS Blind and low-vision Internet users have various types of assistive technology available to them. These include screen readers with text-to-speech synthesizers, refreshable Braille displays, text enlargement, screen magnification software, and nongraphical browsers. Guidelines for making websites accessible via assistive technology are available from the W3C’s Web Content Accessibility Guidelines (WCAG 2.0).3 These rules provide success criteria for levels A, AA, and AAA for web developers to meet. Many websites today still do not conform to these guidelines, and barriers to access persist. Screen-reader users access information on the Internet differently than sighted persons. The keyboard usually replaces the monitor and mouse as the primary computer interface. Webpage content is spoken aloud in a strictly linear order, which may differ from the visual order on screen. Instead of visually scanning the page to look for the desired content, screen-reader users can use the “find” or “search” function to look for something specific or use one of several options for skimming the page via keyboard shortcuts. These shortcuts, which vary by screen reader, allow navigation to the available links, headings, ARIA landmarks, frames, paragraphs, and other elements of the page. A recent WebAIM survey of screen-reader users indicated 60.8 percent navigated lengthy webpages first by their headings. 
Using the “find” feature was the second most common method (16.6 percent), followed by navigation with links (13.2 percent) and ARIA landmarks (2.3 percent). Only 7.0 percent reported reading through a long website without using navigational shortcuts.4 Some websites also offer a “skip navigation link” at the beginning of a page, which allows the user to skip over the repetitive navigational information in the banner to hear the “main content” as soon as possible. These fundamental differences in the way screen- reader users access Internet content are the key to making websites that work well with assistive technology. LITERATURE REVIEW Accessibility studies of library web sites in higher education have primarily focused on the library’s homepage and its resources and services. More than a decade ago, Lilly and Van Fleet and Spindler determined only 40 percent and 42 percent, respectively, of academic library homepages were rated accessible using Bobby accessibility testing software.5 A series of similar studies followed by Schmetzke and Comeaux and Schmetzke, which found accessibility rates of library homepages fluctuating over time, decreasing from a 2001 rate of 59 percent to 51 percent in 2002, INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 36 and rising back to 60 percent in 2006.6 Providenti and Zai III examined academic library homepages in Kentucky, comparing data from 2003 and 2007. They also found low accessibility rates with minimal improvement in four years.7 Many accessibility studies have focused on one of the mainstays of academic library sites: databases of e-journals. Early studies by Coonin, Riley, Horwath, and others found significant accessibility barriers in most electronic content providers’ databases.8 Problems ranged from missing and unclear alternative text to inaccessible journal PDFs saved as images instead of text. As awareness of web accessibility in library resources spread, research studies began to find that most databases were Section 508 compliant but still lacked user-friendliness for users of assistive technology.9 More recent studies examined the actual usability of journal databases and the challenges they pose for the disabled. Power and LaBeau still found vendor databases that were not Section 508 compliant and others that were minimally compliant but lacked functionality.10 Dermody and Majekodunmi found that students were hindered by advanced database features intended to improve general users’ experiences.11 Disabled students were also confronted with excessive links, unreadable links, and inaccessible PDFs. Related studies have focused on providing guidelines for accessible library web design and services. Brophy and Craven highlighted the importance of universal design in library sites because of the ever-increasing complexity of web-based media.12 Vandenbark provided a clear explanation of US regulations and standards for accessible design and outlined basic principles of good design and how to achieve it.13 Recent works by Samson and Willis addressed best practices for reference and general library services to disabled patrons. Samson found no consistent set of best practices between eight academic libraries studied, noting that five of the eight based their services on reactions to individual complaints instead of using a broader, proactive approach.14 Willis followed up on a 1996 study by surveying technology and physical-access issues for the disabled in academic health sciences libraries. 
She found improvements in physical access, but technological access proved to be a mixed bag. While library catalogs were more accessible because they were available online, library webpages continued to pose problems for the disabled. Significant deficiencies in the provision of alternative text and accessible media formats were observed.15 Finding no comparable evaluations of special collections resources, in 2011 we examined the screen-reader accessibility of digitized textual materials from the special collections departments of US academic library websites.16 Our study found that 42 percent of the digitized items were accessible by screen reader, while 58 percent were not. Published at the same time, Lora J. Davis’ 2012 study evaluated accessibility of Philadelphia Area Consortium of Special Collections Libraries (PACSCL) member libraries’ special collections websites and compared their performance to popular sites such as Facebook, Wikipedia, and YouTube. Davis found that the special collections sites had error rates comparable to the popular sites, but demonstrated that a low number of error codes in automatic checkers does not necessarily mean the page is usable for nonsighted people.17 Davis concluded that it is difficult to “meaningfully assess site accessibility” AN EVALUATION OF FINDING AID ACCESSIBILITY FOR SCREEN READERS | SOUTHWELL AND SLATER 37 using only automatic accessibility checkers.18 Our current research study addresses this issue by incorporating manual tests of the special collections finding aids we examined. The results provide some insight into the screen-reader user’s experience with these materials. METHOD The researchers evaluated a single online finding aid from the websites of each of the 68 US public university and college libraries in the Association of Research Libraries. They were analyzed with automated and manual tests during the 2012 fall academic semester. The evaluated finding aids were randomly selected from each library’s manuscripts and archives collections. Selection was limited to only collections that have a container list describing manuscript or archives contents at least at box level. Evaluations were performed on the default display mode of the selected finding aids. If the library’s website required a format choice instead of a default display (such as HTML or PDF) the HTML version was selected. The automated web-accessibility checker WAVE 5.0 by WebAIM was used to perform an initial assessment of each finding aid’s conformance to Section 508 and WCAG 2.0 guidelines. The WAVE- generated report for each finding aid was used to compile a list of codes for the standard WAVE categories: Errors, Alerts, Features, Structural Elements, and WAI-ARIA landmarks. We recorded how many libraries earned each type of code, as well as how many times each code was generated during the entire evaluation process. Manual testing of each finding aid was performed with the WebbIE 3 Web Browser, a text-only browser for the blind and visually impaired. WebbIE’s Ctrl-H and Ctrl-L commands were used to examine the headings and links available on each finding aid to determine whether patrons who use screen readers could navigate the finding aid by using its headings or internal links. The study concluded with a manual test by screen reader directed by keyboard navigation. System Access to Go (SAToGo) and NVDA were used for this test. 
RESULTS Overview Basic descriptive data recorded during the selection process shows that 65 of the 68 finding aids tested were displayed as webpages using HTML, XHTML, or XML coding. The remaining three finding aids were displayed only in PDF, with no other viewing option available. Only 25 of 68 finding aids were offered in multiple viewing formats, while 43 were only available in a single format. Twenty of the finding aids were displayed in a state or regional database, while four used Archon, one used CONTENTdm, and four used DLXS. WAVE 5.0 Web Accessibility Evaluation Tool The three finding aids available only in PDF cannot be checked in WAVE, which is limited to webpages. Therefore 65 finding aids were evaluated with this tool. The results show that the majority of tested finding aids (58 of 65, or 89.23 percent) had at least one accessibility error (see INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 38 table 1). The most common errors were missing document language, missing alternative text, missing form labels, and linked images missing alternative text. Only seven of the finding aids had zero accessibility error codes. Missing document language was noted for 63 percent of finding aids. Language identification is important for screen readers or any text-to-speech applications, and it is a basic Level A conformance requirement to meet WCAG 2.0 criteria. The finding aids tested for this study contain primarily English materials, but they also describe materials in other languages, particularly Spanish and French manuscript and book titles. Without language identification, these words are spoken incorrectly with English pronunciation. Furthermore, increasing popularity of mobile devices with voicing capabilities will likely make language identification helpful for many users, whether or not they use a screen reader for a disability accommodation. Error Number of Libraries Total Number of Occurrences Broken skip link 4 6 Document language missing 41 41 Empty button 1 1 Empty form label 4 7 Empty heading 15 16 Image map area missing alternative text 2 2 Linked image missing alternative text 12 28 Missing alternative text 15 36 Missing form label 23 29 Missing or uninformative page title 7 7 Table 1. WAVE 5.0 Errors (n = 65). The number of errors found for missing alternative text (36 instances at 15 libraries), linked images missing alternative text (28 instances at 12 libraries), and image map areas missing alternative text (two instances at two libraries) is surprising. Alternative text for graphic items is one of the most basic and well-known accessibility features that can be implemented. The fact that it has not been provided when needed in more than a dozen finding aids suggests that these libraries have not performed the most rudimentary accessibility checks. Missing or empty form labels and empty buttons, found at 24 libraries, can cause significant problems for screen-reader users. Form labels and buttons allow listeners to identify and interact with forms such as search boxes. Lack of accessible descriptive information makes them challenging to use, if not impossible. Because headings are used with screen readers to facilitate quick keyboard navigation of a page, the presence of empty headings deprives screen-reader users of the information they need to scan the page the way a sighted patron does. Similarly, skip links are used to jump to the main content of a page, bypassing the repetitive information in headers and sidebars. 
Broken skip links were present at four libraries, eliminating their intended advantage. Missing or uninformative page titles were found at seven libraries, six of which were from pages using frames for display. When frames are used, each frame must have a clear title so listeners can choose the correct frame to hear.

WAVE's Alerts category identifies items that have the potential to cause accessibility issues, particularly when not implemented properly (see table 2). A total of 43 percent of the finding aids were missing first-level headings, 30 percent had a skipped heading level, and nearly 17 percent had no heading structure. Missing and disordered headings cause confusion when screen-reader users try to navigate a page with them. Listeners may think they have missed a heading, or they may have difficulty understanding the order and relationship of the page's sections.

Alert | Number of Libraries | Total Number of Occurrences
Accesskey | 3 | 15
Broken same-page link | 9 | 18
Link to PDF document | 3 | 5
Missing fieldset | 1 | 1
Missing first level heading | 28 | 28
Missing focus indicator | 13 | 13
Nearby image has same alternative text | 9 | 1,071
No heading structure | 11 | 16
Noscript element | 8 | 9
Orphaned form label | 2 | 2
Plugin | 1 | 1
Possible table caption | 3 | 4
Redundant alternative text | 4 | 9
Redundant link | 26 | 264
Redundant title text | 18 | 1,093
Skipped heading level | 20 | 22
Suspicious alternative text | 6 | 6
Suspicious link text | 1 | 5
Tabindex | 8 | 74
Underlined text | 8 | 142
Unlabeled form element with title | 1 | 2
Very small text | 11 | 20

Table 2. WAVE 5.0 Alerts (n = 65).

At first glance, the most frequently encountered alerts appear to be for nearby images with the same alternative text (1,071 instances at nine libraries) and redundant title text (1,093 instances at 18 libraries). On closer inspection, it is clear that the vast majority of these alerts came from just three libraries using Archon and are due to the inclusion of an "add to your cart" linked arrow image at the end of each described item. This repetitive statement is read aloud by the screen reader, making for a tedious listening experience. Redundant links accounted for the next largest group of alerts (264 instances at 26 libraries). Most of these came from a single library using CONTENTdm; its finding aid included a large number of subject headings linked to a "refine your search" option. Excessive links clutter the navigational structure used by screen readers. Broken same-page links, present on nine finding aids, also hamper quick navigation within a page.

Other alerts reported at several libraries indicated failure to provide descriptive information or adequate alternative text for form labels, table captions, fieldsets, and links. The presence of these problems underscores the fact that descriptive information needed by screen-reader users is not reliably available in finding aids. The remaining alerts for accesskey, tabindex, plugin, noscript element, and link to PDF document simply highlight areas that should be checked for correct implementation and do not confirm the presence of an access barrier.

The Features, Structural Elements, and WAI-ARIA landmarks codes in WAVE identify the coding elements that make online content more accessible. Features help users with disabilities interact with the page and read all of the available information on it, such as alternative text for images and form labels (see table 3).
Fully 83 percent (54 of 65) of the library finding aids evaluated included at least one accessibility feature. The most commonly used features are alternative text and linked images with alternative text; a total of 53 libraries used some form of alternative text. WAVE reported that skip navigation links were available on only four finding aids, accounting for just 6 percent of libraries. A manual check of the source code, however, located a total of six finding aids with functioning skip links, all correctly located at or near the beginning of the page. This discrepancy indicates that accessibility checkers are not foolproof and must be followed by manual tests. The two additional libraries raise the total to just 9 percent of libraries with skip links. Considering the value of skip links to users of assistive technology, it is unfortunate they are not present on more pages.

Feature | Number of Libraries | Total Number of Occurrences
Alternative text | 45 | 142
Element language | 2 | 2
Form label | 5 | 16
Image button with alternative text | 4 | 4
Image map area with alternative text | 2 | 5
Image map with alt attribute | 3 | 3
Linked image with alternative text | 19 | 31
Long description | 1 | 6
Null or empty alternative text | 10 | 21
Null or empty alternative text on spacer | 9 | 30
Skip link | 4 | 4
Skip link target | 5 | 5

Table 3. Features (n = 65).

The Structural Elements noted by WAVE are the elements that help with keyboard navigation of the page and with contextualizing layout-based information, such as tables or lists (see table 4). Most libraries (64 of 65, or 98 percent) used at least one structural feature in their finding aids. Lists and heading levels 2 and 3 are the most frequently used, followed by heading levels 1 and 4. Although heading levels should be ordered sequentially to provide logical structure to the document, heading level 1 was missing at 28 libraries (see table 2). Table header cells, included at the nine libraries using data tables to display container lists, are key to making tables screen-reader accessible. Inline frames were used at seven libraries, as opposed to six libraries that used traditional frames. While inline frames are considered more accessible than traditional frames, using CSS is preferable to using either type.

Structural Element | Number of Libraries | Total Number of Occurrences
Definition/description list | 11 | 86
Heading Level 1 | 33 | 54
Heading Level 2 | 43 | 150
Heading Level 3 | 42 | 295
Heading Level 4 | 25 | 108
Heading Level 5 | 1 | 2
Heading Level 6 | 0 | 0
Inline frame | 7 | 7
Ordered list | 6 | 16
Table header cell | 9 | 38
Unordered list | 41 | 715
WAI-ARIA landmarks | 1 | 3

Table 4. Structural Elements (n = 65).

WAI-ARIA landmarks are element attributes that identify areas of the page, such as "banner" or "search." They serve as navigational aids for assistive technology users in a manner similar to headings. Only one of the finding aids included WAI-ARIA roles, with three instances. While ARIA landmarks are becoming more common on the Internet in general, the data collected for this study indicate they have not yet been incorporated into library finding aids.

WebbIE 3 Text Browser Analysis

Because screen-reader users often use a webpage's headings and links for navigating by keyboard commands, their importance to accessibility cannot be overstated. A quick check of any page in a nongraphical browser will reveal the page's linear structure and reading order as handled by a screen reader.
A text-only view of a website shows the order of headings and links within the document. WebbIE 3's Ctrl-H and Ctrl-L commands were used to evaluate the 65 finding aids for the presence of headings and links for internal navigation. Finding aids were rated on a pass/fail basis in three categories:

• presence of any headings
• presence of headings for navigating to another key part of the finding aid (e.g., container list)
• presence of internal links for navigating to another key part of the finding aid

Headings/Links | Yes | No
Finding aid has at least one heading | 59 (91%) | 6 (9%)
Headings are used for navigation within finding aid | 44 (68%) | 21 (32%)
Links are used for navigation within finding aid | 37 (57%) | 28 (43%)
Headings and/or links used for navigation within finding aid | 49 (75%) | 16 (25%)

Table 5. Use of headings and links for navigation (n = 65).

While 91 percent had at least one heading, just 68 percent actually had headings that enabled navigation to another important section of the document, such as the container list. That means one-third of all finding aids encountered during this study could not be navigated by headings. Even those that did have enough headings with which to navigate did not always have the headings in proper sequential order, or were missing first-level headings. Given the length of some manuscript-collection finding aids, this lack of adequate structure can make reading them with a screen reader tedious. Finding aids with few or no headings prevent users of assistive technology from conveniently moving between sections, as a sighted reader can by visually scanning the page and selecting a relevant portion to read.

Even fewer finding aids offered links for navigating between sections of the finding aid. While 57 percent included such links, 43 percent did not. A total of 25 percent of pages tested lacked both headings and links of any kind for navigation within the finding aid. Inclusion of headings or links to the standard sections of the finding aid facilitates keyboard navigation. Additional headings or links to individual series or boxes provide even more options for screen-reader users. This is particularly helpful for patrons whose queries are not easily answered by a search function (for example, when a patron does not know the specific terms to use for searching). Only the most patient visitor will listen to an entire finding aid being read.

Screen Reader Test

A manual screen-reader test of each finding aid was completed by the researchers with SAToGo and NVDA. Both screen readers were used to ensure that success or failure in reading the content was not caused by any particular screen-reader software. Despite the 89 percent error rate noted by the automatic accessibility checker, the screen readers were able to read the main content of all 65 finding aids. The three PDF-only finding aids in the original group of 68 were also tested by opening them with the screen reader and Adobe Reader together. Adobe Reader indicated that all three lacked tagging for structure and attempted to prepare them for reading. This resulted in all three being read aloud by the screen reader, but only one of the three was navigable by linked sections of the finding aid; the remaining two finding aids had no headings or links. While it is encouraging that the main content of all 68 finding aids could be read, some functioned poorly because of how the information is organized and displayed.
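The heading and link review performed here with WebbIE's Ctrl-H and Ctrl-L commands can also be approximated programmatically. The sketch below is offered only as an illustration of that idea, under the assumption that Python with the requests and beautifulsoup4 packages is available; it lists a page's headings and its same-page links in document order, which is roughly the navigational structure a screen-reader user has to work with. The URL is a placeholder.

```python
# Illustrative sketch of the heading/link review performed with WebbIE's Ctrl-H
# and Ctrl-L commands: list a page's headings and same-page links in document
# order. Assumes `requests` and `beautifulsoup4`; the URL is a placeholder.
import requests
from bs4 import BeautifulSoup

def outline(url):
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")

    headings = [(h.name, h.get_text(strip=True))
                for h in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])]

    # Internal (same-page) links are the ones a screen-reader user could use to
    # jump to the container list or other sections of the finding aid.
    internal_links = [(a.get_text(strip=True), a["href"])
                      for a in soup.find_all("a", href=True)
                      if a["href"].startswith("#")]

    return headings, internal_links

if __name__ == "__main__":
    heads, links = outline("https://example.edu/findingaid.html")  # hypothetical URL
    print(f"{len(heads)} headings, {len(links)} internal links")
    for level, text in heads:
        print(f"{level}: {text or '(empty heading)'}")
```

A finding aid whose output here shows no headings and no internal links is one that a screen-reader user can only read straight through from the top.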
Finding aids serve as reference works for researchers and as internal record-keeping documents for the history of the collection. As such, they typically have a substantial amount of administrative information positioned at the beginning. Biographical, acquisition, and scope and content notes are common, as are processing details and subject headings. Sighted users can glance at the administrative information and skip to the collection summary or container list as needed. Screen-reader users can bypass this administrative information by using headings or links when they are supplied. Users of the one-third of finding aids in this study that lacked these shortcuts must skim, search, or read the entire finding aid. Inclusion of extensive administrative information without providing the means to skip past it creates a significant usability barrier.

The descriptive style and display format of the container list also posed problems during this test. Lengthy container lists displayed in tables are difficult to understand when spoken because tables are read row by row. This separates the descriptive table header cells, such as "box" and "folder," from the related information in the rows and columns below. As a result, the screen reader says "one, fifteen" before the description of the item in box 1, folder 15. It is hard to follow a long table, and the listener must remember or revisit the column and row headers to make sense of the descriptions. Most screen readers have a table-reading mode for data tables that will read the header cell with the associated content, but only if the table has been marked up with sufficient tags. Container-list item descriptions that begin with an identification number or numeric date (e.g., 2012/01/13) are particularly unclear for listeners. These long sequences of numbers seem out of context when spoken by the screen reader, and it can be difficult to infer the relationship between the number and the item. Item descriptions that are phrased as brief sentences in plain language result in finding aids that are more easily understood.

APPLICATION OF FINDINGS

Most special collections personnel in academic libraries are not responsible for the design of their websites, which are part of a larger organization that serves other needs. It is important that special collections librarians communicate to administrative and systems personnel that finding aids must be accessible to the visually disabled. Libraries cannot rely on a content management system's claim of Section 508 compliance to ensure accessibility, because that does not automatically guarantee that the information displayed in the system is accessible. Proper implementation of any content management system's accessibility features is a key factor in achieving accessibility. Librarians can take the first step toward improving the accessibility of their special collections' online finding aids by experiencing firsthand what screen-reader users encounter when they use them. This can be done by conducting the same automated and manual tests described in this study.
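As one concrete example of such an automated test, the sketch below flags data tables that contain no header cells, the markup gap that forces screen readers to read container lists row by row without pairing the "box" and "folder" labels with the values beneath them. It is illustrative only; it assumes the beautifulsoup4 package, and the embedded HTML snippet is a hypothetical container list rather than one from the libraries studied.

```python
# Flag data tables (e.g., container lists) that lack <th> header cells, which
# leaves screen readers unable to pair "box"/"folder" labels with row values.
# Assumes `beautifulsoup4`; the HTML snippet is a hypothetical container list.
from bs4 import BeautifulSoup

def tables_missing_headers(html):
    soup = BeautifulSoup(html, "html.parser")
    flagged = []
    for i, table in enumerate(soup.find_all("table"), start=1):
        has_th = table.find("th") is not None
        rows = len(table.find_all("tr"))
        if rows > 1 and not has_th:   # more than one row but no header cells
            flagged.append(i)
    return flagged

sample = """
<table>
  <tr><td>Box</td><td>Folder</td><td>Description</td></tr>
  <tr><td>1</td><td>15</td><td>Correspondence, 1912</td></tr>
</table>
"""
print(tables_missing_headers(sample))  # [1] -- the container list has no <th> cells
```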
The following key checkpoints should be considered. Accessible finding aids should

• be keyboard navigation-friendly;
• include alternative text for all graphics;
• have descriptive labels and titles for all interactive elements like forms;
• offer at least one type of navigational structure:
  o skip links and internal navigation links,
  o sufficient and properly ordered headings, or
  o WAI-ARIA landmarks; and
• present a correct linear reading order that simulates the visual reading order, particularly for the container list.

CONCLUSION

This study indicates that special collections finding aids at US public colleges and universities can be accessed by screen-reader users, but they do not always perform well because of faulty coding and inadequate use of headings or links for keyboard navigation. It is clear that many finding aids available online today have not been evaluated for optimal performance with assistive technology. This results in usability barriers for visually impaired patrons. Special collections librarians can help ensure their electronic finding aids are accessible to screen-reader users by conducting automatic and manual tests that focus on usability. The test results can be used to initiate changes that will result in finding aids that are accessible to all users.

REFERENCES

1. "Disability Statistics," Employment and Disability Institute, Cornell University, 2010, accessed December 20, 2012, www.disabilitystatistics.org/reports/acs.cfm.

2. Matthew J. Brault, "Americans with Disabilities: 2010," US Census Bureau, 2010, accessed December 20, 2012, www.census.gov/prod/2012pubs/p70-131.pdf.

3. "Web Content Accessibility Guidelines (WCAG) 2.0," World Wide Web Consortium (W3C), accessed December 20, 2012, www.w3.org/TR/WCAG.

4. "Screen Reader User Survey #4," WebAIM, accessed December 20, 2012, http://webaim.org/projects/screenreadersurvey4.

5. Erica B. Lilly and Connie Van Fleet, "Wired But Not Connected," Reference Librarian 32, no. 67/68 (2000): 5–28, doi: 10.1300/J120v32n67_02; Tim Spindler, "The Accessibility of Web Pages for Mid-Sized College and University Libraries," Reference & User Services Quarterly 42, no. 2 (2002): 149–54.

6. Axel Schmetzke, "Web Accessibility at University Libraries and Library Schools," Library Hi Tech 19, no. 1 (2001): 35–49; Axel Schmetzke, "Web Accessibility at University Libraries and Library Schools: 2002 Follow-Up Study," in Design and Implementation of Web-enabled Teaching Tools, ed. Mary Hricko (Hershey, PA: Information Science, 2002); David Comeaux and Axel Schmetzke, "Web Accessibility Trends in University Libraries and Library Schools," Library Hi Tech 25, no. 4 (2007): 457–77, doi: 10.1108/07378830710840437.

7. Michael Providenti and Robert Zai III, "Web Accessibility at Kentucky's Academic Libraries," Library Hi Tech 25, no. 4 (2007): 478–93, doi: 10.1108/07378830710840446.

8. Bryna Coonin, "Establishing Accessibility for E-journals: A Suggested Approach," Library Hi Tech 20, no. 2 (2002): 207–20, doi: 10.1108/07378830210432570; Cheryl A. Riley, "Libraries, Aggregator Databases, Screen Readers and Clients with Disabilities," Library Hi Tech 20, no. 2 (2002): 179–87, doi: 10.1108/07378830210432543; Cheryl A. Riley, "Electronic Content: Is It Accessible to Clients with 'Differabilities'?" Serials Librarian 46, no. 3/4 (2004): 233–40, doi: 10.1300/J123v46n03_06; Jennifer Horwath, "Evaluating Opportunities for Expanded Information Access: A Study of the Accessibility of Four Online Databases," Library Hi Tech 20, no. 2 (2002): 199–206; Suzanne L. Byerley and Mary Beth Chambers, "Accessibility and Usability of Web-based Library Databases for Non-visual Users," Library Hi Tech 20, no. 2 (2002): 169–78, doi: 10.1108/07378830220432534; Suzanne L. Byerley and Mary Beth Chambers, "Accessibility of Web-based Library Databases: The Vendors' Perspectives," Library Hi Tech 21, no. 3 (2003): 347–57.

9. Ron Stewart, Vivek Narendra, and Axel Schmetzke, "Accessibility and Usability of Online Library Databases," Library Hi Tech 23, no. 2 (2005): 265–86, doi: 10.1108/07378830510605205; Suzanne L. Byerley, Mary Beth Chambers, and Mariyam Thohira, "Accessibility of Web-based Library Databases: The Vendors' Perspectives in 2007," Library Hi Tech 25, no. 4 (2007): 509–27, doi: 10.1108/07378830710840473.

10. Rebecca Power and Chris LaBeau, "How Well Do Academic Library Web Sites Address the Needs of Database Users with Visual Disabilities?" Reference Librarian 50, no. 1 (2009): 55–72, doi: 10.1080/02763870802546399.

11. Kelly Dermody and Norda Majekodunmi, "Online Databases and the Research Experience for University Students with Print Disabilities," Library Hi Tech 29, no. 1 (2011): 149–60, doi: 10.1108/07378831111116976.

12. Peter Brophy and Jenny Craven, "Web Accessibility," Library Trends 55, no. 4 (2007): 950–72.

13. R. Todd Vandenbark, "Tending a Wild Garden: Library Web Design for Persons with Disabilities," Information Technology & Libraries 29, no. 1 (2010): 23–29.

14. Sue Samson, "Best Practices for Serving Students with Disabilities," Reference Services Review 39, no. 2 (2011): 244–59, doi: 10.1108/00907321111135484.

15. Christine A. Willis, "Library Services for Persons with Disabilities: Twentieth Anniversary Update," Medical Reference Services Quarterly 31, no. 1 (2012): 92–104, doi: 10.1080/02763869.2012.641855.

16. Kristina L. Southwell and Jacquelyn Slater, "Accessibility of Digital Special Collections Using Screen Readers," Library Hi Tech 30, no. 3 (2012): 457–71, doi: 10.1108/07378831211266609.

17. Lora J. Davis, "Providing Virtual Services to All: A Mixed-Method Analysis of the Website Accessibility of Philadelphia Area Consortium of Special Collections Libraries (PACSCL) Member Repositories," American Archivist 75 (Spring/Summer 2012): 35–55.

18. Davis, "Providing Virtual Services to All," 51.

3471 ----

Evaluation and Comparison of Discovery Tools: An Update

F. William Chickering and Sharon Q. Yang

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2014

ABSTRACT

Selection and implementation of a web-scale discovery tool by the Rider University Libraries (RUL) in the 2011–2012 academic year revealed that the endeavor was a complex one. Research into the state of adoption of web-scale discovery tools in North America and the evolution of product effectiveness provided a good starting point. In the following study, we evaluated fourteen major discovery tools (three open source and eleven proprietary), benchmarking sixteen criteria recognized as the advanced features of a "next generation catalog." Some of the features have been used in previous research on discovery tools.
The purpose of the study was to evaluate and compare all the major discovery tools, and the findings serve to update librarians on the latest developments and user interfaces and to assist them in their adoption of a discovery tool.

F. William Chickering (chick@rider.edu) is Dean of University Libraries, Rider University, Lawrenceville, New Jersey. Sharon Q. Yang (yangs@rider.edu) is Associate Professor–Librarian at Moore Library, Rider University, Lawrenceville, New Jersey.

INTRODUCTION

In 2004, the Rider University Libraries' (RUL) strategic planning process uncovered a need to investigate federated searching as a means to support research. A tool was needed to search and access all journal titles available to RUL users at that time, including 12,000+ electronic full-text journals. Because federated search, with its real-time search operations, could not provide relevancy ranking, and because of the cost of the products then available, the decision was made to defer implementation of federated search. Monitoring developments yearly revealed no improvements strong enough to justify adopting the approach. By 2011, the number of electronic full-text journals had increased to 51,128, and by this time federated search as a concept had metamorphosed into web-scale discovery. Clearly, the time had come to consider implementing this more advanced approach to searching the ever-growing number of journals available to our clients.

Though RUL passed on federated searching, viewing it as too cumbersome to serve our students well, we anticipated the day when improved systems would emerge. Vaughn nicely describes the ability of more highly evolved discovery systems to "provide quick and seamless discovery, delivery, and relevancy-ranking capabilities across a huge repository of content."1 Yang and Hofmann anticipated the emergence of web-scale discovery with their evaluation of next-generation catalogs.2,3 By 2011, informed by Yang and Hofmann's research, we believed that the systems in the marketplace were sufficiently evolved to make our efforts at assessing available systems worthwhile. This coincided nicely with an important objective in our strategic plan: investigate link resolvers and discovery tools for federated searching and OPAC by summer 2011. Heeding Alexander Pope's advice to "Be not the first by whom the new are tried, Nor yet the last to lay the old aside,"4 we set about discovering what systems were in use throughout North America and which features each provided.

SOME HISTORY

In 2006, Antelman, Lynema, and Pace observed that "library catalogs have represented stagnant technology for close to twenty years." Better technology was needed "to leverage the rich metadata trapped in the MARC record to enhance collection browsing. The promise of online catalogs has never been realized. For more than a decade, the profession either turned a blind eye to problems with the catalog or accepted that it is powerless to fix them."6

Dissatisfaction with catalog search tools led us to review the VuFind Discovery Tool. While it had some useful features (spelling, "did you mean?" suggestions), it still suffered from inadequacies in full-text search and the cumbersome nature of searcher-designated Boolean searching. It did not work well in searching printed music collections and, of course, only served as a catalog front end.
With this all in mind, RUL developed a set of objectives to improve information access for clients:

• To provide information seekers with
  o an easy search option for academically valid information materials,
  o an effective search option for academically valid information materials, and
  o a reliable search option for academically valid information materials across platforms
• To recapture student academic search activity from Google
• To attempt to revitalize the use of monographic collections
• To provide an effective mechanism to support offerings of e-books
• To build a firm platform for appropriate library support of distance-learning coursework

LITERATURE REVIEW

Marshall Breeding first discussed broad-based discovery tools in 2005, shortly after the launch of Google Scholar. He posited that federated search could not compete with the power and speed of a tool like Google Scholar, and he proclaimed the need for, as he described it, a "centralized search model."7

Building on Breeding's observations four years earlier, Diedrichs astutely observed in 2009 that "user expectations for complete and immediate discovery and delivery of information have been set by their experiences in the Web2.0 world. Libraries must respond to the needs of those users whose needs can easily be met with Google-like discovery tools, as well as those that require deeper access to our resources."10

In that same year, Dolski described the situation common in many academic libraries when, in reference to the University of Nevada Las Vegas (UNLV) library, he stated, "Our library website serves as the de facto gateway to our electronic, networked content offerings. Yet usability studies have shown that findability, when given our website as a starting point, is poor. Undoubtedly this is due, at least in part, to interface fragmentation."11 This perfectly described the way we had come to view RUL's situation.

In 2010, Breeding reviewed the systems in the market, noting that these are not just next-generation catalogs. He stressed "equal access to content in all forms," a concept we now take for granted. A key virtue in discovery tools, he noted, is the "blending of the full text of journal articles and books alongside citation data, bibliographic, and authority records resulting in a powerful search experience. Rather than being provided a limited number of access points selected by catalogers, each word and phrase within the text becomes a possible point of retrieval." Breeding further pointed out that "web-scale discovery platforms will blur many of the restrictions and rules that we impose on library users. Rather than having to explain to a user that the library catalog lists books and journal titles but not journal articles, users can simply begin with the concept, author, or title of interest and straightaway begin seeing results across the many formats within the library's collection."12 Working with freshmen at Rider University revealed that they are ahead of the professionals in approaching information this way, and we believed that web-scale discovery tools could help our users.

As we began the process of selecting a discovery tool, we looked at the experiences of others. Fabbi at the University of Nevada Las Vegas (UNLV) folded in a strong component of organizational learning in a highly structured manner that was unnecessary at Rider.13
No information was disclosed on the process of selecting a discovery vendor, though the website reveals the presence of a discovery tool (http://library.nevada.edu/). In contrast, many librarians at Rider explored a variety of libraries' applications of search tools. Following Hofmann and Yang's work, a process of vendor demonstrations and analysis of feasibility led to a trial of EBSCO Discovery Service. What we hoped for is what Way at Grand Valley State reported in 2010 from his analysis of Serials Solutions' Summon:

An examination of usage statistics showed a dramatic decrease in the use of traditional abstracting and indexing databases and an equally dramatic increase in the use of full text resources from full text database and online journal collections. The author concludes that the increase in full text use is linked to the implementation of a web-scale discovery tool.14

METHOD

Understanding both RUL's objectives and the state of the art as reflected in the literature, we concluded that an up-to-date review of discovery tool adoptions was in order before moving forward in the process of selecting a product. The resulting study included these steps: (1) compiling a list of all the major discovery tools, (2) developing a set of criteria for evaluation, (3) examining between four and seven websites where a discovery tool was deployed and evaluating each tool against each criterion, (4) recording the findings, and (5) analyzing the data.

The targeted population for the study included all the major discovery tools in use in the United States. We define a discovery tool as a library user interface independent of any library system. A discovery tool can be used to replace the OPAC module of an integrated library system or live side-by-side with the OPAC. Other names for discovery tools include stand-alone OPAC, discovery layer, or discovery user interface. Lately, a discovery tool is more often called a discovery service because most are becoming subscription-based and reside remotely in a cloud-based SaaS (software as a service) model.

The authors compiled a list of fourteen discovery tools based on Marshall Breeding's "Major Discovery Products" guide published in "Library Technology Guides."15 Those included AquaBrowser Library, Axiell Arena, BiblioCommons (BiblioCore), Blacklight, EBSCO Discovery Service, Encore, Endeca, eXtensible Catalog, SirsiDynix Enterprise, Primo, Summon, Visualizer, VuFind, and Worldcat Local. Two open-source discovery layers, SOPAC (the Social OPAC) and Scriblio, were excluded from this study because very few libraries are using them.

For evaluation in this study, academic libraries were preferred over public libraries during the sample selection process. However, some discovery tools, such as BiblioCommons, were more popular among public libraries; therefore examples of public library websites were included in the evaluation. The sites that made the final list were chosen either from a vendor's website that maintained a customer list or from Breeding's "Library Technology Guides."16 The following is the final list of libraries whose implementations were used in the study.

Example Library Sites With Proprietary Discovery Tools:

AquaBrowser (Serials Solutions)
1. Allen County Public Library at http://smartcat.acpl.lib.in.us/
2. Gallaudet University Library at http://discovery.wrlc.org/?skin=ga
3. Harvard University at http://lib.harvard.edu/
4. Norwood Young America Public Library at http://aquabrowser.carverlib.org/
5. SELCO Southeastern Libraries Cooperating at http://aquabrowser.selco.info/?c_profile=far
6. University of Edinburgh (UK) at http://aquabrowser.lib.ed.ac.uk/

Axiell Arena (Axiell)
1. Doncaster Council Libraries (UK) at http://library.doncaster.gov.uk/web/arena
2. Lerums bibliotek (Lerums library, Sweden) at http://bibliotek.lerum.se/web/arena
3. London Libraries Consortium-Royal Kingston Library (UK) at http://arena.yourlondonlibrary.net/web/kingston
4. Norddjurs (Denmark) at https://norddjursbib.dk/web/arena/
5. North East Lincolnshire Libraries (UK) at http://library.nelincs.gov.uk/web/arena
6. Someron kaupunginkirjasto (Finland) at http://somero.verkkokirjasto.fi/web/arena
7. Syddjurs (Denmark) at https://bibliotek.syddjurs.dk/web/arena1

BiblioCore (BiblioCommons)
1. Halton Hills Public Library at http://hhpl.bibliocommons.com/dashboard
2. New York Public Library at http://nypl.bibliocommons.com/
3. Oakville Public Library at http://www.opl.on.ca/
4. Princeton Public Library at http://princetonlibrary.bibliocommons.com/
5. Seattle Public Library at http://seattle.bibliocommons.com/
6. West Perth (Australia) Public Library at http://wppl.bibliocommons.com/dashboard
7. Whatcom County Library System at http://wcls.bibliocommons.com/

EBSCO Discovery Service/EDS (EBSCO)
1. Aston University (UK) at http://www1.aston.ac.uk/library/
2. Columbia College Chicago Library at http://www.lib.colum.edu/
3. Loyalist College at http://www.loyalistlibrary.com/
4. Massey University (New Zealand) at http://www.massey.ac.nz/massey/research/library/library_home.cfm
5. Rider University at http://www.rider.edu/library
6. Santa Rosa Junior College at http://www.santarosa.edu/library/
7. St. Edward's University at http://library.stedwards.edu/

Encore (Innovative Interfaces)
1. Adelphi University at http://libraries.adelphi.edu/
2. Athens State University Library at http://www.athens.edu/library/
3. California State University at http://coast.library.csulb.edu/
4. Deakin University (Australia) at http://www.deakin.edu.au/library/
5. Indiana State University at http://timon.indstate.edu/iii/encore/home?lang=eng
6. Johnson and Wales University at http://library.uri.edu/
7. St. Lawrence University at http://www.stlawu.edu/library/

Endeca (Oracle)
1. John F. Kennedy Presidential Library and Museum at http://www.jfklibrary.org/
2. North Carolina State University at http://www.lib.ncsu.edu/endeca/
3. Phoenix Public Library at http://www.phoenixpubliclibrary.org/
4. Triangle Research Libraries Network at http://search.trln.org/
5. University of Technology, Sydney (Australia) at http://www.lib.uts.edu.au/
6. University of North Carolina at http://search.lib.unc.edu/
7. University of Ottawa (Canada) Libraries at http://www.biblio.uottawa.ca/html/index.jsp?lang=en

Enterprise (SirsiDynix)
1. Cerritos College at http://cert.ent.sirsi.net/client/cerritos
2. Maricopa County Community Colleges at https://mcccd.ent.sirsi.net/client/default
3. Mountain State University/University of Charleston at http://msul.ent.sirsi.net/client/default
4. University of Mary at http://cdak.ent.sirsi.net/client/uml
5. University of the Virgin Islands at http://uvi.ent.sirsi.net/client/default
6. Western Iowa Tech Community College at http://wiowa2.ent.sirsi.net/client/default

Primo (Ex Libris)
1. Aberystwyth University (UK) at http://primo.aber.ac.uk/
2. Coventry University (UK) at http://locate.coventry.ac.uk/
3. Curtin University (Australia) at http://catalogue.curtin.edu.au/
4. Emory University at http://web.library.emory.edu/
5. New York University at http://library.nyu.edu/
6. University of Iowa at http://www.lib.uiowa.edu/
7. Vanderbilt University at http://www.library.vanderbilt.edu

Visualizer (VTLS)
1. Blinn College at http://www.blinn.edu/Library/index.htm
2. Edward Via Virginia College of Osteopathic Medicine at http://vcom.vtls.com:1177/
3. George C. Marshall Foundation at http://gmarshall.vtls.com:6330/
4. Scugog Memorial Public Library at http://www.scugoglibrary.ca/

Summon (Serials Solutions)
1. Arizona State University at http://lib.asu.edu/
2. Dartmouth College at http://dartmouth.summon.serialssolutions.com/
3. Duke University at http://library.duke.edu/
4. Florida State University at http://www.lib.fsu.edu/
5. Liberty University at http://www.liberty.edu/index.cfm?PID=178
6. University of Sydney at http://www.library.usyd.edu.au/

Worldcat Local (OCLC)
1. Boise State University at http://library.boisestate.edu/
2. Bowie State University at http://www.bowiestate.edu/academics/library/
3. Eastern Washington University at http://www.ewu.edu/library.xml
4. Louisiana State University at http://lsulibraries.worldcat.org/
5. Saint John's University at http://www.csbsju.edu/Libraries.htm
6. Saint Xavier University at http://lib.sxu.edu/home

Examples of Open Source and Free Discovery Tools:

Blacklight (the University of Virginia Library)
1. Columbia University at http://academiccommons.columbia.edu/
2. Johns Hopkins University at https://catalyst.library.jhu.edu/
3. North Carolina State University at http://historicalstate.lib.ncsu.edu
4. Northwestern University at http://findingaids.library.northwestern.edu/
5. Stanford University at http://www-sul.stanford.edu/
6. University of Hull (UK) at http://blacklight.hull.ac.uk/
7. University of Virginia at http://search.lib.virginia.edu/

eXtensible Catalog/XC (eXtensible Catalog Organization/CARLI/University of Rochester)
1. Demo at http://extensiblecatalog.org/xc/demo
2. eXtensible Catalog Library at http://xco-demo.carli.illinois.edu/dtmilestone3
3. Kyushu University (Japan) at http://catalog.lib.kyushu-u.ac.jp/en
4. Spanish General State Authority Libraries (Spain) at http://pcu.bage.es/
5. Thailand Cyber University/Asia Institute of Technology (Thailand) at http://globe.thaicyberu.go.th/

VuFind (Villanova University)
1. Auburn University at http://www.lib.auburn.edu/
2. Carnegie Mellon University Libraries at http://search.library.cmu.edu/vufind/Search/Advanced
3. Colorado State University at http://lib.colostate.edu/
4. Saint Olaf College at http://www.stolaf.edu/library/index.cfm
5. University of Michigan at http://mirlyn.lib.umich.edu
6. Western Michigan University at https://catalog.library.wmich.edu/vufind/
7. Yale University Library at http://yufind.library.yale.edu/yufind/

The following list of criteria was used for the purpose of the evaluation. Some were based on those used in previous studies of discovery tools.17,18,19 The list embodied the librarians' vision for the next-generation catalog and contained some of the most desirable features for a modern OPAC. The authors were aware of other desirable features for a discovery layer, and the following list is by no means the most comprehensive, but it served the purpose of the study well.
1. One-stop search for all library resources. A discovery tool should include all library resources in its search, including the catalog with books and videos, journal articles in databases, and local archives and digital repositories. This can be accomplished by a unified index or by federated search, an essential component of a discovery tool. Some of the discovery tools are described as web-scale because of their potential to search seamlessly across all library resources.

2. State-of-the-art web interface. A discovery tool should have a modern design similar to e-commerce sites, such as Google, Netflix, and Amazon.

3. Enriched content. Discovery tools should include book cover images, reviews, and user-driven input, such as comments, descriptions, ratings, and tag clouds. The enriched content can come from library patrons, commercial sources, or both.

4. Faceted navigation. Discovery tools should allow users to narrow down the search results by categories, also called facets. The commonly used facets include locations, publication dates, authors, formats, and more.

5. Simple keyword search box with a link to advanced search at the start page. A discovery tool should start with a simple keyword search box that looks like that of Google or Amazon. A link to the advanced search should be present.

6. Simple keyword search box on every page. The simple keyword search box should appear on every page of a discovery tool.

7. Relevancy. Relevancy ranking should take into consideration circulation statistics and books with multiple copies. More frequently circulated books indicate popularity and usefulness, and they should be ranked nearer the top of the display. A book with multiple copies may also be an indication of importance.

8. "Did you mean . . . ?" spell-checking. When an error appears in the search, the discovery tool should present the corrected query spelling as a link so that users can simply click on it to get the search results.

9. Recommendations/related materials. A discovery tool should recommend resources for readers in a manner similar to Amazon or other e-commerce sites, based on transaction logs. This should take the form of "readers who borrowed this item also borrowed the following . . ." or a link to recommended readings. It would be ideal if a discovery tool could recommend the most popular articles, a service similar to Ex Libris' bX Usage-based Services.

10. User contribution. User input includes descriptions, summaries, reviews, criticism, comments, rating and ranking, and tagging or folksonomies.

11. RSS feeds. A modern OPAC should provide RSS feeds.

12. Integration with social networking sites. When a discovery tool is integrated with social networking sites, patrons can share links to library items with their friends on social networks like Twitter, Facebook, and Delicious.

13. Persistent links. Records in a discovery tool contain a stable URL capable of being copied and pasted and serving as a permanent link to that record. These are also called permanent URLs.

14. Auto-completion/stemming. A discovery tool should be equipped with an algorithm that can auto-complete search words or supply a list of previously used words or phrases for users to choose from. Google, for example, uses stemming algorithms.
15. Mobile compatibility. There is a difference between being "mobile compatible" and having a "custom mobile website." The former indicates a website can be viewed or used on a mobile phone, and the latter denotes a different version of the user interface built specifically for mobile use. In this study we counted both as "yes."

16. Functional Requirements for Bibliographic Records (FRBR). The latest development of RDA certainly makes a discovery tool more desirable if it can display FRBR relationships. For instance, a discovery tool may display and link different versions, editions, or formats of a work, which FRBR refers to as expressions and manifestations.

For record keeping and analysis, a Microsoft Excel file with sixteen fields based on the above criteria was created. The authors checked the discovery tools on the websites of the selected libraries and recorded those features as present or absent. RDA compatibility is not used as a criterion in the study because most discovery tools allow users to add RDA fields in MARC. By now, all the discovery tools should be able to display, index, and search the new RDA fields.

FINDINGS

One-stop searching for all library resources—This is the most desirable feature when acquiring a discovery tool. Unfortunately, it also presented the biggest challenge for vendors. Both librarians and vendors have been struggling with this issue for the past several years, yet no one has worked out a perfect solution. Based on the examples the authors examined, this study found that only five out of fourteen discovery tools can retrieve articles from databases along with books, videos, and digital repositories: EBSCO Discovery Service, Encore, Primo, Summon, and WorldCat Local. Whereas Encore uses an approach similar to federated search, performing live searches of databases, the other discovery tools build a single unified index. Because the single unified index requires libraries to send their catalog data and local information to the vendor for updates, those discovery tools may fall behind in reflecting up-to-the-minute local holdings; federated search, by contrast, searches in real time and does not lag in displaying current information. Both approaches are limited in what they cover, and both need permission from content providers, either for inclusion in the unified index or to develop a connection to article databases for real-time searching.

Discovery tools that do not have their own unified index or real-time searching capability provide web-scale searching through other means. For instance, VuFind has developed connectors to application programming interfaces (APIs) from Serials Solutions and OCLC to pull search results from Summon and WorldCat Local. Encore not only developed its own real-time connection to electronic databases but is enhancing its web-scale search by incorporating the unified index from other discovery tools such as EBSCO Discovery Service. AquaBrowser is augmented by 360 Federated Search for the same purpose. Despite those possibilities, the authors did not find article-level retrieval in the sample discovery tools other than the five mentioned above.

Comparing the coverage of each tool's web-scale index can be challenging. EBSCO, Summon, and WorldCat Local publicize their content coverage on the web, while Primo and Encore only share this information with their customers.
This makes it hard to compare and evaluate content coverage without contacting vendors and asking for that information. At present, none of the five discovery tools (EBSCO Discovery Service, Encore, Primo, Summon, and WorldCat Local) can boast 100 percent coverage of all library resources. In fact, none of the Internet search engines, including Google and Google Scholar, can retrieve 100 percent of all resources. Therefore web-scale searching is more a goal than a reality. Apart from political and economic reasons, this is in part due to the nonbibliographic structure of the contents in databases such as SciFinder and some others. One-stop searching is still a work in progress: discovery tools provide students with a quick and simple way to retrieve a large, but still incomplete, set of the resources held by a library. For more in-depth research, students are still encouraged to search the catalog, discipline-specific databases, and digital repositories separately.

State-of-the-art interface—All the discovery tools are very similar in appearance to Amazon.com. Some are better than others. This study did not rate each discovery tool on a scale and thus did not distinguish fine degrees of difference in appearance; rather, each discovery tool was given a "Yes" or "No." The designation was based on subjective judgment. All the discovery tools received a "Yes" because they are very similar in appearance.

Enriched content—All the discovery tools have embedded book cover images or video jacket images, but some display more, such as ratings and rankings, user-supplied or commercially available reviews, overviews, previews, comments, descriptions, title discussions, excerpts, or age suitability, to name a few. A discovery tool may display enriched content by default out of the box, but some may need to be customized to include it. The following is a list of the enriched content the authors found implemented in each discovery tool in the sample. The number in the last column indicates how many types of enriched content were found in the discovery tool at the time of the study. BiblioCommons and AquaBrowser stand out from the rest and made the top two on the list based on the number of types of enriched content from noncataloging sources (see figure 1). It is debatable how much nontraditional data a discovery tool should incorporate into its display, and it warrants another discussion as to how useful such data is for users.

Faceted navigation—Faceted navigation has become a standard feature in discovery tools over the last two years. It allows users to further divide search results into subsets based on predetermined terms. Facets come from a variety of fields in MARC records, and some discovery tools have more facets than others. The most commonly seen facets include location or collection, publication dates, formats, author, genre, and subjects. Faceted navigation is highly configurable, as many discovery tools allow libraries to decide on their own facets. Faceted navigation has become an integral part of a discovery tool.

Simple keyword search box on the starting page with a link to advanced search—The original idea is to allow a library's user interface to resemble Google by displaying a simple keyword search box with a link to advanced search on the starting page. Most discovery tools provide the flexibility for libraries to choose or reject this option.
However, many librarians find this approach unacceptable, feeling that it lacks precision in searching and thus may mislead users. Because the keyword box is highly configurable and it is up to each library to decide how to present it, many libraries have added a pull-down menu with options to search keywords, authors, titles, and locations. In doing so, the original intention of a Google-like simple search box is lost, and only a few libraries follow the Google-like box style on the starting page. Most libraries altered the simple keyword search box on the starting page to include a dropdown menu or radio buttons, so the simple keyword search box is neither simple nor limited to keyword searching only. Nevertheless, this study gave all the discovery tools a "Yes": all the systems are capable of this feature even though libraries may choose not to use it.

Rank | Discovery Tool | Enriched Content | Total
1 | BiblioCommons | Cover images, tags, similar title, private note, notices, age suitability, summary, quotes, video, comments, and rating | 11
2 | AquaBrowser | Cover images, previews, reviews, summary, excerpts, tags, author notes & sketches, full text from Google, rating/ranking | 9
3 | Enterprise | Cover images, reviews, Google previews, summary, excerpts | 5
4 | Axiell Arena | Cover images, tags, reviews, and title discussion | 4
4 | VuFind | Cover images, tags, reviews, comments | 4
5 | Primo | Cover images, tags, previews | 3
5 | WorldCat Local | Cover images, tags, reviews | 3
6 | Encore | Cover images, tags | 2
6 | Visualizer | Cover images, reviews | 2
6 | Summon | Cover images, reviews | 2
7 | Blacklight | Cover images | 1
7 | EBSCO Discovery Service | Cover images | 1
7 | Endeca | Cover images | 1
7 | eXtensible Catalog | Cover images | 1

Figure 1. The Ranked List of Enriched Content in Discovery Tools.

Simple keyword search box on every page—This feature enables a user to start a new search at every step of navigation in the discovery tool. Most of the discovery tools provide such a box at the top of the screen as users navigate through the search results and record displays, except eXtensible Catalog and Enterprise by SirsiDynix. The feature is missing from the former, while the latter provides it everywhere except when displaying bibliographic records in a pop-up box.

Relevancy—Traditionally, relevancy is uniformly based on a computer algorithm that calculates the frequency and relative position of a keyword (field weighting) in a record and displays the search results based on the final score. Other factors have never been a part of the decision in the display of search results. In the discussion on next-generation catalogs, relevancy based on circulation statistics and other factors came up as a desirable possibility, and no discovery tool had met this challenge until now. Primo by Ex Libris is the only one among the discovery tools under investigation that can sort the final results by popularity. "Primo's popularity ranking is calculated by use. This means that the more an item record has been clicked and viewed, the more popular it is."20 Even though those are not real circulation statistics, this is considered to be a revolutionary step and a departure from traditional relevancy.
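A minimal sketch of the idea behind such usage-weighted ranking appears below. The field weights, sample records, and view counts are hypothetical, and the formula is purely illustrative; it is not Primo's popularity ranking or any vendor's actual algorithm, only a way of showing how usage data might be blended with a traditional field-weighted keyword score.

```python
# Minimal, illustrative ranking sketch: combine a traditional field-weighted
# keyword score with a usage-based "popularity" boost. The weights, records,
# and view counts are hypothetical; this is not any vendor's actual algorithm.
import math

FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "description": 1.0}

def score(record, query_terms, views):
    keyword_score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = record.get(field, "").lower()
        keyword_score += weight * sum(text.count(t.lower()) for t in query_terms)
    popularity_boost = math.log1p(views)   # dampen so heavy use cannot swamp relevance
    return keyword_score * (1 + 0.1 * popularity_boost)

records = [
    {"id": 1, "title": "Civil War diaries", "subject": "United States History Civil War",
     "description": "Manuscript diaries"},
    {"id": 2, "title": "Diaries of a homesteader", "subject": "Frontier life",
     "description": "Civil War era correspondence and diaries"},
]
view_counts = {1: 250, 2: 12}   # hypothetical click/view statistics

query = ["civil", "war", "diaries"]
ranked = sorted(records, key=lambda r: score(r, query, view_counts[r["id"]]), reverse=True)
for r in ranked:
    print(r["id"], r["title"], round(score(r, query, view_counts[r["id"]]), 2))
```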
Three years ago none of the discovery tools provided this option.21 To make relevancy ranking even more sophisticated, ScholarRank, another service by Ex Libris, can work with Primo to sort the search results based not only on the query match but also on an item's value score (its usage and number of citations) and on a user's characteristics and information needs. This shows the possibility of more advanced relevancy ranking in discovery tools, and other vendors will most likely follow in the future, incorporating more sophistication into their relevancy algorithms.

Spell checker/"Did you mean . . . ?"—The most commonly observed way of correcting a misspelling in a query is "Did you mean . . . ?" but there are other variations providing the same or similar services, some of them very user-friendly. The following is a list of the different responses when a user enters misspelled words (see figure 2). "xxx" represents the keyword being searched.

Discovery Tool | Response for Misspelled Search Words | Notes
AquaBrowser | Did you mean to search: xxx, xxx, xxx? | The suggested words are hyperlinks to execute new searches.
Axiell Arena | Your original search for xxx has returned no hits. The fuzzy search returned N hits. | Automatically displays a list of hits based on fuzzy logic. "N" is a number.
BiblioCommons | Did you mean xxx (N results)? | Displays the suggested word along with the number of results as a link.
Blacklight | No records found. | No spell checker, but possible to add by the local technical team.
EBSCO Discovery Service | Results may also be available for xxx. | The suggested word is a link to execute a new search.
Encore | Did you mean xxx? | The suggested word is a link to execute a new search.
Endeca | Did you mean xxx? | The suggested word is a link to execute a new search.
Enterprise | Did you mean xxx? | The suggested word is a link to execute a new search.
eXtensible Catalog | Sorry, no results found for: xxx. | No spell checker, but possible to add by the local technical team.
Primo | Did you mean xxx? | The suggested word is a link to execute a new search.
Summon | Did you mean xxx? | The suggested word is a link to execute a new search.
Visualizer | Did you mean xxx? | The suggested word is a link to execute a new search.
VuFind | 1. No results found in this category. Search alternative words: xxx, xxx, xxx. 2. Perhaps you should try some spelling variation: xxx, xxx, xxx. 3. Your search xxx did not match any resources. What should I do now? (followed by a list of suggestions, including checking a web dictionary) | 1. Alternative words are links to execute new searches. 2. Suggested words are links to execute new searches. 3. Suggestions for what to do next.
WorldCat Local | Did you mean xxx? | The suggested word is a link to execute a new search.

Figure 2. Spell Checker.

Most of the discovery tools on the list provide this feature except Blacklight and eXtensible Catalog. Open-source solutions sometimes provide a framework to which features can be added, which leaves many possibilities for local developers. For instance, a dictionary or spell checker may be easily installed even if a discovery tool does not come with one out of the box. This feature may be configurable.
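For open-source discovery layers that lack this feature out of the box, a local developer could approximate a "Did you mean . . . ?" suggestion in a few lines of code. The sketch below is illustrative only; it assumes Python's standard difflib module and a small, hypothetical vocabulary, whereas a real deployment would draw its terms from the search index itself.

```python
# Illustrative "Did you mean . . . ?" sketch of the kind a local developer might
# add to an open-source discovery layer: suggest close matches to a misspelled
# query term. The vocabulary here is a tiny hypothetical sample; a production
# version would draw terms from the discovery tool's own index.
import difflib

index_vocabulary = ["manuscripts", "archives", "photographs", "correspondence",
                    "diaries", "genealogy", "railroads", "suffrage"]

def did_you_mean(query_term, vocabulary, max_suggestions=3):
    # difflib ranks candidates by similarity ratio; cutoff filters weak matches
    return difflib.get_close_matches(query_term.lower(), vocabulary,
                                     n=max_suggestions, cutoff=0.6)

print(did_you_mean("manuscritps", index_vocabulary))  # ['manuscripts']
```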
Recommendation—Amazon's search engine includes a recommendation system of the form "customers who bought item A also bought item B." E-commerce recommendation algorithms analyze the activities of shoppers on the web and build a database of buyer profiles; the recommendations are made based on shopper behavior. Applied to library content, this could become "readers who were interested in item A were also interested in item B." However, most discovery tools do not have such a recommendation system. Instead, they have adopted different approaches. Most discovery tools make recommendations from bibliographic data in MARC records, such as subject headings for similar items. Primo is one of the few discovery tools with a recommendation system similar to those used by Amazon and other commercial Internet sites. Its bX Article Recommender Service is based on usage patterns collected from its link resolver, SFX. Developed by Ex Libris, bX is an independent service that integrates well with Primo but can serve as an add-on for other discovery tools. bX is an excellent example of how discovery tools can suggest new leads and directions for scholars in their research. The authors counted all the discovery tools that provide some kind of recommendation, regardless of whether their technological approach uses MARC data or usage-based algorithms. Ten out of fourteen discovery tools provide this feature in various forms (see figure 3): Axiell Arena, BiblioCommons, EBSCO Discovery Service, Encore, Endeca, eXtensible Catalog, Primo, Summon, WorldCat Local, and VuFind. The following are some of the recommendations found in those discovery tools. The authors did not find any recommendation in the libraries that use AquaBrowser, Enterprise, Visualizer, or Blacklight.

Figure 3. Language Used for Recommendation (discovery tool: language used for recommending or linking to related items).
- Axiell Arena: "See book recommendations on this topic"; "Who else writes like this?"
- BiblioCommons: "Similar titles & subject headings & lists that include this title"
- EBSCO Discovery Service: "Find similar results"
- Encore: "Other searches you may try"; "Additional Suggestions"
- Endeca: "Recommended titles for . . . View all recommended titles that match your search"; "More like this"
- eXtensible Catalog: "More like this"; "Searches related to . . ."
- Primo: "Suggested new searches by this author"; "Suggested new searches by this subject"; "Users interested in this article also expressed an interest in the following:"
- Summon: "Search related to . . ."
- WorldCat Local: "More like this"; "Similar items"; "Related subjects"; "User lists with this item"
- VuFind: "More like this"; "Similar items"; "Suggested topics"; "Related subjects"

Some discovery tool recommendations are designed in a more user-friendly manner than others. Most recommendations exist exclusively for items. Ideally, a discovery tool should provide an article recommendation system, like Ex Libris' bX usage-based service, that shows users the most frequently used and most popular articles. At the time of this evaluation, no discovery tool had incorporated an article recommendation system except Primo. Research is needed to evaluate how patrons utilize recommendation services and whether they find recommendations in discovery tools beneficial.
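The usage-pattern approach can be illustrated with a small sketch that counts which records co-occur in the same user sessions, in the spirit of "readers who were interested in item A were also interested in item B." This is a generic illustration, not the bX algorithm; the session data and record identifiers are hypothetical.

from collections import Counter
from itertools import combinations

# Each session lists the records one user viewed or requested; in practice this
# would come from link-resolver or circulation logs (hypothetical sample data).
sessions = [
    ["rec_A", "rec_B", "rec_C"],
    ["rec_A", "rec_B"],
    ["rec_B", "rec_D"],
]

# Count how often two records appear in the same session.
co_occurrence = Counter()
for session in sessions:
    for a, b in combinations(set(session), 2):
        co_occurrence[frozenset((a, b))] += 1

def recommend(record_id, top_n=5):
    # Records most often used together with the given record.
    scores = Counter()
    for pair, count in co_occurrence.items():
        if record_id in pair:
            (other,) = pair - {record_id}
            scores[other] = count
    return [rec for rec, _ in scores.most_common(top_n)]

print(recommend("rec_A"))  # e.g. ['rec_B', 'rec_C']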
User contribution—Traditionally, bibliographic data has been safely guarded by cataloging librarians for quality control. It has been unthinkable that users would be allowed to add data to library records. The Internet has brought new perspectives on this issue. Half of the discovery tools (seven) under evaluation provide this feature to varying degrees (see figure 4). Designed primarily for public libraries, BiblioCommons seems the most open to user-supplied data among all the discovery tools; the other discovery tools with this feature allow users to contribute tags and reviews. All the discovery tools allow librarians to review user-supplied data before releasing it for public display. The following figure is a summary of the types of data these discovery tools allow users to enter.

Figure 4. Discovery Tools Based on User Contribution (ranking, discovery tool, user contribution).
1. BiblioCommons: tags, similar title, private note, notices, age suitability, summary, quotes, video, comments, and ratings (10)
2. AquaBrowser: tags, reviews, and ratings/rankings (3)
2. Axiell Arena: tags, reviews, and title discussions (3)
2. VuFind: tags, reviews, comments (3)
3. Primo: tags and reviews (2)
3. WorldCat Local: tags and reviews (2)
4. Encore: tags (1)
5. Blacklight (0); Endeca (0); Enterprise (0); eXtensible Catalog (0); Summon (0); Visualizer (0)

Past research indicates that folksonomies or tags are highly useful.22 They complement library-controlled vocabularies, such as Library of Congress Subject Headings, and increase access to library collections. A few discovery tools allow user-entered tags to form "word clouds." The relative importance of tags in a word cloud is emphasized by font color and size. A tag list is another way to organize and display tags. In both cases, tags are hyperlinked to a relevant list of items. Some tags serve as keywords to start new searches, while others narrow search results. Only four discovery tools, AquaBrowser, Encore, Primo, and WorldCat Local, provide both tag clouds and lists. BiblioCommons provides only tag lists for the same purpose. The rest of the discovery tools have neither. One drawback of user-supplied tags for subject access is their incomplete nature. They may lead users to partial retrieval of information, as users add tags only to items that they have used. The coverage is not systematic and inclusive of all collections. Therefore data supplied by users in discovery tools remains controversial. It is possible to seed systems with folksonomies using services like LibraryThing for Libraries, which could reduce the impact of this issue.

RSS feed/email alerts—This feature can automatically send a list of new library resources to users based on their search criteria. It can be useful for experienced researchers or frequent library users. Some discovery tools may use email alerts as well. Eight out of fourteen discovery tools in this evaluation provide RSS feeds: AquaBrowser, Axiell Arena, EBSCO Discovery Service, Endeca, Enterprise, Primo, Summon, and VuFind. An RSS feed can be added as a plug-in in some discovery tools if it does not come as part of the base system.
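As a rough illustration of how such an alert might be produced, the following sketch builds a minimal RSS 2.0 document for new records matching a saved search, using only the Python standard library. The channel metadata, URLs, and record fields are invented for the example and do not reflect any particular discovery tool.

import xml.etree.ElementTree as ET

def build_rss(saved_query, new_records):
    # Build a minimal RSS 2.0 document listing new records for a saved search.
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"New library resources: {saved_query}"
    ET.SubElement(channel, "link").text = "https://library.example.edu/search"  # placeholder URL
    ET.SubElement(channel, "description").text = "Newly added items matching a saved search."
    for record in new_records:  # record fields are assumed for the example
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = record["title"]
        ET.SubElement(item, "link").text = record["url"]
    return ET.tostring(rss, encoding="unicode")

print(build_rss("solar energy",
                [{"title": "Solar Cells", "url": "https://library.example.edu/record/1"}]))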
Integration with social networking sites—As most college students participate in social networking sites, this feature provides an easy way to share resources there. Users can place a link to a resource by clicking on an icon in the discovery tool and share the resource with friends on Facebook, Twitter, Delicious, and many other social network sites. Nine out of the fourteen discovery tools provide this feature, and some offer integration with many more social networking sites than others. Those with this feature include AquaBrowser, Axiell Arena, BiblioCommons, EBSCO Discovery Service, Encore, Endeca, Primo, WorldCat Local, and eXtensible Catalog. So far, the interaction between discovery tools and social networking sites is limited to sharing resources. Social networking sites should be carefully evaluated for the possibility of integrating some of their popular features into discovery tools.

Persistent link—This is also called a permanent link or permURL. Not all the links displayed in a browser location box are persistent links; therefore some discovery tools specifically provide a link in the record for users to copy and keep. Five out of fourteen discovery tools explicitly list this link in records: AquaBrowser, Axiell Arena, Blacklight, EBSCO Discovery Service, and WorldCat Local. The authors marked a system as "No" when a permanent link is not prominently displayed in the discovery tool; in other words, only those discovery tools that explicitly provide a persistent link are counted as "Yes." However, the URL in a browser's location box during the display of a record may serve as a persistent link in some cases. For instance, VuFind does not provide a permanent URL in the record, but indicates on the project site that the URL in the location box is a persistent link.

Auto-completion/stemming—When a user types keywords into the search box, the discovery tool supplies a list of words or phrases from which he or she can choose readily. This is a highly useful feature that Google excels at. Stemming not only automatically completes the spelling of a keyword but also supplies a list of phrases that point to existing items. The authors found this feature in six out of fourteen discovery tools: Axiell Arena, Endeca, Enterprise, eXtensible Catalog, Summon, and WorldCat Local.

Mobile interface—The terms "mobile compatible" and "mobile interface" are two different concepts. A mobile interface is a simplified version of the normal browser interface of a discovery tool, optimized for use on mobile phones, and the authors only counted those discovery tools that have a separate mobile interface. A discovery tool may be mobile friendly or compatible without a separate mobile interface. Many discovery tools, such as EBSCO, can detect a request from a mobile phone and automatically direct the request to the mobile interface. Eleven out of fourteen claim to provide a separate mobile interface. Blacklight, Enterprise, and eXtensible Catalog do not seem to have a separate mobile interface even though they may be mobile friendly.

FRBR—FRBR groupings denote the relationships between work, expression, manifestation, and item. For instance, a search will retrieve not only a title but also different editions and formats of the work. Only three discovery tools can display FRBR relationships: eXtensible Catalog (open source), Primo by Ex Libris, and WorldCat Local by OCLC. So far, most discovery tools are not capable of displaying the manifestations and expressions of a work in a meaningful way.
From the user's point of view, this feature is highly desirable. Figure 5 is a screenshot from Primo demonstrating the display of a large number of different adaptations of the work "Romeo and Juliet." Figure 6 displays the same intellectual work in different manifestations such as DVD, VHS, books, and more.

Figure 5. Display of FRBR Relationships in Primo.
Figure 6. Different Versions of the Same Work in Primo.

SUMMARY

The following are the summary tables of our comparison and evaluation. Proprietary and open-source programs are listed separately in these tables. The total number of features the authors found in a particular discovery tool is displayed at the end of the column. Proprietary discovery tools seem to have more of the advanced characteristics of a modern discovery tool than their open-source counterparts. The open-source program Blacklight displays fewer advanced features but seems flexible for users to add features. See figures 7, 8, and 9.

Figure 7. Proprietary Discovery Tools (AquaBrowser | Axiell Arena | BiblioCommons | EBSCO/EDS | Encore | Endeca).
1. Single point of search: No | No | No | Yes | Yes | No
2. State of the art interface: Yes | Yes | Yes | Yes | Yes | Yes
3. Enriched content: Yes | Yes | Yes | Yes | Yes | Yes
4. Faceted navigation: Yes | Yes | Yes | Yes | Yes | Yes
5. Simple keyword search box on the starting page: Yes | Yes | Yes | Yes | Yes | Yes
6. Simple keyword search box on every page: Yes | Yes | Yes | Yes | Yes | Yes
7. Relevancy: No | No | No | No | No | No
8. Spell checker/"Did you mean . . . ?": Yes | Yes | Yes | Yes | Yes | Yes
9. Recommendation: No | Yes | Yes | Yes | Yes | Yes
10. User contribution: Yes | Yes | Yes | No | Yes | No
11. RSS: Yes | Yes | No | Yes | No | Yes
12. Integration with social network sites: Yes | Yes | Yes | Yes | Yes | Yes
13. Persistent links: Yes | Yes | No | Yes | No | No
14. Stemming/auto-complete: No | Yes | No | No | No | Yes
15. Mobile interface: Yes | Yes | Yes | Yes | Yes | Yes
16. FRBR: No | No | No | No | No | No
Total: 11/16 | 13/16 | 10/16 | 12/16 | 11/16 | 11/16

Figure 8. Proprietary Discovery Tools (Continued) (Enterprise | Primo | Summon | Visualizer | WorldCat Local).
1. Single point of search: No | Yes | Yes | No | Yes
2. State of the art interface: Yes | Yes | Yes | Yes | Yes
3. Enriched content: Yes | Yes | Yes | Yes | Yes
4. Faceted navigation: Yes | Yes | Yes | Yes | Yes
5. Simple keyword search box on the starting page: Yes | Yes | Yes | Yes | Yes
6. Simple keyword search box on every page: No | Yes | Yes | Yes | Yes
7. Relevancy: No | Yes | No | No | No
8. Spell checker/"Did you mean . . . ?": Yes | Yes | Yes | Yes | Yes
9. Recommendation: No | Yes | Yes | No | Yes
10. User contribution: No | Yes | No | No | Yes
11. RSS: Yes | Yes | Yes | No | No
12. Integration with social network sites: No | Yes | No | No | Yes
13. Persistent links: No | No | No | No | Yes
14. Stemming/auto-complete: Yes | No | Yes | No | Yes
15. Mobile interface: No | Yes | Yes | Yes | Yes
16. FRBR: No | Yes | No | No | Yes
Total: 7/16 | 14/16 | 11/16 | 7/16 | 14/16

Figure 9. Free and Open-Source Discovery Tools (Blacklight | eXtensible Catalog | VuFind).
1. Single point of search: No | No | No
2. State of the art interface: Yes | Yes | Yes
3. Enriched content: Yes | Yes | Yes
4. Faceted navigation: Yes | Yes | Yes
5. Simple keyword search box on the starting page: Yes | Yes | Yes
6. Simple keyword search box on every page: Yes | Yes | Yes
7. Relevancy: No | No | No
8. Spell checker/"Did you mean . . . ?": No | No | Yes
9. Recommendation: No | Yes | Yes
10. User contribution: No | No | Yes
11. RSS: No | No | Yes
12. Integration with social network sites: No | Yes | No
13. Persistent links: Yes | No | No
14. Stemming/auto-complete: No | Yes | No
15. Mobile interface: No | No | Yes
16. FRBR: No | Yes | No
Total: 6/16 | 9/16 | 10/16

As one-stop searching is the core of a discovery tool, this consideration placed five discovery tools above the rest: Encore, EBSCO Discovery Service, Primo, Summon, and WorldCat Local (see figure 10). These five are web-scale discovery services. All of them use their native unified index except Encore, which has incorporated the EBSCO unified index in its search. Despite the great progress made in the past three years in one-stop searching, none of the discovery tools can truly search across all library resources—all of them have some limitations as to the coverage of content. Each unified index may cover different databases as well as overlap the others in many areas. One possible solution may lie in a hybrid approach that combines a unified index with federated search (also called real-time discovery). Those old and new technologies may work well when complementing each other. It remains uncertain whether libraries will ever have one-stop searching in its true sense.

Figure 10. The Discovery Tools Capable of One-Stop Searching.
Encore: Yes
EBSCO Discovery Service: Yes
Primo: Yes
Summon: Yes
WorldCat Local: Yes

It is also worth mentioning that one-stop searching is a vital and central piece of discovery tools. Those discovery tools without a native unified index or connectors to databases for real-time searching are at a disadvantage. Therefore discovery tools that do not provide web-scale searching are investigating various possibilities to incorporate one-stop searching. Some are drawing on the unified indexes of the discovery tools that have them through connectors to the application programming interfaces (APIs) of those products. For instance, VuFind includes connectors to the APIs of a few other systems that have a unified index or vast resources, such as Summon and WorldCat. Blacklight may provide one-stop searching through the Primo API. Such a practice may present other problems, such as calculating relevancy ranking across resources that do not live in the same centralized index and thus not achieving fully balanced relevancy ranking. Nevertheless, discovery tool developers are working hard to achieve one-stop searching. As a unified index can be shared across discovery tools, in the next few years more and more discovery services may offer one-stop searching.

Based on the count of the sixteen criteria in the checklist, we ranked Primo and WorldCat Local as the top two discovery tools. Based on our criteria, Primo has two unique features that make it stand out: relevancy enhanced by usage statistics and value score, and the FRBR relationship display. WorldCat Local and eXtensible Catalog are the other two discovery tools that can display FRBR relationships (see figure 11).

Figure 11. Ranked Discovery Tools (rank, discovery tools, number of advanced features).
1. Primo and WorldCat Local: 14/16
2. Axiell Arena: 13/16
3. EBSCO Discovery Service: 12/16
4. AquaBrowser, Encore, and Endeca: 11/16
5. BiblioCommons, Summon, and VuFind: 10/16
6. eXtensible Catalog: 9/16
6. Enterprise and Visualizer: 7/16
7. Blacklight: 6/16

LIMITATIONS

As discovery tools are going through new releases and improvements, what is true today may be false tomorrow.
Discovery tools constantly improve and evolve, and many features are not included in this evaluation, such as integration with Google Maps for the location of an item and user-driven acquisitions. Innovations are added to discovery tools constantly. This study only covers the most common features that the library community has agreed a discovery tool should have. Some open-source discovery tools may provide a skeleton of an application that leaves the code open for users to develop new features. Therefore different implementations of an open-source discovery tool may encompass totally different features that are not part of the core application. For instance, the University of Virginia developed Virgo based on Blacklight, adding many advanced features. Thus it is quite a challenge to distinguish what comes with the software and what are local developments. This study focused on the user interface of discovery tools. Not included are content coverage, application administration, and the searching capability of the discovery tools. Those three are important factors when choosing a discovery tool.

CONCLUSION

Search technology has evolved far beyond federated searching. The concept of a "Next Generation Catalog" has merged with this idea and spawned a generation of discovery tools bringing almost Google-like power to library searching. The problems facing libraries now are the intelligent selection of a tool that fits their contexts and structuring a process to adopt and refine that tool to meet the objectives of the library. Our findings indicate that Primo and WorldCat Local have better user interfaces, displaying more advanced features of a Next Generation Catalog than their peers. For RUL, EBSCO Discovery Service (EDS) provides something approaching the ease of Google searching from either a single search box or a very powerful advanced search. Being aware of the limitations noted above, Rider's libraries elected to continue displaying traditional search options in addition to what we've branded "Library One Search." Another issue we discovered in this process is that when negotiating for a vendor-hosted test, libraries must be sure that the test period begins when the configuration is complete rather than when the data load begins. All phases of the project took far more time than anticipated. The client institution's implementation coordinator or team needs to review progress on a daily basis and communicate often with the vendor-based implementation team. With the evaluative framework this study provides, libraries moving toward discovery tools should consider the changing capabilities of the available discovery tools to make informed choices.

REFERENCES

1. Jason Vaughan, "Investigations into Library Web-Scale Discovery Services," Information Technology & Libraries 31, no. 1 (2012): 32–33, http://dx.doi.org/10.6017/ital.v31i1.1916.
2. Sharon Q. Yang and Melissa A. Hofmann, "Next Generation or Current Generation? A Study of the OPACs of 260 Academic Libraries in the USA and Canada," Library Hi Tech 29, no. 2 (2011): 266–300.
3. Melissa A. Hofmann and Sharon Q. Yang, "'Discovering' What's Changed: A Revisit of the OPACs of 260 Academic Libraries," Library Hi Tech 30, no. 2 (2012): 253–74.
4. Alexander Pope, "Alexander Pope Quotes," http://www.brainyquote.com/quotes/authors/a/alexander_pope.html.
5. F. William Chickering, "Linking Information Technologies: Benefits and Challenges," Proceedings of the 4th International Conference on New Information Technologies, Budapest, Hungary, December 1991, http://web.simmons.edu/~chen/nit/NIT%2791/019-chi.htm.
6. Kristin Antelman, Emily Lynema, and Andrew K. Pace, "Toward a Twenty-First Century Library Catalog," Information Technology & Libraries 25, no. 3 (2006): 128–39, http://dx.doi.org/10.6017/ital.v25i3.3342.
7. Marshall Breeding, "Plotting a New Course for Metasearch," Computers in Libraries 25, no. 2 (2005): 27–29.
8. Judith Carter, "Discovery: What Do You Mean by That?" Information Technology & Libraries 28, no. 4 (2009): 161–63, http://dx.doi.org/10.6017/ital.v28i4.3326.
9. Priscilla Caplan, "On Discovery Tools, OPACs and the Motion of Library Language," Library Hi Tech 30, no. 1 (2012): 108–15.
10. Carol Pitts Diedrichs, "Discovery and Delivery: Making it Work for Users," Serials Librarian 56, no. 1–4 (2009): 79, http://dx.doi.org/10.1080/03615260802679127.
11. Alex A. Dolski, "Information Discovery Insights Gained from MultiPAC, a Prototype Library Discovery System," Information Technology & Libraries 28, no. 4 (2009): 173, http://dx.doi.org/10.6017/ital.v28i4.3328.
12. Marshall Breeding, "The State of the Art in Library Discovery," Computers in Libraries 30, no. 1 (2010): 31–34.
13. Jennifer L. Fabbi, "Focus as Impetus for Organizational Learning," Information Technology & Libraries 28, no. 4 (2009): 164–71, http://dx.doi.org/10.6017/ital.v28i4.3327.
14. Douglas Way, "The Impact of Web-scale Discovery on the Use of a Library Collection," Serials Review 36, no. 4 (2010): 214–20, http://dx.doi.org/10.1016/j.serrev.2010.07.002.
15. Marshall Breeding, "Library Technology Guides: Discovery Products," http://www.librarytechnology.org/discovery.pl.
16. Ibid.
17. Sharon Q. Yang and Kurt Wagner, "Evaluating and Comparing Discovery Tools: How Close Are We towards Next Generation Catalog?" Library Hi Tech 28, no. 4 (2010): 690–709.
18. Yang and Hofmann, "Next Generation or Current Generation?" 266–300.
19. Melissa A. Hofmann and Sharon Q. Yang, "How Next-Gen R U? A Review of Academic OPACS in the United States and Canada," Computers in Libraries 31, no. 6 (2010): 26–29.
20. Brown Library of Virginia Western Community College, "Primo-Frequently Asked Questions," http://www.virginiawestern.edu/library/primo-faq.php#popularity_ranking.
21. Yang and Wagner, "Evaluating and Comparing Discovery Tools," 690–709.
22. Yanyi Lee and Sharon Q. Yang, "Folksonomies as Subject Access—A Survey of Tagging in Library Online Catalogs and Discovery Layers," paper presented at the IFLA post-conference "Beyond Libraries—Subject Metadata in the Digital Environment and Semantic Web," Tallinn, Estonia, 18 August 2012, http://www.nlib.ee/html/yritus/ifla_jarel/papers/4-1_Yan.docx.

3793 ---- Editorial Board Thoughts: Libraries as Makerspace? Tod Colegrove INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2013

Recently there has been tremendous interest in "makerspace" and its potential in libraries: from middle school and public libraries to academic and special libraries, the topic seems very much top of mind. A number of libraries across the country have been actively expanding makerspace within the physical library and exploring its impact; as head of one such library, I can report that reactions to the associated changes have been quite polarized. Those from the supported membership of the library have been uniformly positive, with new and established users as well as principal donors immediately recognizing and embracing its potential to enhance learning and catalyze innovation; interestingly, the minority of individuals that recoil at the idea have been either long-term librarians or library staff members. I suspect the polarization may be more a function of confusion over what makerspace actually is. This piece offers a brief overview of the landscape of makerspace—a glimpse into how its practice can dramatically enhance traditional library offerings, revitalizing the library as a center of learning.

Been Happening for Thousands of Years . . .

Dale Dougherty, founder of MAKE magazine and Maker Faire, at the "Maker Monday" event of the 2013 American Library Association Midwinter Meeting framed the question simply, "whether making belongs in libraries or whether libraries can contribute to making." More than one audience member may have been surprised when he continued, "It's already been happening for hundreds of years—maybe thousands."1

The O'Reilly/DARPA Makerspace Playbook describes the overall goals and concept of makerspace (emphasis added): "By helping schools and communities everywhere establish Makerspaces, we expect to build your Makerspace users' literacy in design, science, technology, engineering, art, and math. . . . We see making as a gateway to deeper engagement in science and engineering but also art and design. Makerspaces share some aspects of the shop class, home economics class, the art studio and science lab. In effect, a Makerspace is a physical mashup of these different places that allows projects to integrate these different kinds of skills."2

Building users' literacies across multiple domains and a gateway to deeper engagement? Surely these are core values of the library; one might even suspect that to some degree libraries have long been makerspace. A familiar example of maker activity in libraries might include digital media: still/video photography and audio mastering and remixing.
Patrick "Tod" Colegrove (pcolegrove@unr.edu), a LITA member, is Head of the DeLaMare Science & Engineering Library at the University of Nevada, Reno, Nevada.

The YOUmedia network, funded by the MacArthur Foundation through the Institute of Museum and Library Services, is a recent example of such an effort aimed at creating transformative spaces; engaged in exploring, expressing, and creating with digital media, youth are encouraged to "hang out, mess around, and geek out." A more pedestrian example is found in the support of users learning computer programming skills for the first time or refreshing them. As recently as the 1980s, the only option the library had was to maintain a collection of print texts. Through the 1990s and into the early 2000s, that support improved dramatically as publishers distributed code examples and ancillary documents on accompanying CD or DVD media, saving the reader the effort of manually typing in code examples. The associated collections grew rapidly, even as the overhead associated with maintaining and weeding a collection that was ever more rapidly obsoleted grew further. Today, e-book versions combined with the ready availability of computer workstations within the library, and the rapidly growing availability of web-based tutorials and support communities, render a potent combination that customers of the library can use to quickly acquire the ability to create or "make" custom applications. With the migration of the supporting print collections online, the library can contemplate further support in the physical spaces opened up. Open working areas and whiteboard walls can further amplify the collaborative nature of such making; the library might even consider adding popular hardware development platforms to its collection of lendable technology, enabling those interested to check out a development kit rather than purchase one of their own. After all, in a very real sense that is what libraries do—and have done, for thousands of years: buy sometimes expensive technology tailored to the needs and interests of the local community and make it available on a shared basis.

Makerspace: A Continuum

Along with outreach opportunities, the exploration of how such examples can be extended to encompass more of the interests supported by the library is the essence of the maker movement in libraries. Makerspace encompasses a continuum of activity that includes "co-working," "hackerspace," and "fab lab"; the common thread running through each is a focus on making rather than merely consuming. It is important to note that although the terms are often incorrectly used as if they were synonymous, in practice they are very different: for example, a fab lab is about fabrication. Realized, it is a workshop designed around personal manufacture of physical items, typically equipped with computer-controlled equipment such as laser cutters, multiple-axis Computer Numerical Control (CNC) milling machines, and 3D printers. In contrast, a "hackerspace" is more focused on computers and technology, attracting computer programmers and web designers, although interests begin to overlap significantly with the fab lab for those interested in robotics. Co-working space is a natural evolution for participants of the hackerspace: a shared working environment offering much of the benefit of the social and collaborative aspects of the informal hackerspace, while maintaining a focus on work.
As opposed to the hobbyist who might be attracted to a hackerspace, co-working space attracts independent contractors and professionals who may work from home.

It is important to note that it is entirely possible for a single makerspace to house all three subtypes and be part hackerspace, fab lab, and co-working space. Can it be a library at the same time? To some extent, these activities are likely already ongoing within your library, albeit informally; by recognizing and embracing the passions driving those participating in the activity, the library can become central to the greater community of practice. By serving the community's needs more directly, the library multiplies its opportunities for outreach even as it develops a laser-sharp focus on the needs of that community. Depending on constraints and the community of support, the library may also be well served by forming collaborative ties with other local makerspaces; having local partners can dramatically improve the options available to the library in day-to-day practice and better inform the library as it takes well-chosen incremental steps. With hackerspace/co-working/fab lab resources aligned with the traditional resources of the library, engagement with one can lead naturally to the other in an explosion of innovation and creativity.

Renaissance

In addition to supporting the work of the solitary reader, "today's libraries are incubators, collaboratories, the modern equivalent of the seventeenth-century coffeehouse: part information market, part knowledge warehouse, with some workshop thrown in for good measure."3 Consider some of the transformative synergies that are already being realized in libraries experimenting with makerspace across the country:

• A child reading about robots able to go hands-on with robotics toolkits, even borrowing the kit for an extended period of time along with the book that piqued the interest; surely such access enables the child to develop a powerful sense of agency from early childhood, including a perception of self as being productive and much more than a consumer.

• Students or researchers trying to understand or make sense of a chemical model or novel protein strand able not only to visualize and manipulate the subject on a two-dimensional screen, but to relatively quickly print a real-world model and tangibly explore the subject from all angles.

• Individuals synthesizing knowledge across disciplinary boundaries able to interact with members of communities of practice in a non-threatening environment; learning, developing, and testing ideas—developing rapid prototypes in software or physical media, with a librarian at the ready to assist with resources and dispense advice regarding intellectual property opportunities or concerns.

The American Library Association estimates that as of this printing there are approximately 121,169 libraries of all kinds in the United States today; if even a small percentage recognize and begin to realize the full impact that makerspace in the library can have, the future looks bright indeed.

REFERENCES

1. Dale Dougherty, "The New Stacks: The Maker Movement Comes to Libraries" (presentation at the Midwinter Meeting of the American Library Association, Seattle, Washington, January 28, 2013), http://alamw13.ala.org/node/10004.
2. Michele Hlubinka et al., Makerspace Playbook, December 2012, accessed February 13, 2012, http://makerspace.com/playbook.
3. Alex Soojung-Kim Pang, "If Libraries did not Exist, It Would be Necessary to Invent Them," Contemplative Computing, February 6, 2012, http://www.contemplativecomputing.org/2012/02/if-libraries-did-not-exist-it-would-be-necessary-to-invent-them.html.

3670 ---- Automatic Extraction of Figures from Scientific Publications in High-Energy Physics Piotr Adam Praczyk, Javier Nogueras-Iso, and Salvatore Mele INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2013

Piotr Adam Praczyk (piotr.praczyk@gmail.com) is a PhD student at Universidad de Zaragoza, Spain, and research grant holder at the Scientific Information Service of CERN, Geneva, Switzerland. Javier Nogueras-Iso (jnog@unizar.es) is Associate Professor, Computer Science and Systems Engineering Department, Universidad de Zaragoza, Spain. Salvatore Mele (Salvatore.Mele@cern.ch) is leader of the Open Access section at the Scientific Information Service of CERN, Geneva, Switzerland.

ABSTRACT

Plots and figures play an important role in the process of understanding a scientific publication, providing overviews of large amounts of data or ideas that are difficult to present intuitively using only text. State-of-the-art digital libraries, which serve as gateways to knowledge encoded in scholarly writings, do not yet take full advantage of the graphical content of documents. Enabling machines to automatically unlock the meaning of scientific illustrations would allow immense improvements in the way scientists work and the way knowledge is processed. In this paper, we present a novel solution for the initial problem of processing graphical content, obtaining figures from scholarly publications stored in PDF. Our method relies on vector properties of documents and, as such, does not introduce additional errors, unlike methods based on raster image processing. Emphasis has been placed on correctly processing documents in high-energy physics. The described approach distinguishes different classes of objects appearing in PDF documents and uses spatial clustering techniques to group objects into larger logical entities. Many heuristics allow the rejection of incorrect figure candidates and the extraction of different types of metadata.

INTRODUCTION

Notwithstanding the technological advances of large-scale digital libraries and novel technologies to package, store, and exchange scientific information, scientists' communication pattern has changed little in the past few decades, if not the past few centuries. The key information of scientific articles is still packaged in the form of text and, for several scientific disciplines, in the form of figures. New semantic text-mining technologies are unlocking the information in scientific discourse, and there exist some remarkable examples of attempts to extract figures from scientific publications,1 but current attempts do not provide a sufficient level of generality to deal with figures from high-energy physics (HEP) and cannot be applied in a digital library like INSPIRE, which is our main point of interest.
Scholarly publications in HEP tend to contain highly specific types of figures (understood as any type of graphical content illustrating the text and referenced from it). In particular, they contain a high volume of plots, which are line-art images illustrating the dependency of a certain quantity on a parameter.

The graphical content of scholarly publications allows much more efficient access to the most important results presented in a publication.2,3 The human brain perceives graphical content much faster than it can read an equivalent block of text. Presenting figures with the publication summary when displaying search results would allow more accurate assessment of the article's content and in turn lead to a better use of researchers' time. Enabling users to search for figures describing similar quantities or phenomena could become a very powerful tool for finding publications describing similar results. Combined with additional metadata, it could provide knowledge about the evolution of certain measurements or ideas over time. These and many more applications created an incentive to research possible ways to integrate figures in INSPIRE.

INSPIRE is a digital library for HEP,4 the application field of this work. It provides a large-scale digital library service (1 million records, fifty thousand users), which is starting to explore new mechanisms of using figures in articles of the field to index, retrieve, and present information.5,6 As a first step, direct access to graphical content before accessing the text of a publication can be provided. Second, a description of graphics ("blue-band plot," "the yellow shape region") could be used in addition to metadata or full-text queries to retrieve a piece of information. Finally, articles could be aggregated into clusters containing the same or similar plots, a possible automated answer to a standing issue in information management. The indispensable step to realize this vision is an automated, resilient, and high-efficiency extraction of figures from scientific publications. In this paper, we present an approach that we have developed to address this challenge.

The focus has been put on developing a general method allowing the extraction of data from documents stored in Portable Document Format (PDF). The results of the algorithm consist of metadata and raster images of a figure, but also vector graphics, which allows easier further processing. The PDF format has been chosen as the input of the algorithm because it is a de facto standard in scientific communication. In the case of HEP, mathematics, and other exact sciences, the majority of publications are prepared using the LaTeX document formatting system and later compiled into a PDF file. The electronic versions of publications from outstanding scientific journals are also provided in PDF. The internal structure of PDF files does not always reveal the location of graphics. In some cases, images are included as external entities and are easily distinguishable from the rest of a document's content, but other times they are mixed with the rest of the content. Therefore, to avoid missing any figures, the low-level structure of a PDF had to be analyzed. The work described in this paper focuses on the area of HEP. However, with minor variations, the described methods could be applicable to a different area of knowledge.
RELATED WORK

Over the years of development of digital libraries and document processing, researchers have come up with several methods of automatically extracting and processing graphics appearing in PDF documents. Based on properties of the processed content, these methods can be divided into two groups. The attempts of the first category deal with PDF documents in general, not making any assumptions about the content of encoded graphics or the document type. The methods from the second group are more specific to figures from scientific publications. Our approach belongs to the second group.

Tools of the first category include command-line programs like PDF-Images (http://sourceforge.net/projects/pdf-images/) or web-based applications like PDF to Word (http://www.pdftoword.com/). These solutions are useful for general documents, but all suffer from the same difficulties when processing scientific publications: graphics that are recognized by such tools have to be marked as graphics inside PDF documents. This is the case with raster graphics and some other internally stored objects. In the case of scholarly documents, most graphics are constructed internally using PDF primitives and thus cannot be correctly processed by tools from the first group. Moreover, general tools do not have the necessary knowledge to produce metadata describing the extracted content.

With respect to specific tools for scientific publications, it must be noted first that important scientific publishers like Springer and Elsevier have created services to allow access to figures present in scientific publications: the improvement of the SciVerse Science Direct site (http://www.sciencedirect.com) for searching images in the case of Elsevier,7 and the SpringerImages service (http://www.springerimages.com/) in the case of Springer.8 These services allow searches triggered from a text box, where the user can introduce a description of the required content. It is also possible to browse images by categories such as type of graphics (image, table, line art, video, etc.). The search engines are limited to searches based on figure captions. In this sense, there is little difference between the image search and the text search implemented in a typical digital library.

Most of the existing works aiming at the retrieval and analysis of figures use the rasterized graphical representation of source documents as their basis. Browuer et al. and Kataria et al. describe a method of detecting plots by means of wavelet analysis.9,10 They focus on the extraction of data points from identified figures. In particular, they address the challenge of correctly identifying overlapping points of data in plots. This problem would not manifest itself often in the case of vector graphics, which is the scenario proposed in our extraction method. Vector graphics preserve much more information about the document's content than simple values of pixel colours; in particular, vector graphics describe overlapping objects separately. Raster methods are also much more prone to additional errors being introduced during the recognition/extraction phase. The methods described in this paper could be used with Kataria's method for documents resulting from a digitization process.11
Liu et al. present a page box-cutting algorithm for the extraction of tables from PDF documents.12 Their approach is not directly applicable, but their ideas of geometrical clustering of PDF primitives are similar to the ones proposed in our work. However, our experiments with their implementation and HEP publications have shown that the heuristics used in their work cannot be directly applied to HEP, showing the need for an adapted approach, even in the case of tables.

A different category of work, not directly related to graphics extraction but useful when designing algorithms, has been devoted to the analysis of graph use in scientific publications. The results presented by Cleveland describe a more general case than HEP publications.13 Even if the data presented in the work came from scientific publications before 1984, the included observations—for example, typical sizes of graphs—were useful with respect to general properties of figures and were taken into account when adjusting the parameters of the presented algorithm.

Finally, there exist attempts to extract layout information from PDF documents. The knowledge of page layout is useful to distinguish independent parts of the content. The approach of layout and content extraction presented by Chao and Fan is the closest to the one we propose in this paper.14 The difference lies in the fact that we are focusing on the extraction of plots and figures from scientific documents, which usually follow stricter conventions. Therefore we can make more assumptions about their content and extract more precise data. For instance, our method emphasizes the role of detected captions and permits them to modify the way in which graphics are treated. We also extract portions of information that are difficult to extract using more general methods, such as captions of figures.

METHOD

PDF files have a complex internal structure allowing them to embed various external objects and to include various types of metadata. However, the central part of every PDF file consists of a visual description of the subsequent pages. The imaging model of PDF uses a language based on a subset of the PostScript language. PostScript is a complete programming language containing instructions (also called operators) allowing the rendering of text and images on a virtual canvas. The canvas can correspond to a computer screen or to another, possibly virtual, device used to visualize the file. The subset of PostScript used to describe the content of PDFs has been stripped of all flow-control operations (like loops and conditional executions), which makes it much simpler to interpret than the original PostScript. Additionally, the state of the renderer is not preserved between subsequent pages, making their interpretation independent. To avoid many technical details, which are irrelevant in this context, we will consider a PDF document as a sequence of operators (also called the content stream).
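As an illustration of this view, the following Python sketch models a content stream as a list of operator records carrying a name, an affected area, and (for text) a string, and classifies each record in the way described in the following subsections. The Operator class and the sample values are ours, not the authors' implementation; the operator names listed as textual are the text-showing operators defined by the PDF specification.

from dataclasses import dataclass
from typing import Optional, Tuple

# PDF text-showing operators (from the PDF specification): Tj, TJ, ' and ".
TEXTUAL_OPERATORS = {"Tj", "TJ", "'", '"'}

@dataclass
class Operator:
    name: str                                           # PDF operator name, e.g. "re", "Tj", "cm"
    bbox: Optional[Tuple[float, float, float, float]]   # affected canvas area, None if empty
    text: Optional[str] = None                          # string argument of textual operators

    @property
    def kind(self) -> str:
        # Classify by the affected area, as described below: no visible output
        # means a transformation; otherwise textual or graphical by operator name.
        if self.bbox is None:
            return "transformation"
        if self.name in TEXTUAL_OPERATORS:
            return "textual"
        return "graphical"

# A document page is then simply a list of such records, in stream order.
content_stream = [
    Operator("cm", None),                         # coordinate transformation
    Operator("Tj", (100, 700, 180, 712), "Fig."), # text fragment
    Operator("re", (90, 400, 300, 650)),          # rectangle forming part of a plot
]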
Every operator can trigger a modification of the graphical state of the PDF interpreter, which might be drawing a graphical primitive, rendering an external attached object, or modifying the position of the graphical pointer15 or a transformation matrix.16 The outcome of an atomic operation encoded in the content stream depends not only on the parameters of the operation but also on the way previous operators modified the state of the interpreter. Such a design makes a PDF file easy to render but not necessarily easy to analyze.

Figure 1 provides an overview of the proposed extraction method. At the very first stage, the document is pre-processed and operators are extracted (see "Pre-processing of Operators" below). Later, graphical17 and textual18 operators are clustered using different criteria (see "Inclusion of Text Parts" and "Detection and Matching of Captions" below), and a first round of heuristics rejects regions that cannot be considered figures. In the next phase, the clusters of graphical operators are merged with text operators representing fragments of text to be included inside a figure (see "Inclusion of Text Parts" below). A second round of heuristics detects clusters that are unlikely to be figures. Text areas detected by means of clustering text operations are searched for possible figure captions (see "Detection and Matching of Captions" below). Captions are matched with corresponding figure candidates, and geometrical properties of captions are used to refine the detected graphics. The last step generates data in a format convenient for further processing (see "Generation of the Output" below).

Figure 1. Overview of the figure extraction method.

Additionally, it must be noted that another important pre-processing step of the method consists of layout detection. An algorithm for segmenting pages into layout elements called page divisions is presented later in the paper. This considerably improves the accuracy of the extraction method because elements from different page divisions can no longer be considered to belong to the same cluster (and subsequently to the same figure). This allows the method to be applied separately to different columns of a document page.

Pre-processing of Operators

The proposed algorithm considers only certain properties of a PDF operator rather than trying to completely understand its effect. The considered properties consist of the operator's type, the region of the page where the operator produces output and, in the case of textual operations, the string representation of the result. For simplicity, we suppress the notion of coordinate-system transformation, inherent in PDF rendering, and describe all operators in a single coordinate system of a virtual 2-dimensional canvas where operations take effect. Transformation operators19 are assigned an empty operation region, as they do not modify the result directly but affect subsequent operations.

In our implementation, an existing PDF rendering library has been used to determine the boundaries of operators. Rather than trying to understand all possible types of operators, we check the area of the canvas that has been affected by an operation. If the area is empty, we consider the operation to be a transformation. If there exists a non-empty area that has been changed, we check if the operator belongs to a maintained list of textual operators. This list is created based on the PDF specification. If so, the operator's argument list is scanned for a string and the operation is considered to be textual. An operation that is neither a transformation nor a textual operation is considered to be graphical. It might happen that text is generated using a graphical operator; however, such a situation is unusual. In the case of operators triggering the rendering of other operators, which is the case when rendering text using type-3 fonts, we consider only the top-level operation.

In most cases, separate operations are not equivalent to the logical entities considered by a human reader (such as a paragraph, a figure, or a heading). Graphical operators are usually responsible for displaying lines or curve segments, while humans think in terms of illustrations, data lines, etc. Similarly, in the case of text, operators do not have to represent complete or separate words or paragraphs: they usually render parts of words and sometimes parts of more than one word. The only assumption we make about the relation between operators and logical entities is that a single operator does not trigger rendering of elements from different detected entities (figures, captions). This is usually true because logical entities tend to be separated by a modification of the context—there is a distance between text paragraphs or an empty space between curves.

Clustering of Graphical Operators

The Clustering Algorithm

The representation of a document as a stream of rectangles allows the calculation of more abstract elements of the document. In our model, every logical entity of the document is equivalent to a set of operators. The set of all operators of the document is divided into disjoint subsets in a process called clustering. Operators are decided to belong to the same cluster based on the position of their boundaries. The criteria for the clustering are based on a simple but important observation: operations forming a logical entity have boundaries lying close to each other, while groups of operations forming different entities are separated by empty spaces.

The clustering of textual operations yields text paragraphs and smaller objects like section headings. In the case of graphical operations, we can obtain consistent parts of images, but usually not complete figures yet. The outcomes of the clustering are utilized during the process of figure detection. Algorithm 1 shows the pseudo-code of the clustering algorithm. The input of the algorithm consists of a set of pre-processed operators annotated with their affected area. The output is a division of the input set into disjoint clusters. Every cluster is assigned a boundary equal to the smallest rectangle containing the boundaries of all included operations. In the first stage of the algorithm (lines 6–20), we organize all input operations in a data structure of a forest of trees; every tree describes a separate cluster of operations. The second stage (lines 21–29) converts the results (clusters) into a more suitable format.
Algorithm 1. The clustering algorithm.

 1: Input: OperationSet input_operations {Set of operators of the same type}
 2: Output: Map {Spatial clusters of operators}
 3: IntervalTree tx ← IntervalTree()
 4: IntervalTree ty ← IntervalTree()
 5: Map parent ← Map()
 6: for all Operation op ∈ input_operations do
 7:   Rectangle boundary ← extendByMargins(op.boundary)
 8:   repeat
 9:     OperationSet int_opsx ← tx.getIntersectingOps(boundary)
10:     OperationSet int_opsy ← ty.getIntersectingOps(boundary)
11:     OperationSet int_ops ← int_opsx ∩ int_opsy
12:     for all Operation int_op ∈ int_ops do
13:       Rectangle bd ← tx[int_op] × ty[int_op]
14:       boundary ← smallestEnclosing(bd, boundary)
15:       parent[int_op] ← op
16:       tx.remove(int_op); ty.remove(int_op)
17:     end for
18:   until int_ops = ∅
19:   tx.add(boundary, op); ty.add(boundary, op)
20: end for
21: Map results ← Map()
22: for all Operation op ∈ input_operations do
23:   Operation root_op ← getRoot(parent, op)
24:   Rectangle rec ← tx[root_op] × ty[root_op]
25:   if not results.has_key(rec) then
26:     results[rec] ← List()
27:   end if
28:   results[rec].add(op)
29: end for
30: return results

The clustering of operations is based on the relation of their rectangles being close to each other. Definition 1 formalizes the notion of being close, making it useful for the algorithm.

Definition 1: Two rectangles are considered to be located close to each other if they intersect after expanding their boundaries in every direction by a margin.

The value by which rectangles should be extended is a parameter of the algorithm and might be different in various situations. To detect if rectangles are close to each other, we needed a data structure allowing the storage of a set of rectangles. This data structure was required to allow retrieving all stored rectangles that intersect a given one. We have constructed the necessary structure using an important observation about the operation result areas. In our model all bounding rectangles have their edges parallel to the edges of the reference canvas on which the output of the operators is rendered. This allowed us to reduce our problem from the case of 2-dimensional rectangles to the case of 1-dimensional intervals. We can assume that the edges of the rectangular canvas define the coordinate system. It is easy to prove that two rectangles with edges parallel to the axes of the coordinate system intersect only if both their projections onto the axes intersect. The projection of a rectangle onto an axis is always an interval.

The observation made above allowed us to build the required 2-dimensional data structure from two 1-dimensional data structures that store a number of intervals and, for a given interval, return the set of intersecting ones. Such a 1-dimensional data structure is provided by interval trees.20 Every interval inside the tree has an arbitrary object assigned to it, which in this case is a representation of the PDF operator. This object can be treated as an identifier of the interval. The data structure also implements a dictionary interface, mapping objects to actual intervals. At the beginning, the algorithm initializes two empty interval trees representing projections on the X and Y axes, respectively. Those trees store values about projections of the largest so-far calculated areas rather than about particular operators. Each cluster is represented by the most recently discovered operation belonging to it.
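A minimal sketch of the closeness test of Definition 1, using the axis-projection reduction described above, might look as follows in Python. For clarity it scans a plain list of cluster boundaries; a production implementation would back the lookup with the two interval trees used in Algorithm 1. The margin value and helper names are assumptions made for the example, not the authors' code.

from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1), axis-aligned

def expand(rect: Rect, margin: float) -> Rect:
    x0, y0, x1, y1 = rect
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)

def intervals_intersect(a0: float, a1: float, b0: float, b1: float) -> bool:
    return a0 <= b1 and b0 <= a1

def are_close(a: Rect, b: Rect, margin: float) -> bool:
    # Definition 1: rectangles are close if they intersect after expanding
    # both boundaries in every direction by the margin.
    ax0, ay0, ax1, ay1 = expand(a, margin)
    bx0, by0, bx1, by1 = expand(b, margin)
    # Axis-aligned rectangles intersect iff their X and Y projections intersect.
    return (intervals_intersect(ax0, ax1, bx0, bx1)
            and intervals_intersect(ay0, ay1, by0, by1))

def intersecting_clusters(boundary: Rect, clusters: List[Rect], margin: float) -> List[Rect]:
    # Naive stand-in for the interval-tree lookup used in lines 9-11 of Algorithm 1.
    return [c for c in clusters if are_close(boundary, c, margin)]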
During the algorithm execution, each operator from the input set is considered only once. The order of processing is not important. The processing of a single operator proceeds as follows (the interior of the outermost "for all" loop of the algorithm):

1. The boundary of the operation is extended by the width of the margins. The spatial data structure described earlier is utilized to retrieve the boundaries of all already detected clusters (lines 9-10).
2. The forest of trees representing clusters is updated. The currently processed operation is added without a parent. The roots of all trees representing intersecting clusters (retrieved in the previous step) are attached as children of the new operation.
3. The boundary of the processed operation is extended to become the smallest rectangle containing all boundaries of intersecting clusters and the original boundary. Finally, all intersecting clusters are removed from the spatial data structure.
4. Lines 9-17 of the algorithm are repeated as long as there exist areas intersecting the current boundary. In some special cases, more than one iteration may be necessary.
5. Finally, the calculated boundary is inserted into the spatial data structure as the boundary of a new cluster. The currently processed operation is designated to represent the cluster and is remembered as its representative.

After processing all available operations, the post-processing phase begins. All the trees are transformed into lists. The resulting data structure is a dictionary having the boundaries of detected clusters as keys and lists of the operations belonging to them as values. This is achieved in lines 21-29. During the process of retrieving the cluster to which a given operation belongs, we use a technique called path compression, known from the union-find data structure.21
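The getRoot call of line 23 can be sketched as follows. This is a minimal, hypothetical Java version that identifies operations by integer ids in a plain map; it is not the extractor's actual class.

import java.util.HashMap;
import java.util.Map;

final class ClusterLookup {
    // parent maps an operation id to the id of the operation it was attached to;
    // a missing entry (or a self-reference) marks a cluster representative.
    static int getRoot(Map<Integer, Integer> parent, int op) {
        Integer p = parent.get(op);
        if (p == null || p == op) {
            return op;                      // op is its own cluster representative
        }
        int root = getRoot(parent, p);      // walk up to the representative
        parent.put(op, root);               // path compression: point op directly at the root
        return root;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> parent = new HashMap<>();
        parent.put(1, 2);                   // operation 1 was merged under operation 2
        parent.put(2, 3);                   // operation 2 was merged under operation 3
        System.out.println(getRoot(parent, 1)); // prints 3
    }
}

Path compression keeps subsequent lookups cheap, so the post-processing pass over all operations stays close to linear in practice.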
Filtering of Clusters

Graphical areas detected by a simple clustering usually do not directly correspond to figures. The main reason for this is that figures may contain not only graphics, but also portions of text. Moreover, not all graphics present in the document must be part of a figure. For instance, common graphical elements not belonging to a figure include logos of institutions and text separators like lines and boxes; various parts of mathematical formulas usually include graphical operations; and in the case of slides from presentations, the graphical layout should not be considered part of a figure. This shows that the clustering algorithm described earlier is not sufficient for the purpose of figure detection and yields a result set wider than expected. To take these characteristics into account, the pre-calculated graphical areas are subject to further refinement. This part of the processing is highly domain-dependent, as it is based on properties of scientific publications in a particular domain, in this case publications of HEP. In the course of the refinement process, previously computed clusters can be completely discarded or extended with new elements, or some of their parts might be removed. In this subsection we discuss the heuristics applied for rejecting and splitting clusters of graphical operators.

There are two main reasons for rejecting a cluster. The first is a size that is too small compared to the page size. The second is the figure candidate having an aspect ratio outside a desired interval of values. The first heuristic is designed to remove small graphical elements appearing, for example, inside mathematical formulas, but also small logos and other decorations. The second one discards text separators and different parts of mathematical equations, such as a line separating the numerator from the denominator inside a fraction. The thresholds used for filtering are provided as configurable properties of the algorithm, and their values are assigned experimentally in a way that maximises the accuracy of figure detection.

Additionally, the analysis of the order of operations forming the content stream of a PDF document may help to split clusters that were incorrectly joined by Algorithm 1. Parts of the stream corresponding to logical parts of the document usually form a consistent subsequence. This observation allows the construction of a method of splitting elements incorrectly clustered together. We can assign content streams not only to entire PDF documents or pages, but also to every cluster of operations. The clustering algorithm presented in Algorithm 1 returns a set of areas with a list of operations assigned to each of them. The content stream of a cluster consists of all operations from such a set, ordered in the same manner as in the original content stream of the PDF document. The usage of the original content stream allows us to define a distance in the content stream as follows:

Definition 2: If o1 and o2 are two operations appearing in the content stream of the PDF document, by the distance between these operations we understand the number of textual and graphical operations appearing after the first of them and before the second of them.

To detect situations when a figure candidate contains unnecessary parts, the content stream of a figure candidate is read from the first to the last operation. For every two subsequent operations, the distance between them in the sense of the original content stream is calculated. If the value is larger than a given threshold, the content stream is split into two parts, which become separate figure candidates. For both candidates, a new boundary is calculated. This heuristic is especially important in the case of less formal publications such as slides from presentations at conferences. Presentation slides tend to have a certain number of graphics appearing on every page and not carrying any meaning. Simple geometrical clustering would connect elements of page style with the rest of the document content. Measuring the distance in the content stream and defining a threshold on the distance facilitates the distinction between the layout and the rest of the page. This technique might also be useful for automatically extracting the template used for a presentation, although this transcends the scope of this publication.
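A minimal Java sketch of this distance-based splitting follows. It assumes that each operation of a candidate is identified by its index in the page's full content stream; the names are illustrative, not the extractor's API.

import java.util.ArrayList;
import java.util.List;

final class CandidateSplitter {
    // Splits a figure candidate whenever two consecutive operations are farther
    // apart than maxGap in the original content stream (Definition 2).
    // streamPositions holds, in stream order, the index of each operation of the
    // candidate within the page's full content stream.
    static List<List<Integer>> splitByStreamDistance(List<Integer> streamPositions, int maxGap) {
        List<List<Integer>> candidates = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int previous = Integer.MIN_VALUE;
        for (int pos : streamPositions) {
            // Distance = number of operations strictly between the two positions.
            if (!current.isEmpty() && pos - previous - 1 > maxGap) {
                candidates.add(current);        // close the current candidate
                current = new ArrayList<>();    // and start a new one
            }
            current.add(pos);
            previous = pos;
        }
        if (!current.isEmpty()) {
            candidates.add(current);
        }
        return candidates;                       // each sub-list becomes a separate candidate
    }
}

Each returned sub-list is then given a new boundary, exactly as described above.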
Clustering of Textual Operators

The same algorithm that clusters graphical elements can cluster parts of text. Detecting larger, logically consistent parts of text is important because they should be treated as single entities during subsequent processing. This comprises, for example, inclusion inside a figure candidate (e.g., captions of axes, parts of a legend) and classification of a text paragraph as a figure caption.

Inclusion of Text Parts

The next step in figures extraction involves the inclusion of lost text parts inside figure candidates. At the stage of operations clustering, only operations of the same type (graphical or textual) were considered. The results of those initial steps become the input to the clustering algorithm that will detect relations between previously detected entities. By doing this, we move one level farther in the process of abstracting from operations. We start from basic, meaningless operations; later we detect parts of graphics and text; and finally we are able to see the relations between both. Not all clusters detected at this stage are interesting because some might consist solely of text areas. Only those results that include at least one graphical cluster may be subsequently considered figure candidates.

Another round of heuristics marks unnecessary intermediate results as deleted. The methods applied are very similar to those described in "Filtering of Clusters" above; only the thresholds deciding on rejection must change, because we operate on geometrically much larger entities. The way of application is also different: candidates rejected at this stage can later be restored to the status of a figure. Instead of permanently removing candidates, the heuristics of this stage only mark them as rejected. This happens in the case of candidates having an incorrect aspect ratio or incorrect size, or consisting only of horizontal lines (which is usually the case with mathematical formulas but also tables).

In addition to the aforementioned heuristics, having clusters consisting of a mixture of textual and graphical operations allows the application of new heuristics. During the next phase, we analyze the type of operations rather than their relative location. In some cases, the steps described earlier might detect objects that should not be considered a figure, such as text surrounded by a frame. This situation can be recognized by calculating the ratio between the number of graphical and textual operations in the content stream of a figure candidate. In our approach we have defined a threshold that indicates which figure candidates should be rejected because they contain too few graphics. This allows the removal of, for instance, blocks of text decorated with graphics for aesthetic reasons. The ratio between the numbers of graphical and textual operations is smaller for tables than for figures, so extending the heuristic with an additional threshold could improve the table–figure distinction. Another heuristic analyzes the ratio between the total area of graphical operations and the area of the entire figure candidate. Subsequently, we mark as deleted the figure candidates containing horizontal lines as their only graphical operations. These candidates describe tables or mathematical formulas that have survived previous steps of the algorithm. Tables can be reverted to the status of figure candidates in later stages of processing.

Figure candidates that survive all the phases of filtering are finally considered to be figures. Figure 2 shows a fragment of a publication page with indicated text areas and final figure candidates detected by the algorithm.

Figure 2. A fragment of the PDF page with boxes around every detected text area and each figure candidate. Dashed rectangles indicate figure candidates. Solid rectangles indicate text areas.
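As a concrete illustration of the operation-type ratio test described above, the following minimal Java sketch rejects candidates that contain too few graphics. The threshold value and the names are illustrative assumptions.

final class RatioFilter {
    // Keeps a figure candidate only if it is sufficiently graphical.
    // graphicalOps and textualOps are counts taken from the candidate's content stream.
    static boolean hasEnoughGraphics(int graphicalOps, int textualOps, double minRatio) {
        if (graphicalOps == 0) {
            return false;                           // e.g., a block of text inside a frame
        }
        double ratio = (textualOps == 0)
                ? Double.POSITIVE_INFINITY          // purely graphical candidate
                : (double) graphicalOps / textualOps;
        return ratio >= minRatio;
    }
}

A second, lower threshold on the same ratio could be used to separate tables from figures, as suggested above.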
Detection and Matching of Captions

The input of the part of the algorithm responsible for detecting figure captions consists of previously determined figures and all text clusters. The observation of scientific publications shows that, typically, captions of figures start with a figure identifier (for instance, see the grammar for figure captions proposed by Bhatia, Lahiri, and Mitra22). The identifier usually starts with a word describing a figure type and is followed by a number or some other unique identifier. In more complex documents, the figure number might have a hierarchical structure reflecting, for example, the chapter number. The set of possible figure types is very limited. In the case of HEP publications, the most usual combinations include the words "figure" and "plot" and different variations of their spelling and abbreviation.

During the first step of the caption detection, all text clusters from the publication page are tested for the possibility of being a caption. This consists of matching the beginning of the text contained in a textual cluster against a regular expression that determines what a figure caption is. The role of the regular expression is to select strings starting with one of the predefined words, followed by an identifier or the beginning of a sentence. The identifier is subsequently extracted and included in the metadata of the caption. The caption detection has to be designed to reject paragraphs of the type "Figure 1 presents results of (. . .)". To achieve this, we reject the possibility of having any lowercase text after the figure identifier.

Having the set of all the captions, we start searching for corresponding figures. All previous steps of the algorithm take into account the division of a page into text columns (see "Detection of the Page Layout" below). When matching captions with figure candidates, we do not take the page layout into account. Matching between figure candidates and captions happens on every document page separately. We consider every detected caption once, starting with those located at the top of the page and moving down toward the end. For every caption we search for figure candidates lying nearby. First we search above the caption and, in the case of failure, we move below the caption. We take into account all figure candidates, including those rejected by heuristics. If multiple figure candidates correspond to a caption, we merge them into a single figure, treating the previous candidates as subfigures of a larger figure. We also include small portions of text and graphics previously rejected from figure candidates that lie between the figure and the caption or between different parts of a figure. These parts of text usually contain identifiers of the subfigures. The amount of unclustered content that can be included in a figure is a parameter of the extraction algorithm and is expressed as a percentage of the height of the document page. It might happen that a caption is located in a completely different place, but this case is rare and tends to appear in older publications. The distance from the figure is calculated based on the page geometry. Captions should not be too distant from the figure.
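The paper does not give the exact regular expression, so the following Java sketch is only an assumed illustration of the kind of test described above: accept a leading figure keyword and identifier, and reject a lowercase continuation such as "Figure 1 presents results of . . .".

import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaptionMatcher {
    // Illustrative pattern only. It accepts text starting with "Figure"/"Fig."/"Plot"
    // plus an identifier such as "3", "3a" or "2.1", and rejects a lowercase
    // continuation, so ordinary sentences mentioning a figure are not treated as captions.
    private static final Pattern CAPTION = Pattern.compile(
        "^\\s*(?i:fig(?:ure)?\\.?|plot)\\s+(\\d+(?:\\.\\d+)*[a-z]?)\\s*[-:.]?(?!\\s*[a-z])");

    // Returns the figure identifier if the text cluster looks like a caption, null otherwise.
    static String captionIdentifier(String clusterText) {
        Matcher m = CAPTION.matcher(clusterText);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(captionIdentifier("Figure 3: Invariant mass distribution")); // 3
        System.out.println(captionIdentifier("Figure 1 presents results of the fit"));  // null
    }
}

Running main prints "3" for the first string and "null" for the second, mirroring the accept/reject behaviour described in the text.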
Generation of the Output

The choice of the format in which data should be saved at the output of the extraction process should take into account further requirements. The most obvious use case, displaying figures to end users in response to text-based search queries, does not impose very sophisticated constraints. A simple raster graphic annotated with captions and possibly some extracted portions of metadata would be sufficient. Unfortunately, the process of generating raster representations of figures might lose many important pieces of information that could be used in the future for automatic analysis.

To store as much data as possible, apart from storing the extracted figures in a raster format (e.g., PNG), we also decided to preserve their original vector character. Vector graphics formats, similarly to PDF documents, contain information about graphical primitives. Primitives can be organized into larger logical entities. Sometimes the rendering of different primitives leads to a modification of the same pixel of the resulting image. Such a situation might happen, for example, when circles are used to draw data points lying nearby on the same plot. To avoid such issues, we convert figures into the scalable vector graphics (SVG) format.23

On the implementation level, the extraction of the vector representation of a figure proceeds in a manner similar to the regular rendering of a PDF document. The interpreter preserves the same elements of the state and allows their modification by transformation operations. A virtual canvas is created for every detected figure. The content stream of the document is processed, and all the transformation operations are executed, modifying the interpreter's state. The textual and graphical operators are also interpreted, but they affect only the canvas of the figure to which the operation belongs. If a particular operation does not belong to any figure, no canvas is affected. The behaviour of the graphical canvases used during the SVG generation is different from the case of raster rendering: instead of creating graphical output, every operation is transformed into a corresponding primitive and saved within an SVG file.

PDF was designed in such a manner that the number of external dependencies of a file is minimized. This design decision led to the inclusion of the majority of fonts in the document itself. It would be possible to embed font glyphs in the SVG file and use them to render strings. However, for the sake of simplicity, we decided to omit font definitions in the SVG output. A text representation is extracted from every text operation, and the operation is replaced by an SVG text primitive with a standard font value. This simplification affects what the output looks like, but the amount of formatting information that is lost is minimal. Moreover, this does not pose a problem because the vector representations are intended to be used during automatic analysis of figures rather than for display purposes. A possible extension of the presented method could involve embedding complete information about the used glyphs.
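A minimal Java sketch of the text substitution described above follows; it emits an SVG text element with a standard font in place of the embedded PDF font. The element structure follows the SVG specification, but the method and its parameters are illustrative assumptions, and coordinates are assumed to be already transformed to the figure's canvas.

import java.util.Locale;

final class SvgTextEmitter {
    // Replaces a PDF textual operation with an SVG text primitive using a standard font.
    static String textPrimitive(String text, double x, double y, double fontSize) {
        String escaped = text.replace("&", "&amp;")
                             .replace("<", "&lt;")
                             .replace(">", "&gt;");
        return String.format(Locale.ROOT,
            "<text x=\"%.2f\" y=\"%.2f\" font-family=\"sans-serif\" font-size=\"%.2f\">%s</text>",
            x, y, fontSize, escaped);
    }
}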
Finally, the generation of the output is completed with some metadata elements. An exhaustive categorization of the metadata that can be compiled for figures could be a customization of the one proposed by Liu et al. for table metadata.24 In the case of figures, the following categories could be distinguished: (1) environment/geography metadata (information about the document in which the figure is located); (2) affiliated metadata (e.g., captions, references, or footnotes); (3) layout metadata (information about the original visualization of the figure); (4) content data; and (5) figure type metadata. For the moment, we compile only environment/geography metadata and affiliated metadata. The geography/environment metadata consists of the document title, the document authors, the document date (creation and publication), and the exact location of a figure inside a publication (page and boundary). Most of these elements are provided by simply referencing the original publication in the INSPIRE repository. The affiliated metadata consists of the text caption and the exact location of the caption in the publication (page and boundary). In the future, metadata from other categories will be annotated for each figure.

Detection of the Page Layout

Figure 3. Sample page layouts that might appear in a scientific publication. The black color indicates areas where content is present.

In this section we discuss how to detect the page layout, an issue that has been omitted from the main description of the extraction algorithm but that is essential for an efficient detection of figures. Figure 3 depicts several possibilities of organising content on the page. As mentioned in previous sections, the method of clustering operations based on their geometrical position may fail in the case of documents having a complex page layout. The content appearing in different columns should never be considered as belonging to the same figure. This cannot be assured without enforcing additional constraints during the clustering phase. To address this difficulty, we enhanced the figure extractor with a pre-processing phase that detects the page layout. Being able to identify how the document page is divided into columns enables us to execute the clustering within every column separately.

It is intuitively obvious what can be understood as a page layout, but to provide a method of calculating it, we need a more formal definition, which we provide below. By the layout of a page, we understand a particular division of a page into areas called columns. Each area is a sum of disjoint rectangles. The division of a page into areas must satisfy the set of conditions summarized in Definition 3.

Definition 3: Let $P$ be a rectangle representing the page. A set $D$ of subareas of the page is called a page division if and only if

$\bigcup_{Q \in D} Q = P$

$\forall x, y \in D,\; x \neq y:\; x \cap y = \emptyset$

$\forall Q \in D:\; Q \neq \emptyset$

$\forall Q \in D\ \exists R = \{x : x \text{ is a rectangle},\ \forall y \in R \setminus \{x\}:\, y \cap x = \emptyset\}:\; Q = \bigcup_{x \in R} x$

Every element of a division is called a page area. To be considered a page layout, the borders of the areas from the division must not intersect the content of the page. Definition 3 does not guarantee that the layout is unique. A single page might be assigned different divisions satisfying the definition. Additionally, not all valid page layouts are interesting from the point of view of figure detection. The segmentation algorithm calculates one such division, imposing additional constraints on the detected areas. The layout-calculation procedure utilizes the notion of separators, introduced by Definition 4.
Definition 4: A vertical (or horizontal) line inside a page or on its borders is called a separator if its horizontal (vertical) distance from the page content is larger than a given constant value.

The algorithm consists of two stages. First, the vertical separators of a sufficient length are detected and used to divide the page into disjoint rectangular areas. Each area is delimited by two vertical lines, each of which forms a consistent interval inside one of the detected vertical separators. At this stage, horizontal separators are completely ignored. Figure 4 shows a fragment of a publication page processed by the first stage of the layout detection. The upper horizontal edge of one of the areas lies too close to two text lines. With the constant of Definition 4 chosen to be sufficiently large, this edge would not be a horizontal separator, and thus the generated division of the page would require additional processing to become a valid page layout. The second stage of the algorithm transforms the previously detected rectangles into a valid page layout by splitting rectangles into smaller parts and by joining appropriate rectangles to form a single area.

Figure 4. Example of intermediate layout-detection results requiring refinement.

Algorithm 2 shows the pseudo-code of the detection of vertical separators. The input of the algorithm consists of the image of the publication page. The output is a list of vertical separators aggregated by their x-coordinates. Every element of this list consists of two parts: an integer indicating the x-coordinate and a list of y-coordinates describing the separators. The first element of this list indicates the y-coordinate of the beginning of the first separator; the second element is the y-coordinate of the end of the same separator. The third and fourth elements describe the second separator, and the same mechanism is used for the remaining separators (if they exist).

The algorithm proceeds according to the sweeping principle known from computational geometry.25 The algorithm reads the publication page starting from the left. For every x-coordinate value, a set of corresponding vertical separators is detected (lines 9-18). Vertical separators are detected as consistent sequences of blank points. A point is considered blank if all the points in its horizontal surroundings, within the radius defined by the constant from Definition 4, are of the background colour. Not all blank vertical lines can be considered separators: short empty spaces usually delimit lines of text or other small units of the content. In line 11 we test detected vertical separators for being long enough.

If a separator has been detected in a particular column of a publication page, the adjacent columns also tend to contain similar separators. Lines 19-31 of the algorithm are responsible for selecting the longest candidate among the adjacent columns of the page. The maximization is performed across a set of adjacent columns for which at least one separator exists.

The detected separators are used to create the preliminary division of the page, similar to the one from the example of figure 4. As in the previous step, separators are considered one by one in the order of increasing x-coordinate.
At every moment of the execution, the algorithm maintains a division of the page into rectangles. This division corresponds only to the already detected vertical separators. Updating the previously considered division is facilitated by processing separators in a particular well-defined order.

Algorithm 2. Detecting vertical separators.

1: Input: the page image
2: Output: vertical separators of the input page
3: List<Pair<int, List<int>>> separators ← ∅
4: int max_weight ← 0
5: boolean maximizing ← false
6: for all x ∈ {min_x … max_x} do
7:   empty_b ← 0, current_eval ← 0
8:   empty_areas ← List()
9:   for all y ∈ {0 … page_height} do
10:    if point at (x, y) is not blank then
11:      if y − empty_b − 1 > height_min then
12:        empty_areas.append(empty_b)
13:        empty_areas.append(y = page_height ? y : y − 1)
14:        current_eval ← current_eval + y − empty_b
15:      end if
16:      empty_b ← y + 1
17:    end if
18:  end for
     {We have already processed the entire column. Now we compare it with the adjacent, already processed columns.}
19:  if max_weight < current_eval then
20:    max_weight ← current_eval
21:    max_separators ← empty_areas
22:    maxx ← x
23:  end if
24:  if maximizing then
25:    if empty_areas = ∅ then
26:      separators.add((maxx, max_separators))
27:      maximizing ← false, max_weight ← 0
28:    end if
29:  else
30:    maximizing ← (empty_areas ≠ ∅)
31:  end if
32: end for
33: return separators

Before presenting the final outcome, the algorithm must refine the previously calculated division. This happens in the second phase of the execution. All the horizontal borders of the division are then moved along adjacent vertical separators until they become horizontal separators in the sense of Definition 4. Typically, moving the horizontal borders results in dividing already existing rectangles into smaller ones. If such a situation happens, both newly created parts are assigned to different page layout areas. Sometimes, when moving separators is not possible, different areas are combined together, forming a larger one.
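To make the blank-point test of lines 9-18 concrete, a minimal Java sketch follows. It assumes the page is available as a BufferedImage and that the background colour is known; these are assumptions of this illustration rather than details given in the paper.

import java.awt.image.BufferedImage;

final class SeparatorScan {
    // A point is treated as blank when every pixel within a horizontal radius
    // around it has the background colour (the constant of Definition 4).
    static boolean isBlank(BufferedImage page, int x, int y, int radius, int backgroundRgb) {
        for (int dx = -radius; dx <= radius; dx++) {
            int px = x + dx;
            if (px < 0 || px >= page.getWidth()) {
                continue;                              // outside the page counts as blank
            }
            if (page.getRGB(px, y) != backgroundRgb) {
                return false;                          // some content lies too close
            }
        }
        return true;
    }

    // A column x contains a vertical separator wherever it has a run of blank
    // points longer than minHeight (compare the test in line 11 of Algorithm 2).
    static boolean hasSeparator(BufferedImage page, int x, int radius, int backgroundRgb, int minHeight) {
        int run = 0;
        for (int y = 0; y < page.getHeight(); y++) {
            run = isBlank(page, x, y, radius, backgroundRgb) ? run + 1 : 0;
            if (run > minHeight) {
                return true;
            }
        }
        return false;
    }
}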
Tuning and Testing

The extraction algorithm described here has been implemented in Java and tested on a random set of scientific articles coming from the INSPIRE repository. The testing procedure has been used to evaluate the quality of the method, but it also allowed us to tweak the parameters of the algorithm to maximize the outcomes.

Preparation of the Testing Set

To prepare the testing set, we randomly selected 207 documents stored in INSPIRE. In total, these documents consisted of 3,728 pages, which contained 1,697 figures altogether. The records have been selected according to a uniform probability distribution across the entire record space. This way, we have created a collection that is representative of the entire INSPIRE corpus, including historical entries. Currently, INSPIRE consists of 1,140 records describing publications written before 1950; 4,695 between 1950 and 1960; 32,379 between 1960 and 1970; 108,525 between 1970 and 1980; 167,240 between 1980 and 1990; 251,133 between 1990 and 2000; and 333,864 in the first decade of the twenty-first century. In total, up to July 2012, INSPIRE manages 952,026 records. It can be seen that the rate of growth has increased with time and that most INSPIRE documents come from the last decade. The results on such a testing set should accurately estimate the efficiency of extraction for existing documents, but not necessarily for new documents being ingested into INSPIRE. This is because INSPIRE contains entries describing old articles that were created using obsolete technologies or scanned and encoded in PDF. The extraction algorithm is optimized for born-digital objects.

To test the hypothesis that the extractor provides better results for newer papers, the testing set has been split into several subsets. The first subset consists of publications published before 1980. The rest of the testing set has been split into subsets corresponding to decades of publication. To simplify the counting of correct figure detections and to provide a more reliable execution and measurement environment, every testing document has been split into multiple single-page PDF documents. Subsequently, every single-page document has been manually annotated with the number of figures appearing inside.

Execution of the Tests

The efficient execution of the testing was possible thanks to a special script executing the plots extractor on every single page separately and then computing the total number of successes and failures. The script allows the execution of tests in a distributed, heterogeneous environment and allows dynamic connection and disconnection of computing nodes. In the case of a software failure, the extraction request is resubmitted to a different computation node, allowing the avoidance of problems related to a worker-node configuration rather than to the algorithm implementation itself.

During the preparation of the testing set, we manually annotated all the expected extraction results. Subsequently, the script compared these metadata with the output of the extractor. Using aggregated numbers from all extracted pages allowed us to calculate efficiency measures of the extraction algorithm. As quality measures, we used recall and precision.26 Their definitions are included in the following equations:

$\text{recall} = \dfrac{\#\,\text{correctly extracted figures}}{\#\,\text{figures present in the test set}}$

$\text{precision} = \dfrac{\#\,\text{correctly extracted figures}}{\#\,\text{extracted figures}}$

At every place where we needed a single comparable quality measure rather than two semi-independent numbers, we have used the harmonic average of the precision and the recall.27

Table 1 summarizes the results obtained during the test execution for every subset of our testing set. Figure 5 shows the dependency of recall and precision on the time of publication. The extractor parameters used in this test execution were chosen based on intuition and a small number of manually triggered trials. In the next section we describe an automatic tuning procedure we have used to find better algorithm arguments.

                                       | –1980 | 1980–90 | 1990–2000 | 2000–10 | 2010–12
Number of existent figures             |   114 |      60 |       170 |     783 |     570
Number of correctly detected figures   |    59 |      53 |       164 |     703 |     489
Number of incorrectly detected figures |    26 |      78 |        65 |      40 |      73
Total number of pages                  |    85 |     136 |       760 |    1919 |     828
Number of correctly processed pages    |    20 |      44 |       712 |    1816 |     743

Table 1. Results of the test execution.

Figure 5. Recall and precision as functions of the decade of publication.

It can be seen that, as expected, the efficiency increases with the increasing time of publication. The total recall and precision for all samples since 1990, which constitute a majority of the INSPIRE corpus, were both 88 percent.
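For reference, the harmonic average used above as a single quality measure is the standard F-score; the formula below is the textbook definition and is not quoted from the paper:

$F = \dfrac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$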
Precision and recall based on the correctly detected figures do not give a full picture of the algorithm's efficiency, because the extraction has been executed on a number of pages not containing any figures. The correctly extracted pages not having any figures do not appear in the recall and precision statistics, because in their case the expected and detected numbers of figures are both equal to 0. Besides recall and precision, figure 5 also depicts the fraction of pages that have been extracted correctly. Taking into account the samples since 1990, 3,271 pages out of 3,507 have been detected completely correctly, which amounts to a 93 percent success rate counted by number of pages. As can be seen, this measure is higher than both the precision and the recall. The analysis of the extractor results in the case of failure shows that in many cases, even if the results are not completely correct, they are not far from the expectation. There are different reasons for the algorithm failing. Some of them may result from a non-optimal choice of algorithm parameters, others from the document layout being too far from the assumed one. In some rare cases, even manual inspection of the document does not allow an obvious identification of figures.

The Automatic Tuning of Parameters

In the previous section we have shown the results obtained by executing the extraction algorithm on a sample set. During this execution we were using extractor arguments that seemed to be the most correct based on our observations but also on other research (typical sizes of figures, margin sizes, etc.).28 This way of configuring the algorithm was useful during development, but it is not likely to yield the best possible results. To find better parameters, we have implemented a method of automatic tuning. The metrics described in the previous section provide a good method of measuring the efficiency of the algorithm running with given parameters. The choice of optimal parameters can be relative to the choice of documents on which the extraction is to be performed. The way in which the testing set has been selected allowed us to use it as representative of HEP publications.

To tune the algorithm, we have used a subset of the testing set described in the previous step as a reference. The subset consisted of all entries created after 1990. This allowed us to minimize the presence of scanned documents, which, by design, cannot be correctly processed by our method. The adjustment of parameters has been performed by a dedicated script that executed the extraction using various parameter values and read the results. The script has been configured with a list of tuneable parameters together with their types and allowed ranges of values. Additionally, the script had knowledge of the believed-best value, which was the one used in the previous testing. To decrease the complexity of training, we have made several assumptions about the parameters. These assumptions are only an approximation of the real nature of the parameters, but practice has shown that they are good enough to permit the optimization:

• We assume that precision and recall are continuous with respect to the parameters. This allows us to assume that the efficiency of the algorithm for parameter values close to a given one will be close. The optimization has proceeded by sampling the parametric space in a number of points and executing tests using the selected points as parameter values. Having N parameters to optimize and dividing the space of every parameter into M regions leads to the execution of M^N tests.
Execution of every test is a time-consuming operation due to the size of the training set.

• We assume that the parameters are independent of each other. This means that we can divide the problem of finding an optimal solution in the N-dimensional space of N configuration arguments into finding N solutions in 1-dimensional subspaces. Such an assumption seems intuitive and considerably reduces the number of necessary tests from O(M^N) to O(M·N), where M is the number of samples taken from a single dimension.

In our tests, the parametric space has been divided into 10 equal intervals in every direction. In addition to checking the extraction quality at those points, we have executed one test for the so-far best argument. In order to increase the level of fine-tuning of the algorithm, each test has been re-executed in the region where the chances of finding a good solution were considered the highest. This consisted of a region centred around the highest result and having a radius of 10 percent of the parameter space.

Figure 6 and figure 7 show the dependency of the recall and the precision on an algorithm parameter. The parameter depicted in figure 6 indicates what minimal aspect ratio a figure candidate must have in order to be considered a correct figure. It can be seen that tuning this heuristic increases the efficiency of the extraction. Moreover, the dependency of recall and precision on the parameter is monotonic, which is the most compatible with the chosen optimization method. The parameter of figure 7 specifies which fraction of the area of the entire figure candidate has to be occupied by graphical operations. This parameter has a lower influence on the extraction efficiency. Such a situation can happen when more than one heuristic influences the same aspect of the extraction results. This contradicts the assumption of parameter independence, but we have decided to use the present model for simplicity.

Figure 6. Effect of the minimal aspect ratio on precision and recall.

Figure 7. Effect on the precision and recall of the area fraction occupied by graphical operations.

After executing the optimization algorithm, we have managed to achieve a recall of 94.11 percent and a precision of 96.6 percent, which is a considerable improvement compared to the previous results of 88 percent.

CONCLUSIONS AND FUTURE WORK

This work has presented a method for extracting figures from scientific publications in a machine-readable format, which is the main step toward the development of services enabling access to and search of images stored in scientific digital libraries. In recent years, figures have been gaining increasing attention in the digital libraries community. However, little has been done to decipher the semantics of these graphical representations and to bridge the semantic gap between content that can be understood by machines and that which is managed by digital libraries. Extracting figures and storing them in a uniform and machine-readable format constitutes the first step towards the extraction and description of the internal semantics of figures.
Storing semantically described and indexed figures would open completely new possibilities of accessing the data and discovering connections between different types of publishing artefacts and different resources describing related knowledge.29

Our method of detecting fragments of PDF documents that correspond to figures is based on a series of observations of the character of publications. However, tests have shown that additional work is needed to improve the correctness of the detection. The performance should also be re-evaluated once we have a large set of correctly annotated figures confirmed by users of our system.

The heuristics used by the algorithm are based on a number of numeric parameters that we have tried to optimize using automatic techniques. The tuning procedure has made several arbitrary assumptions about the nature of the dependency between parameters and extraction results. A future approach to the parameter optimization, requiring much more processing, could involve the execution of a genetic algorithm that would treat the parameters as gene samples.30 This could potentially allow the discovery of a better parameter set because a smaller set of assumptions would be imposed on the parameters. A vector of algorithm parameters could play the role of a gene, and random mutations could be introduced to previously considered and subsequently crossed genes. The evaluation and selection of surviving genes could be performed using the metrics described previously. Another approach to improving the quality of the tuning could involve extending the present algorithm with a discovery of mutually dependent parameters and the usage of special techniques (relaxing the assumptions) to fine-tune in the subspaces spanned by these parameters.

All of our experiments have been performed using a corpus of publications from HEP. The usage of the extraction algorithm on a different corpus would require tuning the parameters for the specific domain of application. For the area of HEP, we can also consider preparing several sets of execution parameters varying by decade of document publication or by other easy-to-determine characteristics. Subsequently, we could decide which set of parameters to run the extraction with, based on those characteristics.

In addition to a better tuning of the existing heuristics, there are improvements that can be made at the level of the algorithm. For example, we could mention extending the process of clustering text parts. In the current implementation, the margins by which textual operations are extended during the clustering process are fixed as algorithm parameters. This approach proved to be robust in most cases. However, distances between text lines tend to differ depending on the currently utilized style, and every text portion tends to have one style that dominates. An improved version of the text-clustering algorithm could use local rather than global properties of the content. This would not only allow correct handling of an entire document written using different text styles, but would also help to manage cases of single paragraphs differing from the rest of the content.

Another important, not-yet-implemented improvement related to figure metadata is the automatic extraction of figure references from the text content. Important information about figure content might be stored in the surroundings of the place where the publication text refers to a figure.
Furthermore, the metadata could be extended by the usage of some type of classifier that would assign a graphics type to each extracted result. Currently, we only distinguish between tables and figures, based on simple heuristics involving the number and type of graphical areas and the text inside the detected caption. In the future, we could distinguish line plots from photos, histograms, and so on. Such a classifier could be implemented using artificial intelligence techniques such as support vector machines.31

Finally, partial results of the figure-extraction algorithm might be useful in performing other PDF analyses:

• The usage of clustered text areas could allow a better interpretation and indexing of textual content stored in digital libraries with full-text access. Clusters of text tend to describe logical parts like paragraphs, section and chapter titles, etc. A simple extension of the current schema could allow the extraction of the predominant formatting style of the text encoded in a page area. Text parts written in different styles could be indexed in a different manner, giving, for instance, more importance to segments written with a larger font.

• We mentioned that the algorithm detects not only figures but also tables. A heuristic is being used in order to distinguish tables from different types of figures. Our present effort concentrates on the correct treatment of figures, but a useful extension could allow the extraction of different types of entities. For instance, another common type of content ubiquitous in HEP documents is mathematical formulas. Thus, in addition to figures, it would be important to extract tables and formulas in a structured format allowing further processing. The internal architecture of the implemented prototype of the figure extractor allows easy implementation of extension modules that can compute other properties of PDF documents.

ACKNOWLEDGEMENTS

This work has been partially supported by CERN and the Spanish Government through the project TIN2012-37826-C02-01.

REFERENCES

1. Saurabh Kataria, "On Utilization of Information Extracted From Graph Images in Digital Documents," Bulletin of IEEE Technical Committee on Digital Libraries 4, no. 2 (2008), http://www.ieee-tcdl.org/Bulletin/v4n2/kataria/kataria.html.

2. Marti A. Hearst et al., "Exploring the Efficacy of Caption Search for Bioscience Journal Search Interfaces," Proceedings of the Workshop on BioNLP 2007: Biological, Translational and Clinical Language Processing: 73–80, http://dl.acm.org/citation.cfm?id=1572406.

3. Lisa Johnston, "Web Reviews: See the Science: Scitech Image Databases," Sci-Tech News 65, no. 3 (2011), http://jdc.jefferson.edu/scitechnews/vol65/iss3/11.

4. Annette Holtkamp et al., "INSPIRE: Realizing the Dream of a Global Digital Library in High-Energy Physics," 3rd Workshop Conference: Towards a Digital Mathematics Library, Paris, France (July 2010): 83–92.

5. Piotr Praczyk et al., "Integrating Scholarly Publications and Research Data—Preparing for Open Science, a Case Study from High-Energy Physics with Special Emphasis on (Meta)data Models," Metadata and Semantics Research—CCIS 343 (2012): 146–57.

6. Piotr Praczyk et al., "A Storage Model for Supporting Figures and Other Artefacts in Scientific Libraries: the Case Study of Invenio," 4th Workshop on Very Large Digital Libraries (VLDL 2011), Berlin, Germany (2011).
7. "SciVerse Science Direct: Image Search," Elsevier, http://www.info.sciverse.com/sciencedirect/using/searching-linking/image.

8. Guenther Eichhorn, "Trends in Scientific Publishing at Springer," in Future Professional Communication in Astronomy II (New York: Springer, 2011), doi: 10.1007/978-1-4419-8369-5_5.

9. William Browuer et al., "Segregating and Extracting Overlapping Data Points in Two-Dimensional Plots," Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2008), New York: 276–79.

10. Saurabh Kataria et al., "Automatic Extraction of Data Points and Text Blocks from 2-Dimensional Plots in Digital Documents," Proceedings of the 23rd AAAI Conference on Artificial Intelligence, Chicago (2008): 1169–74.

11. Saurabh Kataria, "On Utilization of Information Extracted From Graph Images in Digital Documents," Bulletin of IEEE Technical Committee on Digital Libraries 4, no. 2 (2008), http://www.ieee-tcdl.org/Bulletin/v4n2/kataria/kataria.html.

12. Ying Liu et al., "Tableseer: Automatic Table Metadata Extraction and Searching in Digital Libraries," Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '07), Vancouver (2007): 91–100.

13. William S. Cleveland, "Graphs in Scientific Publications," American Statistician 38, no. 4 (1984): 261–69, doi: 10.1080/00031305.1984.10483223.

14. Hui Chao and Jian Fan, "Layout and Content Extraction for PDF Documents," Document Analysis Systems VI, Lecture Notes in Computer Science 3163 (2004): 213–24.

15. At every moment of the execution of a PostScript program, the interpreter maintains many variables. Some of them encode current positions within the rendering canvas. Such positions are used to locate the subsequent character or to define the starting point of the subsequent graphical primitive.

16. Transformation matrices are encoded inside the interpreters' state. If an operator requires arguments indicating coordinates, these matrices are used to translate the provided coordinates to the coordinate system of the canvas.

17. Graphical operators are those that trigger the rendering of a graphical primitive.

18. Textual operations are the PDF instructions that cause the rendering of the text. Textual operations receive the string representation of the desired text and use the current font, which is saved in the interpreters' state.

19. Operations that do not produce any visible output, but solely modify the interpreters' state.

20. Herbert Edelsbrunner and Hermann A. Maurer, "On the Intersection of Orthogonal Objects," Information Processing Letters 13, nos. 4–5 (1981): 177–81.

21. Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest, Introduction to Algorithms (Cambridge: MIT Electrical Engineering and Computer Science Series, 1990).

22. Sumit Bhatia, Shibamouli Lahiri, and Prasenjit Mitra, "Generating Synopses for Document-Element Search," Proceedings of the 18th ACM Conference on Information and Knowledge Management, New York (2009): 2003–6, doi: 10.1145/1645953.1646287.
23. Jon Ferraiolo, ed., "Scalable Vector Graphics (SVG) 1.0 Specification," W3C Recommendation, 1 September 2001, http://www.w3.org/TR/SVG10/.

24. Liu et al., "Tableseer."

25. Cormen, Leiserson, and Rivest, Introduction to Algorithms.

26. Ricardo A. Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval (Boston: Addison-Wesley, 1999).

27. Ibid.

28. Cleveland, "Graphs in Scientific Publications."

29. Praczyk et al., "A Storage Model for Supporting Figures and Other Artefacts in Scientific Libraries."

30. Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd ed. (Prentice Hall, 2009).

31. Sergios Theodoridis and Konstantinos Koutroumbas, Pattern Recognition, 3rd ed. (Boston: Academic Press, 2006).

----

Digital Native Librarians, Technology Skills, and Their Relationship with Technology

Jenny Emanuel

Jenny Emanuel (emanuelj@illinois.edu) is Digital Services & Reference Librarian, University of Illinois, Urbana.

INTRODUCTION

A new generation of academic librarians, who are a part of the Millennial Generation born between 1982 and 2001,1 is now of the age to either be in graduate school or embarking on their careers. Often referred to as "digital natives" because their generation is believed to have always grown up online and with technology ubiquitous in their daily lives,2 many agree that this generation is poised to revolutionize library services with their technology skills.3 Younger academic librarians believe that their technology knowledge makes them more flexible and assertive in libraries compared to their older colleagues, and that they have different ways of completing their work. They refuse to be stereotyped into the traditional "bookish" idea of librarianship and want to transform libraries into technology-enhanced spaces that meet the needs of students in the digital age, redefining librarianship.4

This paper, as part of a larger study examining Millennial academic librarians, their career selection, their attitudes, and their technology skills, looks specifically at the technology skills and attitudes toward technology among a group of young librarians and library school students. The author initially wanted to learn if the increasingly high-tech nature of academic librarianship attracted Millennials to the career, but the results showed that they had a much more complex relationship with technology than the author assumed.

LITERATURE REVIEW

The literature concerning the Millennial Generation focuses on their use of technology in their daily lives.
Millennials are using technology to create new ways of doing things, such as creating a digital video for a term project, playing video games instead of traditional board games, and connecting with friends and extended family worldwide through email, instant messaging, and social networking.5 They use technology to create new social and familial networks with friends based on the music they listen to, the books they read, the pictures they take, and the products they consume.6 They believe that their relationship with technology will change the way society views and relates to technology.7 With technology at their fingertips on a nearly constant basis, Millennials have gained an expectation of instant gratification for all of their wants and needs.8

Millennials believe that technology is not a passive experience, as it was for previous generations.9 To them, technology is active and an experience by which they live their lives.10 They have grown up with reality television, which means anyone can have his or her fifteen minutes of fame. In turn, this means being heard, having their say, and becoming famous online are all natural experiences that can be shared by anyone.11 Because they can create their own customized media and make media consumption an interactive, as opposed to a passive and hierarchical, experience, they believe that everyone's opinion counts and deserves to be heard.12

Even though they believe they are the greatest generation and expert users of technology, others have a different view. For example, Bauerlein argues that they are not intellectually curious, are anti-library, and blindly accept technology at face value while not understanding the societal implications or context of technology. They also consume technology without understanding how it works.13

Within libraries, technology skills related to new librarians have been studied by Del Bosque and Lampert, who surveyed librarians from a variety of library settings with fewer than nine years of experience working as professional librarians. The survey found that the majority (55 percent) understood that technology played a large part in their library education, but a similar percentage (57 percent) did not expect to work in a technical position upon graduation. Respondents also thought there was a disconnect between the technology skills taught in library school and what was needed on the job, with job responsibilities being much more technical than they expected. Thus, even though more experienced librarians expected recent graduates to fill highly technical roles, library school did not prepare them for these roles, and students did not opt to go to library school to gain strong technology skills. Based on survey comments, the researchers noted two categories of new librarians: those who have a high level of technical experience, usually from a previous job in a technology-related industry, and those who struggle with technology.
For those who struggle with technology, technology was not the reason they decided to become librarians, and they wish their library school had offered more hands-on opportunities for technology instruction instead of teaching theoretical applications.14

METHOD

To understand, in part, the technology skills of Millennial academic librarians and their attitudes toward technology, the author developed a two-part research study including an online survey and individual interviews with Millennial librarians and library school students. First, an exhaustive three-part survey was created covering multiple aspects of Millennial academic librarians, including demographics, career choice, specialization, generational attitudes, management, and technology skills. Although the survey focused on many areas of data collection, this paper focuses only on technology skills. The survey was disseminated in May 2012 to 50 American Library Association (ALA) accredited library schools in the United States as well as online outlets geared toward new librarians, including the New Members Round Table (NMRT) electronic discussion list, NextGen-l (Next Generation Librarians list), the ALA Emerging Leaders program alumni electronic discussion list, and the ALA Think Tank on Facebook. The survey was open for 10 days.

The survey also asked participants if they would be willing to participate in a follow-up interview. A total of 161 participants volunteered for a follow-up interview. Interviews began once the survey closed, and individuals were contacted via email to schedule an interview at their convenience. A total of 20 interviews were conducted in May and June 2012. The interviews were conducted using the audio-only function of Skype and were recorded using the MP3 Skype Recorder software. The author then transcribed all of the interviews and coded the transcripts. The interviews utilized open-ended questions to gather individual stories and offer support to the quantitative demographic and qualitative survey questions.15 The interview questions were semi-structured and asked participants to explain in detail their path to becoming an academic librarian and their attitudes toward technology.

RESULTS

There were 315 valid survey responses. The birth years of participants ranged from 1982 to 1990 (see figure 1). The respondents were nearly evenly divided between library school students (45.5 percent) and individuals who had already obtained an MLS degree (52.1 percent). Concerning the format of their library school program, 38.4 percent earned the degree at an institution entirely in person, 19.6 percent completed the degree entirely online, and 42.0 percent went to a program that was a mix of in-person and online courses.

Figure 1. Birth-year distribution of survey participants (respondents per birth year: 1982: 41; 1983: 35; 1984: 64; 1985: 50; 1986: 45; 1987: 39; 1988: 33; 1989: 22; 1990: 2).

QUANTITATIVE DATA

Millennials believe it is very important for librarians to understand technology, with 99 percent reporting that it is important or very important. Data on skills related to technology were gathered through several questions, notably by using a list of technologies commonly used in academic libraries and asking respondents to rate their comfort level before starting library school, after library school, and at the present time. The results are illustrated in table 1.
Technology | Before Library School | After Library School | At the Present Time
Adobe Dreamweaver | 1.93 | 2.5 | 2.46
Adobe Flash | 2.28 | 2.61 | 2.66
Adobe Photoshop | 2.66 | 3.15 | 3.22
Computer Hardware | 3.03 | 3.27 | 3.32
Computer Networking | 2.54 | 2.85 | 2.83
Computer Security | 2.56 | 2.96 | 2.91
Content Management Systems (CMS) | 2.34 | 3.32 | 3.29
Course Management Systems (Blackboard, Moodle, etc.) | 3.37 | 4.22 | 4.22
File Management Issues | 3.00 | 3.72 | 3.67
HTML | 2.56 | 3.56 | 3.48
Image Editing/Scanning | 3.47 | 3.87 | N/A
Information Architecture | 1.86 | 2.67 | 2.58
Integrated Library Systems—Back End | N/A | 3.05 | 2.93
Integrated Library Systems—Front End | N/A | 3.53 | 3.39
Linux/Unix | 1.58 | 1.83 | 1.86
Mac OS X | 2.92 | 3.31 | 3.45
Microsoft Access | 2.55 | 3.19 | 3.26
Microsoft Excel | 3.94 | 4.37 | 4.40
Microsoft Windows | 4.57 | 4.67 | 4.71
Microsoft Word | 4.66 | 4.76 | 4.79
Mobile Devices | 4.27 | 4.51 | 4.60
PowerPoint | 4.43 | 4.62 | 4.65
Programming Languages (C++, .Net, etc.) | 1.53 | 1.94 | 1.84
Relational Databases | 1.87 | 2.66 | 2.66
Screen Capture Software (Camtasia, Captivate, etc.) | 2.10 | 3.26 | 3.32
Server Set Up/Maintenance | 1.56 | 1.85 | 1.84
Video Conferencing | 2.61 | 3.36 | 3.54
Video Editing | 2.28 | 2.90 | 2.94
Web 2.0 (RSS, Blogs, Social Networking, Wikis, etc.) | 3.79 | 4.54 | 4.49
Web Programming Languages | 1.55 | 1.99 | 1.92
XML | 1.60 | 2.40 | N/A

Table 1. Average comfort level with technologies before and after library school and at the current time. Scale: 1 = very uncomfortable to 5 = very comfortable.

This list can be split into categories based on the level of technical skill required. Individuals were most comfortable with technologies that are used rather than technologies that enable people to create content, which generally require a higher level of skill. For example, people were comfortable with using content management systems (CMS) and software used to create webpages, such as Dreamweaver, but they were not comfortable with the information architecture skills, CSS, and HTML needed to create more complex websites. There was also a lack of understanding about relational databases, which serve as the back end of many online library resources that all librarians use to accomplish most reference work. Other deficiencies include Linux, an operating system commonly used to run servers, as well as server set up and administration, which run all web-based library resources and services. There is also a strong lack of computer programming understanding and skills, including C++ and .Net, as well as web programming languages such as PHP, ASP, and Perl. However, when asked which technologies they would like to learn, respondents listed computer and web programming languages the most often, along with other high-level technology skills, including XML, database software and vocabularies, geographic information systems (GIS), Adobe Photoshop, and statistical software, such as SPSS. Data from the technology questions also show that people are learning about technology in library school, but they are learning more about technology they already know how to use than technologies that are new to them. There are a couple of exceptions, including CMS, course management systems, HTML, and screen casting software, with which respondents grew notably more comfortable while in library school (see table 1). More than 84 percent of respondents were required to take a technology course in library school, and they generally believed library school prepared them well to deal with the technological side of librarianship, rating 3.23 on a 1–5 scale.
However, respondents did note that most of their technology skill was self-taught (81.7 percent), with only 47.5 percent stating that coursework contributed to their skills. An open-ended question asked what specific technology skills individuals wanted to learn. The results indicate that Millennial librarians desire to learn more of the higher-level technology skills, especially programming, which was indicated in 28 of the 97 responses. Other skills that were frequently noted include various elements of web programming, including scripting, XML, HTML, Photoshop, Microsoft Access, SPSS, and GIS. All of these skills either involve content creation (such as scripting, XML and HTML) or are complicated software that can require a great deal of training to master. See figure 2 for a tag cloud of technologies respondents want to learn. Figure 2. Coded tag cloud indicating responses to question, “Are there any other technologies you want to learn?” DIGITAL NATIVE ACADEMIC LIBRARIANS, TECHNOLOGY SKILLS, AND THEIR RELATIONSHIPS WITH TECHNOLOGY | EMANUEL 25 Clear trends emerged when Millennial librarians were asked about technologies that will be most important to libraries in five year. Mobile devices, including e-readers (such as the Amazon Kindle), apps, and tablet computers were the most common category of responses, followed by social media and social applications aimed at libraries. CMSs for managing website content was also very popular, and website design was also common. Advanced knowledge of database design, including relational database design, and the storage of library data frequently were mentioned, skills that were on a higher level than simply using databases to retrieve information online. Web 2.0 applications were also commonly mentioned, but it is unknown if these overlapped with social media. E-books, not unexpectedly, were very popular. The most popular technology individuals wanted to learn was programming, which came up 25 times, indicating there may be a gap in the technical skills that librarians have and the skills they need to have. See figure 3 for a visualization of coded responses. Figure 3. Coded responses to question, “What three technologies will be most important to libraries in five years?” QUALITATIVE DATA Interview participants exhibited a wide variety of diversity and roughly matched the demographics of survey participants. Demographic information for survey participants was gathered from their survey responses. Ten of the interviewees were born in 1984 or 1985, with the remaining ten born during the remaining years between 1982 and 1989. Three participants were male, one did not indicate sex on the survey, and the remaining 16 participants were female. Fifteen identified their race as white, two African American, one Middle Eastern, one Hispanic, and one from multiple races. Interview participants were from 14 different states. All participants were given pseudonyms for purposes of data analysis. The interview transcripts were coded into INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 26 broad categories, including attitudes about being digital natives, and technical skills relating to career choice. Digital Native Issues related to being digital natives came up often when Millennial librarians were asked to talk about their experiences using technology before they began library school, in library school, and on the job. However, not all considered themselves a digital native, very tech savvy, or able to pinpoint exactly what their tech skills are. 
Most, however, did believe that there were differences in technology use and attitudes between librarians who were younger versus older librarians. Childhood Technology Most remember when they first had a computer in their home as a child, so it was not a part of their lives from birth, but rather from a young age that most participants remember. Betty and Diana recall always having technology in their homes growing up because their parents worked in technology careers or had an interest in it as a hobby. As Diana stated, “Both [parents] worked in the IT field, so when I was really little, they spent an astronomical amount of money on a computer back in the mid to late 1980s, so I’ve always grown up with technology.” Others remember first being exposed to computers in school, with Catharine saying, “I can remember being in elementary school and being on a computer and having specialized training. Not just in typing but they even pulled people out of class to learn how computers work.” Heather vividly remembers her family getting their first computer: “We got one in my house when I was like in the sixth grade and that was a huge thing.” Participants also remember having Internet access as a child. Betsy noted, “I had a Prodigy (online service) account when I was seven, when most people did not even know what the Internet was at that point.” Gabby said, “I think they call people between 21 and 30 the ‘in between’, because they knew what it was like before technology, but they also know how to use technology . . . because I remember before computers.” Kelly talked extensively about how she grew up with technology: I think we got our first computer when I was in the fifth grade. I definitely grew up with it. I used it in school. I remember what life was like before computers, though. I have that little bit of perspective there. But it was definitely part of my daily life. And in college I joined Facebook back when it was only for college students and now people cannot remember that now. But I used email, was one of the first users of Gmail. I got a little more into it in college. Olivia also talked about her use of technology as a child: We had a computer in my house. We were very fortunate because my dad was on top of that. So we had a computer since I was a little kid. So I would play around on that a lot, like AOL and Prodigy. I had the basic skills. And in high school we were taught basic word processing and Excel. So I’ve always been in front of a computer. DIGITAL NATIVE ACADEMIC LIBRARIANS, TECHNOLOGY SKILLS, AND THEIR RELATIONSHIPS WITH TECHNOLOGY | EMANUEL 27 Concept of the Digital Native Most people believed they are digital natives because they have been working with technology for a long time, which sets them apart from older generations who they thought did not work extensively with technology until they were adults. Catharine stated, “I know it has been a part of my life forever so probably my age does have something to do with it.” Catharine talked about the differences in technology skill between herself and her older colleagues, but added, “I don’t feel there is an unwillingness for them to learn technology. I just don’t think they had experiences at the time, where maybe we’re just afforded more opportunities.” However, when pressed, not all considered themselves digital natives. Abby recalls a class discussion about the idea of digital natives and how younger people may not be as good with technology as they perceive. 
Because of this, she was hesitant to refer to herself as a digital native, even though growing up she believed technology was a part of her life. Others, such as Betsy, are reluctant to call themselves digital natives because they remember when their family first got a computer and it was not always in their household. There were also a couple of outliers who were reluctant to call themselves digital natives because they did not grow up with technology in the same ways as did many of their peers. Rachel grew up in a poorer home that always got technology second-hand, and she always thought they were behind others. Although her family first had an Apple Computer in the 1980s, she did not recall using it, and just thought of it as a sort of “new appliance” in her house. Her family did not emphasize technology use and saw it as something not worth investing in until they had to, which gave her a different perspective of using technology only as necessary and as “one of those things that sometimes I just don’t want to deal with.” Samantha grew up in a rural area that only had dial-up Internet, which embarrassed her and did not work as well as she thought it should, so she did not use it, leading to a belief that she did not grow up on the Internet in the same way as her peers. Because of this, she did not consider herself a digital native: I’m still able to relate to those in a different generation who have no idea where to start [with technology], because I was at that state recently. . . . I’m at the in-between stage, so I can handle both ends of the [technology use] spectrum. But yeah, I’m not a digital native. Technology Reaction Participants assumed that, because of their age, they were not as scared or intimidated by technology as they thought some of their colleagues were. Heather talked about how learning new things would initially make her nervous, but then would get excited about what the new program or application could do for herself or her work. Francis stated, “I’m not afraid of the technology.” She also talked about the differences between herself and her older colleagues: If you ask them something different or to learn something new, they will make it more complicated. I’m so used to exploring my options, I don’t think about it. Those 20–30 years older than me are comfortable knowing what they know how to do but not necessarily exploring new ways of doing something that they already know how to do. They feel pretty comfortable and confident in their skills INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 28 but aren’t really looking to test the waters to see if there is a different way to do something. . . . I’m willing to try. I see a lot of people that are afraid they are going to break something and don’t want to click on it. And I have the confidence that if I click on something, then I can pretty much undo whatever that does. So not necessarily skills, but a different mindset or something. As Francis inferred above, younger librarians, because they have always used technology, believe they can quickly learn new technologies. Quinn, a current student, also talked extensively about this: I definitely think my age has a lot to do with how comfortable I am with it. Because there are various ages within [my library school] and I have definitely noticed that older people fear it a bit more. I guess I can attribute my age to being embedded in technology. 
Because I’ve always had it, well I haven’t always had it, but I had it young enough to feel like it is a part of me, as opposed to new fangled and wasn’t with it in the beginning. . . . I’m not afraid of it, I’m not afraid to mess around with it and mess things up. Because you can always reboot or start over. I think that’s the biggest thing, like I will work on something and mess around with it until I figure it out as opposed to someone who is older who wants to know something exactly the right way so they don’t want to do anything bad to it. Heather stated: I think I’m a bit more open to new technologies than some of my older colleagues. . . . I have the feeling I know a little bit more. . . . I’m not sure it is just because my comfort level was higher or maybe their experiences make them more cautious about new things, but I think the younger librarians are more quick to latch on to new things. Other participants shared this same belief when talking about the difference in work styles and technology use among different ages in their workplace. Technology Skills The individual technology skills individuals described focus on the use of technology, not the creation of it. Francis described this: I don’t have any programming or coding [experience] or building physical computers or anything like that, but just general using a variety of devices like the iPad, iPhone, everything is all integrated. I like being able to use technology in my personal life. No one responded that they knew how to program and work with servers, though Edward said he had “fiddled with Linux as a server” but did not spend a lot of time with it. Olivia and Quinn, however, did express interest in learning how to program, understand the back end, and create emerging technologies. Betty mentioned using SQL and XML in her workplace and aired her frustrations that people just expect to be able to use technology without learning how it actually works and what went into making that device or service. Several people mentioned working in web design, but only a couple people mentioned creating webpages with HTML and CSS, though several had experience using tools such as Dreamweaver or FrontPage. Ian mentioned that it was part of her public library job to assist patrons with using their personal devices, while others DIGITAL NATIVE ACADEMIC LIBRARIANS, TECHNOLOGY SKILLS, AND THEIR RELATIONSHIPS WITH TECHNOLOGY | EMANUEL 29 stated that when they have technology problems, they simply contact their IT departments. Many participants mentioned using social media and various Web 2.0 applications such as Facebook and Twitter, both personally and professionally. When asked to compare their tech skills from before they became librarians to after, some described minor changes in skills, such as learning HTML, but others mostly indicated that library school helped them learn new applications, existing technologies, or new technology resources, most without going into detail. Quinn talked about her tech skills in relation to what she is learning in library school: I think they [technology skills] are actually above average. I’ve taken a few of the courses that are offered in terms of tech, and they are totally below what I already know. But other classmates have thought it was really hard. But I’ve had prior knowledge of it. Patricia stated she started using online tools more extensively after learning about them in library school. 
One talked extensively about using webinar software and LibGuides to deliver instruction online, while another stated library school inspired her to start a blog that she did not keep up, and another became an extensive Twitter user. Jan focused on digital librarianship while in library school because she saw it as the future of libraries. She thought that library school helped her do some “encoding on some projects and how to do webpages,” but it barely touched on the skills needed to actually perform a job within digital librarianship. She would like to get more into the development side of library technology, but in her current job there is not the time or support to further advance those skills. A couple of participants talked about learning about usability and the evaluation of technologies. A few interview participants mentioned the tech skills of people even younger than they are, or current college students they work with. Betty did not see younger coworkers understanding what is needed to develop or understand the back end of technology and believed younger workers do not use technology to communicate as effectively as they could. Edward, who works at a for-profit career college that has many poorer and nontraditional students, stated, it is “not just the 50 year olds, but the 18 year olds who don’t know how to attach documents to an email.” When pressed as to why she thinks young students struggle with basic technology tasks, he stated, “At times I think that has a lot more to do with their K–12 experience and if they had access to computers and stuff. I don’t know. It just blows me away sometimes.” Gabby, currently working in Appalachia, said, “Not everyone here has computer skills, not everyone has access to it at home or maybe can’t make it to the library. . . . I think it is awesome to have those things at your fingertips, but not everyone does.” On the other hand, Diana believed that she does not “have the same relationship with technology like I’m seeing some of the college students now where they are hooked in all the time and they are just going for it”. She also said that she “wouldn’t call myself a digital immigrant, but I’m very comfortable using technology but not to the extent I’m seeing many people I see now.” INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 30 Tech Skills Related to Career Choice The researcher sought to determine the role of technology in determining the career choice of librarianship. Those interview participants who talked about using technology did not mention it as a reason they became a librarian. Survey responses indicated that opportunities to use technology were an important reason to become a librarian, but participants did not stress technology during the interviews. Participants were much more likely to specify their love of the academic atmosphere or their general interest in research first and then maybe think of technology as an afterthought. Gabby mentioned, after a long list of things that influenced her career choice, “and technology and stuff.” Only Taylor talked extensively about how technology influenced her choice. A current library school student, she wanted to go into archives and is really excited about how much information is being digitized and put online: You know how everything on microfiche is now digital? Everything seems to be digitized as well, you know books and e-books and journals. Being able to take something and scan it and put it online for users to access. It is definitely an important thing. 
So yeah, that definitely influenced me on becoming a librarian. Jan decided to specialize in digital librarianship while in library school because she saw it was the future of library work. Rachel, who has observed similar attitudes among her classmates, shared this thinking as well. However, Heather admitted she did not have a lot of technology experience before going to library school and did not believe that her master’s program prepared her to go into the technology oriented digital librarianship. Several participants talked about how their background using search engines such as Google and doing research online would make them better librarians, but none talked about these as factors related to choosing librarianship as a career. Abby talked about how she always uses Google to look things up, and that it is nice to have found a career that rewards such use. Diana discussed how she had always been good at finding information online since she was a child, which helped her narrow her career choice to academic librarianship, as she believed it was the best match for these skills. Instead of talking about how technology influenced their career choice, participants were more likely to talk about the fact that technology did not influence them. Abby stated, “I don’t think [technology influenced me] because I didn’t really know that librarians needed a lot of technology skill.” Edward stated he, “didn’t do any technology in library school because I didn’t want to go in that direction,” reiterated this. Rachel, who strongly did not consider herself a digital native, stated she was drawn to librarianship, specifically access services, because she liked working with print books rather than using online resources to find information. She commented, I really liked looking for books and I used the card catalog when I was a kid, but I can use a computer to help people find things, but it was like, I really just liked finding the books rather than electronic DIGITAL NATIVE ACADEMIC LIBRARIANS, TECHNOLOGY SKILLS, AND THEIR RELATIONSHIPS WITH TECHNOLOGY | EMANUEL 31 information. I guess I feel like it feels comfortable and safe, like books. And you can hold them and you can touch them. And sometimes I feel like they should always be a part of the library. I took a digital libraries course this past semester and I felt like I was the only one being like, “No, we still need physical books,” so I was actually realizing how intimidated I am with technology. I’m totally willing to adapt, and I’m willing to work on these issues, but I do feel like I want the library to still be a place that has the traditional feel. Samantha also did not feel like a digital native, as she grew up in a rural area that only had access to dial-up Internet. She went on to describe how she did not work with online tools until college and she was relieved when she did not have to use such tools during a year off between college and graduate school. Although she recognized technology use by librarians is helping libraries not becoming obsolete, she only learned what she needed to learn in order to complete library school, so it did not influence her career choice. IMPLICATIONS Millennials are very comfortable with technology, though there are limitations to their skills. For the most part, they have a lifetime attachment to technology,16 but they do remember a time without having a computer in their homes or when computers were something only used at school and for basic instruction. 
As interview participant Frances put it, “nothing like how students get to use them now.” Millennial grew up with computers, but early on, they were not advanced enough to do the multimedia creation and application building that is done now, and they mostly use resources that were developed by others. However, Millennial librarians in this study do see the utility that computers have in everyday life, and by high school, many stated that computer use was required for them to go about their academic and personal lives, but they thought that technology in its current state with online research resources and social networking did not come about until they were in college. Additionally, most interview respondents said that library school helped acculturate them into using technology more frequently. However, not everyone in the study grew up with a computer or Internet access in his or her home. Two interview respondents refused to call themselves digital natives. One said she grew up in an environment without much money, and the only technology her family had access to was often secondhand and several years behind. The other participant grew up in a rural area that did not have access to high-speed Internet, and as a result, she was rarely online until college. Both individuals believed that technology was definitely not a factor in them being drawn to librarianship, and they were more interested in the circulation and the print resources than in specializations that require a high level of technical knowledge. Other participants were quick to acknowledge that there are many members of their generation who, for one reason or another, do not have an interest in technology and may not have had the resources growing up to have incorporated it into their daily lives. Some participants noted there was some computer instruction starting in elementary school, but it was very basic computer literacy, and most of their technology learning occurred at home when there was the time to focus on tasks that were more complicated. INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 32 Even though study participants remember a time without technology in their homes and they believe that technology did not mature to its current state until they were in college, they have used it for a much larger percentage of their lives than older generations. For that reason, they are quick to learn new technologies as they become available or are required based on professional needs. They also believe that because computers had matured alongside them, they are not afraid to break them. Interview participant Abby states, “I have a lot of faith in technology.” Millennials believe that they can experiment with technology without fear that it will become inoperable or cause additional headaches in the future. They are also not wedded to particular technologies and do not get frustrated by current technologies and applications because they think something newer and better is always around the corner. One disconnect in the technology skills of Millennials is that most of them are accustomed to using technology, not creating it or understanding the back end infrastructure. As one interview participant said, “they expect everything to be easy, but they don’t understand what went into trying to make it easy.” Although many librarians indicated they use tools such as Camtasia to create multimedia projects, many thought they had weak skills in this area and desired to learn more. 
They are also most likely to edit content on webpages using a CMS such as Drupal or LibGuides instead of creating more elaborate websites utilizing information architecture principles or more complex web programming languages (such as PHP) or relational databases (such as MySQL). They rely on dedicated tech people to set these up and maintain the servers that house these services, but they desire to learn more about these technologies themselves. There is also a strong desire to learn more traditional computer programming languages such as C++, C#, and Perl. Many participants thought library school only affected their technology skills marginally, and they desire to learn higher-order skills that can be applied to their job. Millennials are comfortable learning front-end technologies on their own, but they need help understanding the technology behind the tools they use in their daily lives.

CONCLUSION

This mixed-methods study examined many characteristics of Millennial librarians, and this article noted their technology skills and attitudes toward technology. The findings indicate that technology did not play a major role in their decision to become academic librarians. The data also reveal that, although Millennial librarians mostly grew up with technology and believe this sets their skills apart from older librarians, their skills are mostly in using technology tools and not in creating them. They also believe their status as digital natives has allowed them to recognize that librarianship is changing as a career. However, Millennial librarians still respect their older colleagues and the skills associated with traditional librarianship and are firmly rooted in traditions. Millennial librarians just want to be able to shape the profession in their own way.

REFERENCES

1. William Strauss and Neil Howe, Millennials and the Pop Culture: Strategies for a New Generation of Consumers in Music, Movies, and Video Games (Great Falls, VA: Life Course Associates, 2006).
2. Haidee E. Allerton, “Generation Why: They Promise to Be the Biggest Influence since the Baby Boomers,” Training and Development 55, no. 11 (2001): 56–60; Don Tapscott, Growing Up Digital: The Rise of the Net Generation (New York: McGraw-Hill, 2008).
3. Rachel Singer Gordon, The NextGen Librarian’s Survival Guide (Medford, NJ: Information Today, 2006); Sophia Guevara, “Generation Y: What Can We Do for You?” Information Outlook 11, no. 6 (2007): 81–82; Diane Zabel, “Trends in Reference and Public Services Librarianship and the Role of RUSA: Part Two,” Reference & User Services Quarterly 45, no. 2 (2005): 104–7.
4. Gordon, The NextGen Librarian’s Survival Guide.
5. Gordon, The NextGen Librarian’s Survival Guide; Lisa Johnson, Mind Your X’s and Y’s: Satisfying the 10 Cravings of a New Generation of Consumers (New York: Free Press, 2006); William Strauss and Neil Howe, Millennials Rising: The Next Great Generation (New York: Vintage, 2000); Tapscott, Growing Up Digital; Ron C. Zemke, Claire Raines, and Bob Filipczak, Generations at Work: Managing the Clash of Veterans, Boomers, Xers, and Nexters in Your Workplace (New York: Amacom, 2000).
6. Johnson, Mind Your X’s and Y’s; Tapscott, Growing Up Digital.
7. Strauss and Howe, Millennials and the Pop Culture.
8. Zemke, Raines, and Filipczak, Generations at Work.
9. Tapscott, Growing Up Digital.
10. Strauss and Howe, Millennials and the Pop Culture; Tapscott, Growing Up Digital.
11. L. P. Morton, “Targeting Generation Y,” Public Relations Quarterly 47 (2002): 46–48; P. Paul, “Getting Inside Gen Y,” American Demographics 23, no. 9 (2001): 42–49.
12. Paul, “Getting Inside Gen Y”; Tapscott, Growing Up Digital.
13. Mark Bauerlein, The Dumbest Generation: How the Digital Age Stupefies Young Americans and Jeopardizes Our Future (Or, Don’t Trust Anyone Under 30) (New York: Penguin, 2008).
14. Darcy Del Bosque and Cory Lampert, “A Chance of Storms: New Librarians Navigating Technology Tempests,” Technical Services Quarterly 26, no. 4 (2009): 261–86.
15. Carol H. Weiss, Evaluation: Methods for Studying Programs and Policies (Upper Saddle River, NJ: Prentice Hall, 1998).
16. Allerton, “Generation Why”; Tapscott, Growing Up Digital.

Editor’s Comments
Bob Gerrity

With this issue, Information Technology and Libraries (ITAL) begins its second year as an open-access, e-only publication. There have been a couple of technical hiccups related to the publication of back issues of ITAL previously only available in print: the publication system we’re using (Open Journal System) treats the back issues as new content and automatically sends notifications to readers who have signed up to be notified when new content is available. We’re working to correct that glitch, but hope that the benefit of having the full ITAL archive online will outweigh the inconvenience of the extra e-mail notifications. Overall though, ITAL continues to chug along and the wheels aren’t in danger of falling off any time soon. Thanks go to Mary Taylor, the LITA Board, and the LITA Publications Committee for supporting the move to the new model for ITAL.

Readership this year appears to be healthy—the total download count for the thirty-three articles published in 2012 was 42,166, with 48,160 abstract views. Unfortunately we don’t have statistics about online use from previous years to compare with. The overall number of article downloads for 2012, for new and archival content, was 74,924. We continue to add to the online archive: this month the first issues from March 1969 and March 1981 were added. If you haven’t taken the opportunity to look, the back issues offer an interesting reminder of the technology challenges our predecessors faced.

In this month’s issue, ITAL Editorial Board member Patrick “Tod” Colegrove reflects on the emergence of the makerspace phenomenon in libraries, providing an overview of the makerspace landscape. LITA member Danielle Becker and Lauren Yannotta describe the user-centered website redesign process used at the Hunter College Libraries. Kathleen Weessies and Daniel Dotson describe GIS lite and provide examples of its use at the Michigan State University Libraries. Vandana Singh presents guidelines for adopting an open-source integrated library system, based on findings from interviews with staff at libraries that have adopted open-source systems.
Danijela Boberić Krstićev from the University of Novi Sad describes a software methodology enabling sharing of information between different library systems, using the Z39.50 and SRU protocols. Beginning with the June issue of ITAL, articles will be published individually as soon as they are ready. ITAL issues will still close on a quarterly basis, in March, June, September, and December. By publishing articles individually as they are ready, we hope to make ITAL content more timely and reduce the overall length of time for our peer-review and publication processes. Suggestions and feedback are welcome, at the e-mail address below.

Bob Gerrity (r.gerrity@uq.edu.au) is University Librarian, University of Queensland, Australia.

Editorial Board Thoughts: "India does not exist."
Mark Cyzyk

Often, I find myself trolling online forums, searching for and praying I find a bona-fide solution to a technical problem. Typically, my process begins with the annoying discovery that many others are running into the same, or very similar, difficulty. Many others. Once I get over my initial frustration ("Why isn't this problem fixed by now?"), I proceed to read, to attempt to determine which of the often conflicting and even contradictory suggestions for fixing the problem might actually work. I thought it would be instructive to step back for a moment and examine this experience. To do so, I want to use as my example, as my straw man, not a technical question, but a more generic question, the sort of question anyone might conceivably ask. I'll ask this question, then I'll list what I think might be answers, in form and substance, from the technical forums had it been asked there:

"I want to go to India. How best to get there?"

Why would you want to go there?
You could fly.
You could take a ship.
Why go to India? Iceland is much better.
I went to India once and it wasn't that great.
You never specify where in India you want to go. We can't help you until you tell us where in India you want to go.
I am sick and tired of these people who don't read the forums. Your query has been answered before.
The only way to get there is to fly first class on Continental.
You could ride a mule to India.
New Zealand is much better. You should go there instead.
It is impossible to go to India.
You can get from India to anywhere in Europe very easily via India Air.
You should read A Passage to India, I forget who wrote it. I read it as an undergraduate. It was very good.
You are an idiot for wanting to go to India.
India does not exist.

Mark Cyzyk (mcyzyk@jhu.edu), a member of LITA and the ITAL editorial board, is the Scholarly Communication Architect in The Sheridan Libraries, The Johns Hopkins University, Baltimore, Maryland.

I think it's safe to say that the noise-to-signal ratio here is high. If we truly want to answer a question, we don't want to add noise. Pontificating, posturing, and automatically posing as a mentor in a mentor/protégé relationship will typically be construed as adding nothing but noise to the signal. In most cases, we who answer such questions are not here to educate, except insofar as we provide a clear and concise answer to a technical query issued by one of our peers. What should we assume?
First off, we should assume that the person writing the question is sincere: He truly does want to go to India. We need not question his motives. The best way to think about this is that the query is a hypothetical: If he were to want to go to India, how best to do it? If you were to want to go to India, how best to do it? This requires a certain level of empathy on the part of the one answering the question, a level of empathy of which the technical forums are all but devoid. Many answers on those forums are so tone-deaf to human need they may as well have been written by robots. "How best to get there" is tricky because you must make some assumptions. Assumptions are fine as long as you're explicit about them. One assumption might be: He is leaving from the East Coast of the United States. Another assumption might be: He is going to India only for a short while, for a conference or vacation. Yet another one might be: By "best" he means "quickest, most efficient, least expensive." Stating these assumptions, then stating your answer to the question, is appropriate and is what is most helpful. Stating your assumptions is tantamount to stating your understanding of the original question, its scope and context. This is always a helpful thing to do when attempting to communicate with another human being. Now, communication and plumbing the depth of human need, at least with respect to informational and bibliographic needs, has always been a strong suit of librarians, so what I write here is not really directed at librarians. It is, though, directed at we who straddle both the library world and the technology world, if that distinction is not a false one and can be usefully made. I think it important for those of us split between two cultures to ensure that we fall to one side and not the other, in particular that we do not fall into the oftentimes loutish and ultimately unproductive communication mores exhibited by many of the online technical forums. Whenever my wife and I hear a news story on TV or radio openly wondering why more women do not go into I.T., I blurt out something like: "You wanna know why? Just go read the comments section of most posts at Slashdot.com. Why on earth would anyone who didn't have to put up with that kind of culture actually choose to put up with it?" Isn't "India does not exist" exactly the kind of response one would find on Slashdot.com if the initial question was, "I want to go to India -- how best to get there?"? With all this in mind, I hereby issue my own question, this time a technical one:

"I want to programmatically convert a largish set of documents from PDF to DOCX format. How best to do it?"

I hope you don't think I'm an idiot.

Adding Value to the University of Oklahoma Libraries History of Science Collection through Digital Enhancement
Maura Valentino

"In getting my books, I have been always solicitous of an ample margin; this not so much through any love of the thing itself, however agreeable, as for the facility it affords me of penciling suggested thoughts, agreements and differences of opinion, or brief critical comments in general." —Edgar Allan Poe

ABSTRACT

Much of the focus of digital collections has been and continues to be on rare and unique materials, including monographs. A monograph may be made even rarer and more valuable by virtue of hand written marginalia.
Using technology to enhance scans of unique books and make previously unreadable marginalia readable increases the value of a digital object to researchers. This article describes a case study of enhancing the marginalia in a rare book by Copernicus. BACKGROUND The University of Oklahoma Libraries History of Science Collections holds many rare books and other objects pertaining to the history of science. One of the rarest holdings is a copy of Nicolai Copernici Torinensis De revolvtionibvs orbium coelestium (On the Revolutions of the Heavenly Spheres), libri VI, a book famous for Copernicus’ revolutionary astronomical theory that rejected the Ptolemaic earth-centered universe and promoted a heliocentric, sun-centered model. The History of Science Collections’ copy of this manuscript contains notes added to the margins. Similar notes were made in eight different existing copies, and the astrophysicist Owen Gingerich determined that these notes were created by a group of astronomers in Paris known as the Offusius group.1 The notes are of significant historical importance as they offer information on the initial reception of Copernicus’ theories by the Catholic community. Having been created almost five hundred years ago in 1543, the handwriting is faded and the ink has absorbed into the paper. Maura Valentino (maura.valentino@oregonstate.edu) is Metadata Librarian, Oregon State University, Corvalis, Oregon. Previously she was Digital Initiatives Coordinator at the University of Oklahoma. mailto:maura.valentino@oregonstate.edu ADDING VALUE TO COLLECTIONS THROUGH DIGITAL ENHANCEMENT | VALENTINO 26 Written in cursive script, the letters have merged as the ink has dispersed, adding to the difficulties inherent in reading these valuable annotations. The book had previously been digitized, and while some of the margin notes were readable, many of the notes were barely visible. Therefore much of the value of the book was being lost in digital form. To rectify this situation the decision was made to enhance the marginalia. It was further decided that once the margin notes were enhanced, two digital representations of each page that contained notes would be included in the digital collection. One copy would present the main text in the most legible fashion (figure 1) and the second copy would highlight the marginalia and ensure that these margin notes were as legible as possible even if in doing so the readability of the main text was diminished (figure 2). Figure 1. Text readable. Figure 2. Marginalia enhanced. While creating a written transcript of the marginalia was considered and would have added some value to the digital object, this solution was rejected in favor of digital enhancement for the following reasons. Many of the notes contained corrections with lines drawn to the area of text that was being changed, or bracket numbers (figure 3). In addition, some of the notes are corrections of numbers or tables, so a transcript of the text would do little to demonstrate the writer’s intentions in creating the margin note (figure 4). Figure 3 .Bracketed corrections. Figure 4. Numerical corrections. INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2014 27 Also, sometimes there was bleed through from the reverse page, further disrupting the clarity of the marginalia (figures 5 and 6). Therefore it was determined that making the notes more readable through digital enhancement would provide the collection’s users with the most useful resource. Figure 5. Highlighted—bleed through reduced Figure 6. 
Bleed through behind. marginalia. The book can be viewed in its entirety here: http://digital.libraries.ou.edu/cdm/landingpage/collection/copernicus LITERATURE REVIEW “Modification of photographs to enhance or change their meaning is nothing new. However, the introduction of techniques for digitizing photographic images and the subsequent development of powerful image editing software has both broadened the possibilities for altering photographs and brought the means of doing so with the reach of anyone with imagination and patience.”2 —Richard S. Croft The primary goal of this project was to give researchers in the history of science the ability to clearly decipher the marginalia created by the astronomers of the Offusius group as they annotated the book using the margins as an editing space. The literature agrees that marginalia is an important piece of history worth preserving. Hauptman states, “The thought that produces the necessity for a citation or remark leads directly into the marginal notation.”3 He also adds, “Their close proximity to the text allows for immediate visual connection.”4 Howard asserts, “For writers and scholars, the margins and endpapers became workshops in which to hammer out their own ideas, and offered spaces in which to file and store information.”5 She also adds that marginalia can “serve as a form of opposition.”6 This is true in this case as some of the marginalia http://digital.libraries.ou.edu/cdm/landingpage/collection/copernicus ADDING VALUE TO COLLECTIONS THROUGH DIGITAL ENHANCEMENT | VALENTINO 28 contradicts Copernicus. Nikolova-Houston argues for the historical aspect: “Each of the marginalia and colophons is a unique production by its author, and exists in only one copy.”7 She goes on to add, “Manuscript marginalia and colophons possess historical value as primary historical sources. They are treated as historical evidence along with other written and oral traditions.”8 Such ideas provide a strong justification for the implementation of marginalia enhancement in digital collections. As mentioned above, it was determined that a transcription would not have had the same effect as digital enhancement of the margin notes. This approach is also supported by the literature. For example, Ferrari argues for the digital publication of the marginalia that Fernando Pessoa, the Portuguese writer, made while reading. One of the cornerstones of his argument is that digital representation of marginalia allows the reader not only to see the words but also the underlining and other symbols that are not easily put into a transcript. In this way, the user of the digital collection obtains a more complete view of the author of the marginalia’s intent.9 Another goal of this project was the general promotion of the University of Oklahoma’s History of Science Collections. Johnson, in his New York Times article, notes that marginalia lend books an historical context while enabling users to infer other meanings from their texts.10 He also quotes David Spadafora, president of the Newberry Library in Chicago, who proclaims that “the digital revolution is a good thing for the physical object.” As more people access historical artifacts in electronic form, he notes, “The more they’re going to want to encounter the real object.”11 In this way, enhancement of the marginalia in digital collections can lead to further exposure for the collection and to greater use of the physical objects themselves. Using digital enhancement is not a new idea. 
Morgan asserts, “The innovation of the World Wide Web is its exciting capacity for space that, while not limitless, is weightless and far less limited that that of the printed book.”12 Le, Anderson and Agarwala also add, “Local manipulation of color and tone is one of the most common operations in the digital imaging workflow.”13 The literature shows that other projects have used enhancement of the digital object to increase the usefulness of the original artifact. One of the projects pursued during the Library of Congress’s American Memory initiative involved the digitization of the work of photographer Solomon Butcher. In this case, technicians were able to enhance an area of one photograph that was blurry in normal photographic processes and allow the viewer to see inside a building.14 The Archivo General de Indias also used digital enhancement to remove stains and bleed-through from ancient manuscripts and render previously unreadable manuscripts readable.15 In an article advocating for a digital version of William Blakes’s poem The Four Zoas, Morgan notes that some features of the manuscript can only be seen in the digital version rather than a transcription: “Sections of the manuscript show intense revision, with passages rubbed out, moved earlier or later in the manuscript, and often, added in the margins.”16 INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2014 29 Digital processing is not limited to the use of photo editing software. Although Giralt asserts that it is a common method, “the ample potential for image control and manipulation provided by digital technology has stirred a great interest in postproduction, and digital editing.”17 Other projects have used various technologies to enhanced images to give added meaning to a digital image. Once again, in her article advocating for the digitizing of William Blake’s The Four Zoas, Morgan asserts that various enhancement technologies would help readers obtain the greatest benefit from the manuscript. For example, providing “the added benefit of infra-red photography,” would allow “readers to see many of the erased illustrations.”18 She even hopes coding will enhance the usefulness of a digital object: “Our impulse to use XML in order to richly encode a text works against passivity. With coding we clarify a work down to its smallest units, and illuminate specific aspects of its structure, aspects that are often less obvious when the work is presented in the form of a printed book.”19 METHOD Locating the Marginalia Each page of the book had been previously scanned and was stored in Tagged Image File Format (TIFF). Each digital page (TIFF image) was carefully examined for marginalia. This was achieved by examining the image in Adobe Photoshop using the Zoom Tool to enlarge the image as necessary. As many notes were barely visible, the entirety of each page had to be examined in detail to ensure that margin notes were not overlooked. Enlargement of the image in Photoshop greatly facilitated this process. Enhancing the Marginalia Once all the pages with marginalia were identified, each page was loaded into Adobe Photoshop for digital processing and enhancement. The following procedure was used (Note: The specific directions that follow reference Adobe Photoshop CS4 for Windows but can be generally applied to most software programs intended for photo editing): 1. Using the Zoom Tool, the image was enlarged to facilitate examination and interaction with the marginalia. 2. 
Individual margin notes were selected using the Rectangular Marquee Tool. The area selected included any lines that were drawn from the notes to the original text so it would be clear to what text the margin note referred.
3. As the handwritten margin notes were orange in tone, a blue filter was applied (as blue is the contrasting color to orange) by selecting Adjustments from the Image menu and then choosing Black and White to display the Black and White dialog box. In the Black and White dialog box, Blue Filter was selected from the Preset drop-down menu. This small adjustment greatly enhanced the readability of the margin notes.
4. With the area still selected, Adjustments was again selected from the Image menu. From that Adjustments submenu, Brightness and Contrast was selected. Adjustments were made to both these values using the sliders presented by the resulting dialog box to further enhance the margin notes' legibility. For this particular project, the values selected were generally negative twenty for contrast and positive twenty for brightness.

File Naming Conventions

Each enhanced image was saved with the same filename as the digital image of the original manuscript page, but with an A (for annotated) added to the end of the filename. This naming scheme enabled a distinction between pages with and without enhanced marginalia. This series of steps was repeated for each page (see table 1).

Page Name | Explanation
Book Spine | Pictures of the covers
Book Cover |
Inside Cover |
Blank Page With Ruler to Measure Page |
Folio - 001 | Page 1 as originally scanned
Folio - 001 Verso | Page 1, reverse side, as originally scanned
Folio - 001 Verso A | Page 1, reverse side, with highlighted marginalia
Folio - 002 | Page 2 as originally scanned
Folio - 002A | Page 2 with highlighted marginalia
Folio - 002 V | Page 2, reverse side
Folio - 002 V A | Page 2, reverse side, with highlighted marginalia

Table 1. Filenames.

Importing into the Digital Management System

CONTENTdm was the digital management system selected for this project. All original manuscript page images and enhanced marginalia page images were imported into CONTENTdm following their creation. The next step was to bring all the pages into CONTENTdm as one compound object. A Microsoft Excel spreadsheet was created with a line for each page, annotated or not. Only three fields were used: title, rights, and filename. A description of the book was placed on its History of Science Digital Collections webpage with a link to the compound object in CONTENTdm, so further metadata was not necessary and can always be added later. The first row only contained the title of the book (no filename). There were TIFFs available of the cover, the bookend, the inside cover, and the book with a ruler. These were the next rows. Then we began with the pages and titled them as the pages were numbered. There were ten pages numbered with Roman numerals and then the pages began with alphanumeric page numbers. Each page that had handwritten notes had the original page (page 2, for example) and the page with the notes highlighted (page 2 Annotated). This would allow the viewer to view the pages in their original form or with the notes highlighted or both, depending on each user's research interests.
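A page list of this kind could also be generated with a short script rather than built by hand. The following is only a minimal sketch, not the workflow used in this project (the spreadsheet was created manually in Excel): it assumes the TIFFs follow the naming scheme shown in table 1, and the folder name and rights statement are placeholders.

    import csv
    import os

    # Minimal sketch: write a tab-delimited page list for a CONTENTdm
    # compound-object import, one row per TIFF, preceded by a row that
    # holds only the book title. Folder name and rights text are placeholders.
    TIFF_DIR = "copernicus_tiffs"
    RIGHTS = "Placeholder rights statement"

    rows = [("De revolutionibus orbium coelestium", RIGHTS, "")]  # title-only first row
    for name in sorted(os.listdir(TIFF_DIR)):
        if not name.lower().endswith(".tif"):
            continue
        base = os.path.splitext(name)[0]          # e.g., "Folio - 002 V A"
        if base.endswith("A"):                    # enhanced-marginalia copy
            title = base[:-1].rstrip() + " Annotated"
        else:
            title = base
        rows.append((title, RIGHTS, name))

    with open("import.txt", "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerows(rows)

Each row carries the three fields described above (title, rights, filename); the "A" suffix on a filename is what flags the enhanced copy and drives the "Annotated" page title.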
Import into CONTENTdm required that all the TIFF files be in one folder. Once the files were moved, the CONTENTdm Compound Object Wizard was used to import. This book was imported as a compound object with no hierarchy. As this book was published in 1593, it has no chapters. To specify page names, the choice to label pages using tab-delimited object for printing was used. The filenames did not contain page numbers, and the choice to label pages in a sequence was not an option, as two copies of each annotated page existed. Each object imported into CONTENTdm has a thumbnail image associated with it. CONTENTdm will create this image, but the cover of this book is not attractive, so a JPEG file was created using an image from the book that is often associated with Copernicus (see figure 3). CONCLUSIONS This project resulted in a digital representation of the physical book that is much more useful to researchers than the original, unenhanced digital object. This History of Science Collection holds not only the first edition of books important to the history of science, but the subsequent editions so that researchers can see how the ideas of science have changed over time. This new digital edition of De Revolutionibus allows researchers to see how another scientist made corrections in Copernicus’ book as one step in the change in theory over time and insight into the reaction of the Catholic Church. The format that CONTENTdm creates for the object and a clear naming scheme allow the user to view the pages with or without the marginalia, thus making this a useful object for many types of users (see figure 4). However, using Photoshop to highlight areas of a page allowed the digital initiatives department to understand the power of this tool. In understanding the utility and power of Photoshop, the digital initiatives department has determined it to be a useful tool in other projects. A project to eliminate some images of people’s fingers that inadvertently were photographed along with pages in a book or manuscript has been added to the queue. In future, digitized books or manuscripts with useful notes will undergo these enhancement processes. ADDING VALUE TO COLLECTIONS THROUGH DIGITAL ENHANCEMENT | VALENTINO 32 REFERENCES 1. Owen Gingerich, “The Master of the 1550 Radices: Jofrancus Offusius,” Journal for the History of Astronomy 11 (1993): 235–53, http://adsabs.harvard.edu/full/1993JHA....24..235G. 2. Richard S. Croft, “Fun and Games with Photoshop: using Image Editors to change Photographic Meaning” (In: Visual Literacy in the Digital Age: Selected Readings from the Annual Conference of the International Visual Literacy Association (Rochester, NY October 13-17, 1993)): 3-10. 3. Robert Hauptman, Documentation: A History and Critique of Attribution, Commentary, Glosses, Marginalia, Notes, Bibliographies, Works-Cited Lists, and Citation Indexing and Analysis (Jefferson, NC: McFarland, 2008). 4. Ibid. 5. Jennifer Howard, “Scholarship on the Edge,” Chronicle of Higher Education 52, no. 9 (2005). 6. Ibid. 7. Tatiana Nikolova-Houston,“Marginalia and Colophons in Bulgarian Manuscripts and Early Printed Books,” Journal of Religious & Theological Information 8, no. 1/2, (2009), http://www.tandfonline.com/doi/abs/10.1080/10477840903459586#preview. 8. Ibid. 9. Patricio Ferrari, “Fernando Pessoa as a Writing-Reader: Some Justifications for a Complete Digital Edition of his Marginalia,” Portuguese Studies 24, no. 2 (2008): 69–114, http://www.jstor.org/stable/41105307. 10. 
Dirk Johnson, "Book Lovers Fear Dim Future for Notes in the Margins," New York Times, February 20, 2011, http://www.nytimes.com/2011/02/21/books/21margin.html.
11. Ibid.
12. Paige Morgan, "The Minute Particular in the Immensity of the Internet: What Coleridge, Hartley and Blake can teach us about Digital Editing," Romanticism 15, no. 2 (2009), http://www.euppublishing.com/doi/abs/10.3366/E1354991X09000774.
13. Y. Li, E. Adelson, and A. Agarwala, "ScribbleBoost: Adding Classification to Edge-Aware Interpolation of Local Image and Video Adjustments," Eurographics Symposium on Rendering 27, no. 4 (2008), http://www.mit.edu/~yzli/eg08.pdf.
14. S. Michael Malinconico, "Digital Preservation Technologies and Hybrid Libraries," Information Services & Use 22, no. 4 (2002): 159–74, http://iospress.metapress.com/content/gep1rx9rednylm2n.
15. Ibid.
16. Morgan, "Minute Particular."
17. Gabriel F. Giralt, "Realism and Realistic Representation in the Digital Age," Journal of Film & Video 62, no. 3 (2010): 3, http://muse.jhu.edu/journals/journal_of_film_and_video/v062/62.3.giralt.html.
18. Morgan, "Minute Particular."
19. Morgan, "Minute Particular."

Open Search Environments: The Free Alternative to Commercial Search Services

Adrian O'Riordan

Adrian O'Riordan (a.oriordan@cs.ucc.ie) is Lecturer, School of Computer Science and Information Technology, University College, Cork, Ireland.

ABSTRACT

Open search systems present a free and less restricted alternative to commercial search services. This paper explores the space of open search technology, looking in particular at lightweight search protocols and the issue of interoperability. A description of current protocols and formats for engineering open search applications is presented. The suitability of these technologies and issues around their adoption and operation are discussed. This open search approach is especially useful in applications involving the harvesting of resources and information integration. Principal among the technological solutions are OpenSearch, SRU, and OAI-PMH. OpenSearch and SRU realize a federated model that enables content providers and search clients to communicate. Applications that use OpenSearch and SRU are presented. Connections are made with other pertinent technologies such as open-source search software and linking and syndication protocols. The deployment of these freely licensed open standards in web and digital library applications is now a genuine alternative to commercial and proprietary systems.

INTRODUCTION

Web search has become a prominent part of the Internet experience for millions of users. Companies such as Google and Microsoft offer comprehensive search services that are free to users, with advertisements and sponsored links the only reminder that these are commercial enterprises. Businesses and developers, on the other hand, are restricted in how they can use these search services to add search capabilities to their own websites or for developing applications with a search feature.
The closed nature of the leading web search technology places barriers in the way of developers who want to incorporate search functionality into applications. For example, Google's programmatic search API is a RESTful method called Google Custom Search API that offers only 100 search queries per day for free.1 The limited usage restrictions of these APIs mean that organizations are now frequently looking elsewhere for the provision of search functionality.

Free software libraries for information retrieval and search engines have been available for some time, allowing developers to build their own search solutions. These libraries enable search and retrieval of document collections on the Web or offline. Web crawlers can harvest content from multiple sources. A problem is how to meet users' expectations of search efficacy while not having the resources of the large search providers. Reservations about the business case for free open search include that large-scale search is too resource-hungry and the operational costs are too high, but these suppositions have been challenged.2 Open search technology enables developers to harvest resources and combine searches in innovative ways outside the commercial search platforms. Further prospects for open search systems and open-source search lie in areas such as peer-to-peer, information extraction, and subject-specific technology.3

Many search systems unfortunately use their own formats and protocols for indexing, search, and results lists. This makes it difficult to extend, alter, or combine services. Distributed search is the main alternative to building a "single" giant index (on mirrored clusters) and searching at one site, a la Google Search and Microsoft Bing. Callan describes the distributed search model in an information retrieval context.4 In his model, information retrieval consists of four steps: discovering databases, ranking databases by their expected ability to satisfy the query, searching the most relevant databases, and merging results returned by the different databases. Distributed search has become a popular approach in the digital libraries field. Note that in the digital libraries literature distributed search is often called federated search.5 The federated model has clear advantages in some application areas. It is very hard for a single index to do justice to all the possible schemas in use on the web today.

Other potential benefits can come from standardization. The leading search engine providers utilize their own proprietary technologies for crawling, indexing, and the presentation of results. The standardization of result lists would be useful for developers combining information from multiple sources or pipelining search to other functions. A common protocol for declaring and finding searchable information is another desirable feature. Standardized formats and metadata are key aspects of search interoperability, but the focus of this article is on protocols for exchanging, searching, and harvesting information. In particular, this article focuses on lightweight protocols, often REST (Representational State Transfer)–style applications. Lightweight protocols place less onerous overheads on development in terms of adapting existing systems and additional metadata. They are also simpler.
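In practice, a REST-style search interaction is nothing more than an HTTP GET with the query and paging parameters encoded in the URL and a structured document returned in response; the lightweight protocols described below mainly standardize the parameter names, the service description, and the response format. A generic sketch (the host and parameter names here are illustrative rather than taken from any particular specification):

GET http://search.example.org/search?q=open+access&start=1&count=10

The response would be an RSS, Atom, or JSON document listing matching items, typically with a total hit count and per-item links, which a client can render or merge with results from other sources.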
The alternative is heavyweight approaches to federated search, such as using web services or service-oriented architectures. There have been significant efforts at developing lightweight protocols for search, primary among which is the OpenSearch protocol developed by an Amazon subsidiary. Other protocols and services of relevance are SRU, MXG, and the OAI-PMH interoperability framework. We describe these technologies and give examples of their use. Technologies for the exchange and syndication of content are often used as part of the search process or alongside it. We highlight key differences between protocols and give instances where technologies can be used in tandem.

This paper is structured as follows. The next section describes the open search environment and the technologies contained therein. The following section describes open search protocols in detail, giving examples. Finally, summary conclusions are presented.

AN OPEN SEARCH ENVIRONMENT

A search environment (or ecosystem) consists of a software infrastructure and participants. The users of search services and the content providers and publishers are the main participants. The systems infrastructure consists of the websites and applications that both publish resources and present a search interface for the user, and the technologies that enable search. Technologies include publishing standards for archiving and syndicating content, the search engines and web crawlers, the search interface (and query languages), and the protocols or glue for interoperability. Baeza-Yates and Raghavan present their vision of next-generation web search, highlighting how developments in the web environment and search technology are shaping the next-generation search environment.6

Open-source libraries for the indexing and retrieval of document collections and the creation of search engines include the Lemur project (and the companion Indri search engine),7 Xapian,8 Sphinx,9 and Lucene (and the associated Nutch web crawler).10 All of these systems support web information retrieval and common formats. From a developer perspective they are all cross-platform; Lemur/Indri, Xapian, and Sphinx are in C and C++ whereas Lucene/Nutch is in Java. Xapian has language bindings for other programming languages such as Python. The robustness and scalability of these libraries support large-scale deployment; for example, the following large websites use Xapian: Citebase, Die Zeit (German newspaper), and Debian (Linux distribution).11 Middleton and Baeza-Yates present a more detailed comparison of open-source search engines.12 They compare twelve open-source search engines, including Indri and Lucene mentioned above, across thirteen dimensions. Features include license, storage, indexing, query preprocessing (stemming, stop-word removal), results format, and ranking.

Apache Solr is a popular open-source search platform that additionally supports features such as database integration, real-time indexing, faceted search, and clustering.13 Solr uses the Lucene search library for the core information retrieval. Solr's rich functionality and the provision of RESTful HTTP/XML and JSON APIs make it an attractive option for open information integration projects.
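To illustrate the kind of RESTful interface Solr exposes, a keyword query against a Solr index is a single HTTP GET that can return JSON. The host, port, and core name below are placeholders, while the select handler and the q, rows, and wt parameters belong to Solr's standard query API:

GET http://localhost:8983/solr/mycore/select?q=copernicus&rows=10&wt=json

{
  "responseHeader": { "status": 0, "QTime": 4 },
  "response": {
    "numFound": 42,
    "start": 0,
    "docs": [ { "id": "rec-001", "title": "..." }, ... ]
  }
}

The numFound, start, and docs fields give a client everything needed to page through results or to merge them with result lists obtained from other services.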
In a library context, Singer cites Solr as an open-source alternative for next-generation OPAC replacements.14 Solr is employed, for example, in the large-scale Europeana project.15

The focus of much of this paper is on the lightweight open protocols and interoperability solutions that allow application developers to harvest and search content across a range of locations and formats. In contrast, the DelosDLMS digital library framework exemplifies a heavyweight approach to integration.16 In DelosDLMS, services are either loosely or tightly coupled in a service-oriented architecture using web service middleware.

Particular issues for open search are the location and collection of metadata for searchable resources and the creation of applications offering search functionality. Principal among these technological solutions are the OpenSearch and SRU protocols. Both implement what Alipour-Hafezi et al. term a federated model in the context of search interoperability, wherein providers agree that their services will conform to certain standard specifications.17 The costs and adoption risk of this approach are low. Technologies such as OpenSearch occupy an abstraction layer above existing search infrastructure such as Solr.

Interoperability

Interoperability is an active area of work in both the search and digital library fields. Interoperability is "the ability of two or more systems or components to exchange information and to use the information that has been exchanged."18 Interoperability in web search applies to resource harvesting, to meta-search, and to allowing search functions to interact with other system elements. Interoperability in digital libraries is a well-established research agenda,19 and it has been described by Paepcke et al. as "one of the most important problems to solve [in DLs]."20 Issues that are common to both web search and digital-library search include metadata incompatibilities, protocol incompatibilities, and record duplication and near-duplication. In this paper, the focus is on search protocols; for a comprehensive survey of technology for semantic interoperability in digital library systems, see the DELOS report on same.21 A comprehensive survey of methods and technology for digital library interoperability is provided in a DL.org report.22

Formats and Metadata

Free and open standard formats are extensive in web technology. Standard or de facto formats for archiving content include plain text, Rich Text Format (RTF), HTML, PDF, and various XML formats. Document or resource identification is another area where there has been much agreement. Resource identification schemes need to be globally unique, persistent, efficient, and extensible. Popular schemes include URLs, Persistent URLs (PURLs), the Handle system (handle.net), and DOIs (Digital Object Identifiers). Linking technologies include OpenURL and COinS. OpenURL links sources to targets using a knowledge base of electronic resources such as digital libraries.23 ContextObjects in Spans (COinS), as used in Wikipedia for example, is another popular linking technology.24

Applications can use various formats for transporting and syndicating content. Syndication and transport technologies include XML, JSON, RSS/Atom, and heavyweight web service–based approaches. Much of the metadata employed in digital libraries is in XML formats, for example in MARCXML and the Metadata Encoding and Transmission Standard (METS).
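To give a concrete sense of what such metadata looks like on the wire, the sketch below shows a simple Dublin Core record of the kind carried by the harvesting protocol discussed below. The request line and XML follow the published OAI-PMH and Dublin Core conventions, but the repository address, identifier, and field values are hypothetical:

GET http://repository.example.org/oai?verb=GetRecord&identifier=oai:example.org:1234&metadataPrefix=oai_dc

<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:title>An Example Annotated Book</dc:title>
   <dc:creator>Example, Author</dc:creator>
   <dc:type>Text</dc:type>
   <dc:identifier>http://repository.example.org/item/1234</dc:identifier>
</oai_dc:dc>

Richer schemes such as MARCXML or METS can be carried in the same way by changing the metadataPrefix, provided the repository supports them.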
The World Wide Web Consortium defined RDF (Resource Description Framework) to provide, among other goals, "a mechanism for integrating multiple metadata schemes."25 RDF records are defined in an XML namespace. In RDF, subject–predicate–object expressions represent web resources, which are typically identified by means of a URI (Uniform Resource Identifier). JSON (JavaScript Object Notation) is a lightweight data-interchange format that has become popular in web-based applications and is seeing increasing support in digital library systems.26

Harvesting

Web content is harvested using software called web crawlers (or web spiders). A crawler is an instance of a software application that runs automated tasks on the web. Specifically, the crawler follows web links to index websites. There has been little standardization in this area except the Robot Exclusion Standard and the use of various XHTML meta elements and HTTP header fields. There is a lot of variability in terms of the policy for the selection of content sources, policy for following links, URL normalization, politeness, depth of crawl, and revisit policy. Consequently, there are many web crawling systems in operation; open-source crawlers include DataparkSearch, Grub, Heritrix, and the aforementioned Nutch.

Harvesting and syndication of metadata from open repositories is the goal of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), originally developed by Los Alamos National Laboratory, Cornell, and NASA in 2000 and 2001.27 Resource harvesting challenges include scale, keeping information up-to-date, robustness, and security. OAI-PMH has been adopted by many digital libraries, museums, archives, and publishers. The latest version, OAI-PMH 2.0, was released in 2002. OAI-PMH specifies a general, application-independent model of a network-accessible repository and a client harvester that issues requests using HTTP (either GET or POST). Metadata is expressed as a record in XML format. An OAI-PMH implementation must support Dublin Core, with other vocabularies as additions. OAI-PMH is the key technology in the harvesting model of digital library interoperability described by Van de Sompel et al.28 An OAI-PMH-compliant system consists of a harvester (client), a repository (network-accessible server), and items (the constituents of a repository).

Portal sites such as Europeana and OAIster use OAI-PMH to harvest from large numbers of collections.29 There are online registries of OAI-compliant repositories. The European Commission's Europeana allows users to search online across multiple image collections, including those of the British Library and the Louvre. Another portal site that uses OAI-PMH is CultureGrid, operated by the UK Collections Trust. CultureGrid provides access to hundreds of museums, galleries, libraries, and archives in the UK. The mod_oai module for the Apache web server helps crawlers to discover content.

Syndication and Exchange

Here we outline lightweight options for syndication and information exchange. Heavyweight web services–based approaches are outside the scope of this article. Web syndication commonly uses RSS (Really Simple Syndication) or its main alternative, Atom.
Atom is a proposed IETF standard.30 RSS 2.0 is the latest version in the RSS family of specifications, a simple yet highly extensible format where content items contain plain text or escaped HTML.31 Atom, developed to counter perceived deficiencies in RSS, has a richer content model than RSS and is more reusable in other XML vocabularies.32 Both RSS and Atom use HTTP for transport. RSS organizes information into channels and items, Atom into feeds and entries. Extension modules allow RSS to carry multimedia payloads (RSS enclosures) and geographical information (GeoRSS). Atom has an associated publishing protocol called AtomPub. Syndication middleware, which supports multiple formats, can serve as an intermediary in application architectures.

Information and Content Exchange (ICE) is a protocol that aims to "automate the scheduled, reliable, secure redistribution of any content."33 TwICE is a Java implementation of ICE. ICE automates the establishment of syndication relationships and handles data transfer and results formatting. This gives content providers more control over delivery, schedule, and reliability than simple web syndication without deploying a full-scale web services solution.

The Open Archives Initiative—Object Reuse and Exchange (OAI-ORE) protocol provides standards for the description and exchange of aggregations of web resources.34 This specification standardizes how compound digital objects can combine distributed resources of multiple media types. ORE introduces the concepts of aggregation, resource map, and proxy resource. Resource providers or curators can express objects in RDF or Atom format and assign HTTP URIs for identification. ORE supports resource discovery so crawlers or harvesters can find these resource maps and aggregations. ORE can work in partnership with OAI-PMH.

We outline some additional lightweight technologies for information exchange to conclude this section. OPML (Outline Processor Markup Language) is a format that represents lists of web feeds for services such as aggregators.35 It is a simple XML format. FeedSync and ROME support format-neutral feed representations that abstract from wire formats such as RSS 2.0 and Atom 1.0 for aggregator or syndication middleware. These technologies are described in the literature.36 LOCKSS (Lots of Copies Keep Stuff Safe) is a novel project that uses a peer-to-peer network to preserve and provide access to web resources. For example, the MetaArchive Cooperative uses LOCKSS for digital preservation.37

Meta-search

Meta-search is where multiple search services are combined. Such services have a very small share of the total search market owing to the dominance of the big players. MetaCrawler, developed in the 1990s, was one of the first meta-search engines and serves as a model for how such systems operate.38 A meta-search engine utilizes multiple search engines by sending a user request to multiple sources or engines, aiming to improve recall in the process. A key issue with meta-search is how to weight search engines and how to integrate sets of results into a single results list. Figure 1 shows a general model of meta-search where the meta-search service chooses which search engines and content providers to employ. Active meta-search engines on the web include Dogpile, Yippy, ixquick, and info.com.
Note that these types of website appear, change names, and disappear frequently. Currently meta-search services use various implementation methods such as proprietary protocols and screen scraping.

Figure 1. General model of meta-search.

Metasearch XML Gateway (MXG) is a meta-search protocol developed by the NISO MetaSearch Initiative, a consortium of meta-search developers and interested parties.39 MXG is a message and response protocol that enables meta-search service providers and content providers to communicate. A goal of the design of MXG was that content providers should not have to expend substantial development resources. MXG, based on SRU, specifies both the query and search results formats. Combining results, aggregation, and presentation are not part of the protocol and are handled by the meta-search service. The standard defines three levels of compliance, allowing varying degrees of commitment and interoperability.

SEARCH PROTOCOLS

We describe OpenSearch and SRU, along with applications, in the following subsections. After that, we detail related technologies.

OpenSearch

OpenSearch is a protocol that defines simple formats to help search engines and content providers communicate. It was developed by A9, a subsidiary of Amazon, in 2005.40 It defines common formats for describing a search service, query results, and operation control. It does not specify content formats for documents or queries. The current specification, version 1.1, is available with a Creative Commons license. It is an extensible specification with extensions published on the website. Both free open systems and proprietary systems use OpenSearch. In particular, many open-source search engines and content management systems support OpenSearch, including YaCy, Drupal, and the Plone CMS.

OpenSearch consists of a description file for a search source and a response format for query results. Descriptors include the elements Url, Query, SyndicationRight, and Language. Resource identification can be by URLs, DOIs, or a linking technology such as OpenURL. Responses describe a list of results and can be in RSS, Atom, or HTML formats. Additionally, there is an auto-discovery feature to signal that an HTML page is searchable, implemented using an HTML 4.0 link element.

OpenSearch makes very few assumptions about the types of sources, the type of query, or the search operation. It is ideal for combining content from multiple disparate sources, which may be data from repositories, webpages, or syndicated feeds. For illustrative purposes, listing 1 gives an example OpenSearch description for harvesting book information from an example digital library called DigLib. The root node includes an XML namespace attribute, which gives the URL for the standard version. The Url element specifies the content type (a MIME type), the query (book in this case), and the index offset where to begin. The rel attribute states that the result is a collection of resources.

<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
   <ShortName>DigLib</ShortName>
   <Description>Harvests book items</Description>
   <Url type="application/rss+xml" rel="collection" indexOffset="1"
        template="http://diglib.example.org/search?q=book&amp;start={startIndex?}"/>
   <Language>en-us</Language>
   <OutputEncoding>UTF-8</OutputEncoding>
</OpenSearchDescription>

Listing 1. XML OpenSearch Description Record.

Next, we describe some deployed applications that use OpenSearch.
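As a brief aside before turning to those applications: the auto-discovery hook mentioned above is a single link element placed in the head of an HTML page. A page advertising the hypothetical DigLib service from listing 1 would carry something like the following (the href is a placeholder):

<link rel="search"
      type="application/opensearchdescription+xml"
      title="DigLib"
      href="http://diglib.example.org/opensearch.xml">

Browsers and crawlers that understand OpenSearch can follow this link to retrieve the description document and then query the source directly.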
OJAX uses web technologies such as Ajax (Asynchronous JavaScript and XML) to provide a federated search service for OAI-PMH-compatible open repositories.41, 42 OJAX also supports the Discovery feature of OpenSearch, as described in the OpenSearch 1.1 specification, for auto-detecting that a repository is searchable. Stored searches are in Atom format.

Open-source meta-search engines can combine the results of OpenSearch-enabled search engines.43 A system built as a proof-of-concept uses four search sources: A9.com, yaCy, mozDex, and alpha. A user can issue a text query (word or phrase) with Boolean operators and several modifiers. Users can prefer or bias particular engines by setting weights. The system ranks results, which are combined using a voting algorithm, and is implemented using the Lucene library. OpenSearch can be employed to specify the search sources and as a common format when results are combined. As LeVan points out, "the job of the meta-search engine is made much simpler if the local search engine supports a standard search interface."44 LeVan also mentions MXG in this context.

Nguyen et al. describe an application where over one hundred search engines are used in experiments in federated search.45 The search sources were mostly OpenSearch-compliant search engines. An additional tool scrapes results from noncompliant systems.

InterSynd uses OpenSearch to help provide a common protocol for harvesting web feeds. InterSynd is a syndication system that harvests, stores, and provides feed recommendations.36 It uses Java.net's ROME (RSS and atOM utilitiEs) library to represent feeds in a format-neutral way. InterSynd is syndication middleware that allows sources to post and services to fetch information in all major syndication formats (see figure 2). Its feed-discovery module, Disco, uses the Nutch crawler and the OpenSearch protocol to harvest feeds. Nutch is an open-source library for building search engines that supports OpenSearch. Nutch builds on the Lucene information retrieval library, adding web specifics such as a crawler, a link-graph database, and parsers for HTML.

Figure 2. OpenSearch in InterSynd.

OpenSearch 1.1 allows results to be returned in RSS 2.0 format, Atom 1.0 format, or an OpenSearch response format, the "bare minimum of additional functionality required to provide search results over RSS channels" (quoted from the A9 website). Listing 2 below shows a Disco results list in RSS 2.0 format. OpenSearch fields appear in the channel description. The Nutch fields appear within each item (not shown). An OpenSearch namespace is specified in the opening XML element. The following additional OpenSearch elements appear in the example: totalResults, itemsPerPage, and startIndex.

<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
   <channel>
      <title>Nutch: metasearch</title>
      <description>Nutch search results for query: metasearch</description>
      <link>http://localhost/nutch-1.6-dev/opensearch?query=metasearch&amp;start=0&amp;hitsPerSite=2&amp;hitsPerPage=10</link>
      <opensearch:totalResults>282</opensearch:totalResults>
      <opensearch:startIndex>0</opensearch:startIndex>
      <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
      <item>
         <title>metasearch ...</title>
         ... cut ...
      </item>
      ... more items cut ...
   </channel>
</rss>

Listing 2. Results Produced using Nutch with OpenSearch.

We mention one more application of OpenSearch here. A series of NASA projects to develop a set of interoperable standards for sharing information employs various open technologies for sharing and disseminating datasets, including OpenSearch for its discovery capability.46 Discovery of document and data collections is by keyword search, using the OpenSearch protocol. There are various extensions to OpenSearch.
For example, an extension to handle SRU allows SRU (Search/Retrieval via URL) queries within OpenSearch contexts. Other proposed extensions include support for mobility, e-commerce, and geo-location.

SRU (Search/Retrieval via URL)

A technology with some similarities to OpenSearch but more comprehensive is SRU (Search/Retrieval via URL).47 SRU is an open RESTful technology for web search. The current version is SRU 2.0, standardized by the Organization for the Advancement of Structured Information Standards (OASIS) as searchRetrieve Version 1.0. SRU was developed to provide functionality similar to the widely deployed Z39.50 standard for library information retrieval, updated for the web age.48 SRU addresses aspects of search and retrieval by defining models: a data model, a query model, a processing model, a result set model, a diagnostics model, and a description-and-discovery model. SRU is extensible and can support various underlying low-level technologies. Both Lucene and DSpace implementations are available. The OCLC implementation of SRU supports both RSS and Atom feed formats and the Atom Publishing Protocol.

SRU uses HTTP as the application transport and XML formats for messages. Requests can be in the form of either GET or POST HTTP methods. SRU supports a high-level query language called Contextual Query Language (CQL). CQL is a human-readable query language consisting of search clauses. SRU operation involves three parts: Explain, Search/Retrieve, and Scan. Explain is a way to publish resource descriptions. Search/Retrieve entails the sending of requests (formulated in CQL) and the receiving of responses over HTTP. The optional SRU Scan enables software to query the index. The result list is XML, defined by an XML schema. The MXG meta-search protocol uses SRU but relaxes the requirement to use CQL.39 SRW (Search/Retrieve Web Service) is a web services implementation of SRU that uses SOAP as the transfer mechanism.

Hammond combines OpenSearch with SRU technology in an application for nature.com.49 He also points out the main differences between the protocols, such as SRU's use of a query specification language and differences in the results records. As well as supporting OpenSearch data formats (RSS and Atom), the nature.com application also supports JSON (JavaScript Object Notation). OpenSearch is used for formatting the result sets whereas SRU/CQL is used for querying. This search application launched as a public service in 2009.

Listing 3 below is an example from the nature.com application showing CQL search queries (in Url elements) used in an OpenSearch description document. Note how both the SRU queryType and the OpenSearch searchTerms attributes appear in the query. Further details on how to use SRU and OpenSearch together are on the OpenSearch website.

<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
   <ShortName>nature.com</ShortName>
   <LongName>OpenSearch interface for nature.com</LongName>
   <Description>The nature.com OpenSearch service</Description>
   <Tags>nature.com opensearch sru</Tags>
   <Url type="application/rss+xml"
        template="...?queryType=cql&amp;query={searchTerms}&amp;..."/>
   ...
</OpenSearchDescription>

Listing 3. Example using SRU and OpenSearch.

Other Technologies

Here we more briefly survey some additional technologies of relevance to open-search interoperability. XML-based approaches to information integration, such as the use of XQuery, are an option but do not offer loose integration. Chudnov et al. describe a simple API for a copy function for web applications to enable syndication, searching, and linking of web resources.50 Called unAPI, it requires small changes for publishers to add the functionality to web resources such as repositories and feeds.
Developers can layer unAPI over SRU, OpenSearch, or OpenURL.51

Announced in 2008, Yahoo!'s SearchMonkey technology, also called Yahoo!'s Open Search Platform, allowed publishers to add structured metadata to Yahoo! Search results. SearchMonkey divided the problem into two parts: metadata extraction and result presentation. It is not clear how much of this technology survived Yahoo! and Microsoft's new Search Alliance, signed in 2010.52 Mika described a search interface technology called Microsearch that is similar in nature.53 In Microsearch, semantic fields are added to a search, and the presentation of search results is enriched with metadata extracted from retrieved content. Govaerts et al. described a federated search and recommender system that operates as a browser add-on. The system is OpenSearch-compliant, and all results are in the Atom format.54

The Corporation for National Research Initiatives (CNRI) Digital Object Architecture (DOA) provides a framework for managing digital objects in a networked environment. It consists of three parts: a digital object repository, a resolution mechanism (the Handle system), and a digital object registry. The Repository Access Protocol (RAP) provides a means of networked access to digital objects and supports authentication and encryption.55

SUMMARY AND CONCLUSIONS

A rich set of formats and protocols and working implementations show that open search technology is an alternative to the dominant commercial search services. In particular, we discussed the lightweight OpenSearch and SRU protocols as suitable glue to create loosely coupled search-based applications. These can complement other developments in resource discovery and description, open repositories, and open-source information retrieval. The flexibility and extensibility offer exciting opportunities to develop new applications and new types of applications. The successful deployment of open search technology shows that this technology has matured to support many uses. A fruitful area of further development would be to make working with these standards easier for developers and even accessible to the nonprogrammer.

REFERENCES

1. Google Custom Search API, https://developers.google.com/custom-search/v1/overview.
2. Mike Cafarella and Doug Cutting, "Building Nutch: Open Source Search: A Case Study in Writing an Open Source Search Engine," ACM Queue 2, no. 2 (2004), http://0-dl.acm.org.library.ucc.ie/citation.cfm?doid=988392.988408.
3. Wray Buntine et al., "Opportunities from Open Source Search," in Proceedings, the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, 2–8 (2005), http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1517807.
4. Jamie Callan, "Distributed Information Retrieval," Advances in Information Retrieval 5 (2000): 127–50.
5. Péter Jacsó, "Internet Insights—Thoughts About Federated Searching," Information Today 21, no. 9 (2004): 17–27.
6. Ricardo Baeza-Yates and Prabhakar Raghavan, "Next Generation Web Search," in Search Computing (Berlin Heidelberg: Springer, 2010): 11–23, http://link.springer.com/chapter/10.1007/978-3-642-12310-8_2.
https://developers.google.com/custom-search/v1/overview http://0-dl.acm.org.library.ucc.ie/citation.cfm?doid=988392.988408 http://0-dl.acm.org.library.ucc.ie/citation.cfm?doid=988392.988408 http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1517807 http://link.springer.com/chapter/10.1007/978-3-642-12310-8_2 http://link.springer.com/chapter/10.1007/978-3-642-12310-8_2 INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2014 57 7. Trevor Strohman et al., “Indri: A Language Model-Based Search Engine for Complex Queries,” in Proceedings of the International Conference on Intelligent Analysis 2, no. 6, (2005): 2–6. 8. Xapian project website, http://xapian.org/. 9. Andrew Aksyonoff, Introduction to Search with Sphinx: From Installation to Relevance Tuning (Sebastopol, CA: O’Reilly, 2011). 10. Rohit Khare, “Nutch: A Flexible and Scalable Open-Source Web Search Engine,” Oregon State University, 2004, p. 32, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.105.5978 11. “Xapian Users,” http://xapian.org/users. 12. Christian Middleton and Ricardo Baeza-Yates, “A Comparison of Open Source Search Engines,” 2007, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.119.6955. 13. Apache Solr, http://lucene.apache.org/solr/. 14. Ross Singer, “In Search of a Really ‘Next Generation’ Catalog,” Journal of Electronic Resources Librarianship 20, no. 3 (2008): 139–42, http://www.tandfonline.com/doi/pdf/10.1080/19411260802412752. 15. Europeana portal, http://www.europeana.eu/portal/. 16. Maristella Agosti et al., DelosDLMS—The Integrated DELOS Digital Library Management System Berlin Heidelberg: Springer, 2007). 17. Mehdi Alipour-Hafezi et al., “Interoperability Models in Digital Libraries: An Overview,” Electronic Library 28, no. 3 (2010): 438–52, http://www.emeraldinsight.com/journals.htm?articleid=1864156. 18. Institute of Electrical and Electronics Engineers, IEEE Standard Computer Dictionary: A Compilation of IEEE Standard Computer Glossaries (New York: IEEE, 1990). 19. Clifford Lynch and Hector García-Molina, “Interoperability, Scaling, and the Digital Libraries Research Agenda,” in IITA Digital Libraries Workshop, 1995. 20. Andreas Paepcke et al., “Interoperability for Digital Libraries Worldwide,” Communications of the ACM 41, no. 4 (1998): 33–42. 21. Manjula, Patel et al., “"Semantic Interoperability in Digital Library Systems,” 2005, http://delos-wp5.ukoln.ac.uk/project-outcomes/SI-in-DLs/SI-in-DLs.pdf. 22. Georgios Athanasopoulos et al., “Digital Library Technology and Methodology Cookbook,” Deliverable D3.4, 2011, http://www.dlorg.eu/index.php/outcomes/dl-org-cookbook. http://xapian.org/ http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.105.5978 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.119.6955 http://lucene.apache.org/solr/ http://www.tandfonline.com/doi/pdf/10.1080/19411260802412752 http://www.europeana.eu/portal/ http://www.emeraldinsight.com/journals.htm?articleid=1864156 http://delos-wp5.ukoln.ac.uk/project-outcomes/SI-in-DLs/SI-in-DLs.pdf http://www.dlorg.eu/index.php/outcomes/dl-org-cookbook OPEN SEARCH ENVIRONMENTS: THE FREE ALTERNATIVE TO COMMERCIAL SEARCH SERVICES | O’RIORDAN 58 23. Herbert Van de Sompel and Oren Beit-Arie, “Open Linking in the Scholarly Information Environment using the OpenURL Framework,” New Review of Information Networking 7, no. 1 (2001): 59–76, http://www.tandfonline.com/doi/abs/10.1080/13614570109516969. 24. Daniel Chudnov, “COinS for the Link Trail,” Library Journal, 131 (2006): 8-10.25. 
Lois Mai Chan and Marcia Lei Zeng, “Metadata Interoperability and Standardization—A Study of Methodology, Part II,” D-Lib Magazine 12, no. 6 (2006), http://www.dlib.org/dlib/june06/zeng/06zeng.html. 26. JSON (JavaScript Object Notation), http://www.json.org/. 27. The Open Archives Initiative Protocol for Metadata Harvesting, http://www.openarchives.org/OAI/openarchivesprotocol.html. 28. Herbert Van De Sompel et al., “The UPS Prototype: An Experimental End-User Service across E-print Archives,” D-Lib Magazine 6, no. 2 (2000), http://www.dlib.org/dlib/february00/vandesompel-ups/02vandesompel-ups.html. 29. OAIster, http://oaister.worldcat.org/. 30. Mark Nottingham, ed., “The Atom Syndication Format. RfC 4287,” memorandum, IETF Network Working Group, 2005, http://www.ietf.org/rfc/rfc4287. 31. RSS 2.0 Specification, Berkman Center for Internet & Society at Harvard Law School, July 15, 2003, http://cyber.law.harvard.edu/rss/rss.html. 32. “RSS 2.0 And Atom 1.0 Compared,” http://www.intertwingly.net/wiki/pie/Rss20AndAtom10Compared 33. Jay Brodsky et al., eds., “The Information and Content Exchange (ICE) protocol,” Working Draft, Version 2.0, 2003, http://xml.coverpages.org/ICEv20-WorkingDraft.pdf. 34. Open Archives Initiative Object Reuse and Exchange, http://www.openarchives.org/ore/. 35. OPML (Outline Processor Markup Language), http://dev.opml.org/. 36. Adrian P. O’Riordan, and M. Oliver O’Mahoney, “Engineering an Open Web Syndication Interchange with Discovery and Recommender Capabilities,” Journal of Digital Information, 12, no. 1 (2011), http://journals.tdl.org/jodi/index.php/jodi/article/viewArticle/962. 37. Vicky Reich and David S. H. Rosenthal, “LOCKSS: A Permanent Web Publishing and Access System,” D-Lib Magazine 7, no. 6 (2001): 14, http://mirror.dlib.org/dlib/june01/reich/06reich.html. http://www.tandfonline.com/doi/abs/10.1080/13614570109516969 http://www.dlib.org/dlib/june06/zeng/06zeng.html http://www.json.org/ http://www.openarchives.org/OAI/openarchivesprotocol.html http://www.dlib.org/dlib/february00/vandesompel-ups/02vandesompel-ups.html http://oaister.worldcat.org/ http://www.ietf.org/rfc/rfc4287 http://cyber.law.harvard.edu/rss/rss.html http://www.intertwingly.net/wiki/pie/Rss20AndAtom10Compared http://xml.coverpages.org/ICEv20-WorkingDraft.pdf http://www.openarchives.org/ore/ http://dev.opml.org/ http://journals.tdl.org/jodi/index.php/jodi/article/viewArticle/962 http://mirror.dlib.org/dlib/june01/reich/06reich.html INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2014 59 38. Erik Selberg and Oren Etzioni, “Multi-service Search and Comparison Using the MetaCrawler,” in Proceedings of the Fourth Int'l WWW Conference, Boston, 1995. [pub info?] 39. NISO Metasearch Initiative, Metasearch XML Gateway Implementers Guide, Version 1.0, NISO RP-2006-02, 2006, http://www.niso.org/publications/rp/RP-2006-02.pdf. 40. DeWitt Clinton, “OpenSearch 1.1 Specification, draft 5,” http://opensearch.org/Specifications/OpenSearch/1.1. 41. Judith Wusteman, “OJAX: A Case Study in Agile Web 2.0 Open Source Development,” in Aslib Proceedings 61, no. 3 (2009): 212–31, http://dx.doi.org/10.1108/00012530910959781. 42. Judith Wusteman and Padraig O’hlceadha, “Using Ajax to Empower Dynamic Searching,” Information Technology & Libraries 25, no. 2 (2013): 57–64, http://0- www.ala.org.sapl.sat.lib.tx.us/lita/ital/sites/ala.org.lita.ital/f iles/content/25/2/wusteman.pd f. 43. Adrian P. 
O–Riordan, “Open Meta-Search with OpenSearch: A Case Study,” technical report hosted at cora.ucc.ie repository, 2007, http://dx.doi.org/10468/982. 44. Ralph LeVan, “OpenSearch and SRU: A Continuum of Searching,” Information Technology & Libraries 25, no. 3 (2013): 151–53, https://napoleon.bc.edu/ojs/index.php/ital/article/view/3346. 45. Dong Nguyen et al., “Federated Search in the Wild: The Combined Power of Over a Hundred Search Engines,” in Proceedings of the 21st ACM International Conference on Information and Knowledge Management (Maui, Hawaii): ACM Press, 2012): 1874–78, http://dl.acm.org/citation.cfm?id=2398535. 46. B. D. Wilson et al., “Interoperability Using Lightweight Metadata Standards: Service & Data Casting, OpenSearch, OPM Provenance, and Shared SciFlo Workflows,” in AGU Fall Meeting Abstracts 1 (2011): 1593, http://adsabs.harvard.edu/abs/2011AGUFMIN51C1593W. 47. Library of Congress, “SRU—Search/Retrieve via URL,” www.loc.gov/standards/sru. 48. The Library of Congress Network Development and MARC Standards Office, “Z39.50 Maintenance Agency Page,” www.loc.gov/z3950/agency. 49. Tony Hammond, “nature.com OpenSearch: A Case Study in OpenSearch and SRU Integration,” D-Lib Magazine 16, no. 7/8, (2010), http://mirror.dlib.org/dlib/july10/hammond/07hammond.print.html. 50. Daniel Chudnov et al., “Introducing unapi,” 2006, http://ir.library.oregonstate.edu/xmlui/handle/1957/2359. http://www.niso.org/publications/rp/RP-2006-02.pdf http://opensearch.org/Specifications/OpenSearch/1.1 http://dx.doi.org/10.1108/00012530910959781 http://0-www.ala.org.sapl.sat.lib.tx.us/lita/ital/sites/ala.org.lita.ital/files/content/25/2/wusteman.pdf http://0-www.ala.org.sapl.sat.lib.tx.us/lita/ital/sites/ala.org.lita.ital/files/content/25/2/wusteman.pdf http://0-www.ala.org.sapl.sat.lib.tx.us/lita/ital/sites/ala.org.lita.ital/files/content/25/2/wusteman.pdf http://dx.doi.org/10468/982 https://napoleon.bc.edu/ojs/index.php/ital/article/view/3346 http://dl.acm.org/citation.cfm?id=2398535 http://adsabs.harvard.edu/abs/2011AGUFMIN51C1593W http://www.loc.gov/standards/sru http://www.loc.gov/z3950/agency http://mirror.dlib.org/dlib/july10/hammond/07hammond.print.html http://ir.library.oregonstate.edu/xmlui/handle/1957/2359 OPEN SEARCH ENVIRONMENTS: THE FREE ALTERNATIVE TO COMMERCIAL SEARCH SERVICES | O’RIORDAN 60 51. Daniel Chudnov and Deborah England, “A New Approach to Library Service Discovery and Resource Delivery,” Serials Librarian 54, no. 1–2 (2008): 63–69, http://www.tandfonline.com/doi/abs/10.1080/03615260801973448. 52. “News About Our SearchMonkey Program,” Yahoo! Search Blog, 2010, http://www.ysearchblog.com/2010/08/17/news-about-our-searchmonkey-program/. 53. Peter Mika, “Microsearch: An Interface for Semantic Search,” in Semantic Search, International Workshop located at the 5th European Semamntic Web Conference (ESWC 2008) 334 (2008): 79–88, http://CEUR-WS.org/Vol-334/. 54. Sten Govaerts et al., “A Federated Search and Social Recommendation Widget,” in Proceedings of the 2nd International Workshop on Social Recommender Systems ([pub info?], 2011): 1–8. 55. S. [first name?]Reilly, “Digital Object Protocol Specification, Version 1.0,” November 12, 2009, http://dorepository.org/documentation/Protocol_Specification.pdf. 
Content Management Systems: Trends in Academic Libraries

Ruth Sara Connell

Ruth Sara Connell (ruth.connell@valpo.edu) is Associate Professor of Library Services and Electronic Services Librarian, Christopher Center Library Services, Valparaiso University, Valparaiso, IN.

ABSTRACT

Academic libraries, and their parent institutions, are increasingly using Content Management Systems (CMSs) for website management. In this study, the author surveyed academic library web managers from four-year institutions to discover whether they had adopted CMSs, which tools they were using, and their satisfaction with their website management system. Other issues, such as institutional control over library website management, were raised. The survey results showed that CMS satisfaction levels vary by tool and that many libraries do not have input into the selection of their CMS because the determination is made at an institutional level. These findings will be helpful for decision makers involved in the selection of CMSs for academic libraries.

INTRODUCTION

As library websites have evolved over the years, so has their role and complexity. In the beginning, the purpose of most library websites was to convey basic information, such as hours and policies, to library users. As time passed, more and more library products and services became available online, increasing the size and complexity of library websites. Many academic library web designers found that their web authoring tools were no longer adequate for their needs and turned to CMSs to help them manage and maintain their sites. For other web designers, the choice was not theirs to make. Their institution transitioned to a CMS and required the academic library to follow suit, regardless of whether the library staff had a say in the selection of the CMS or its suitability for the library environment.

The purpose of this study was to examine CMS usage within the academic library market and to provide librarians quantitative and qualitative knowledge to help make decisions when considering switching to, or between, CMSs. In particular, the objectives of this study were to determine (1) the level of saturation of CMSs in the academic library community; (2) the most popular CMSs within academic libraries, the reasons for the selection of those systems, and satisfaction with those CMSs; (3) if there is a relationship between libraries with their own dedicated Information Technology (IT) staff and those with open source (OS) systems; and (4) if there is a relationship between institutional characteristics and issues surrounding CMS selection.

Although this study largely focuses on CMS adoption and related issues, the library web designers who responded to the survey were asked to identify what method of web management they use if they do not use a CMS and asked about satisfaction with their current system. Thus, information regarding CMS alternatives (such as Adobe's Dreamweaver web content editing software) is also included in the results. As will be discussed in the literature review, CMSs have been broadly defined in the past.
Therefore, for this study participants were informed that only CMSs used to manage their primary public website were of interest. Specifically, CMSs were defined as website management tools through which the appearance and formatting is managed separately from content, so that authors can easily add content regardless of web authoring skills. LITERATURE REVIEW Most of the library literature regarding CMS adoption consists of individual case studies describing selection and implementation at specific institutions. There are very few comprehensive surveys of library websites or the personnel in charge of academic library websites to determine trends in CMS usage. The published studies including CMS usage within academic libraries do not definitively answer whether overall adoption has increased. In 2005 several Georgia State University librarians surveyed web librarians at sixty-three of their peer institutions, and of the sixteen responses, six (or 38 percent) reported use of “CMS technology to run parts of their web site.” 1 A 2006 study of web managers from wide range of institutions (Associates to Research) indicated a 26 percent (twenty-four of ninety-four) CMS adoption rate.2 A more recent 2008 study of institutions of varying sizes resulted in a little more than half of respondents indicating use of CMSs, although the authors note that “people defined CMSs very broadly,” 3 including tools like Moodle and CONTENTdm, and some of those libraries indicated they did not use the CMS to manage their website. A 2012 study by Comeaux and Schmetzke differs from the others mentioned here in that they reviewed academic library websites of the fifty-six campuses offering ALA-accredited graduate degrees (generally larger universities) and used tools and examined page code to try to determine on their own if the libraries used CMSs, as opposed to polling librarians at those institutions to ask them to self-identify if they used CMSs. They identified nineteen out of fifty-six (34 percent) sites using CMSs. The authors offer this caveat, “It is very possible that more sites use CMSs than could be readily identified. This is particularly true for ‘home-grown’ systems, which are unlikely to leave any readily discernible source code.” 4 Because of different methodologies and population groups studied in these studies, it is not possible to draw conclusions regarding CMS adoption rates within academic libraries over time using these results. As mentioned previously, some people define CMSs more broadly than others. One example of a product that can be used as a CMS, but is not necessarily a CMS, is Springshare’s LibGuides. Many libraries use LibGuides as a component of their website to create guides. However, some libraries have utilized the product to develop their whole site, in effect using it as a CMS. A case study by INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2013 44 two librarians at York College describes why they chose LibGuides as their CMS instead of as a more limited guide creation tool.5 Several themes recurred throughout many of the case study articles. One common theme was the issue of lack of control and problems of collaboration between academic libraries and the campus entities controlling website management. 
Amy York, the web services librarian at Middle Tennessee State University, described the decision to transition to a CMS in this way, “And while it was feasible for us to remain outside of the campus CMS and yet conform to the campus template, the head of the IT web unit was quite adamant that we move into the CMS.” 6 In a study by Bundza et al., several participants who indicated dissatisfaction with website maintenance mentioned “authority and decision-making issues” as well as “turf struggles.” 7 Other articles expressed more positive collaborative experiences. Morehead State University librarians Kmetz and Bailey noted, “When attending conferences and hearing the stories of other libraries, it became apparent that a typical relationship between librarians and a campus IT staff is often much less communicative and much less positive than [ours]. Because of the relatively smooth collaborative spirit, a librarian was invited in 2003 to participate in the selection of a CMS system.” 8 Kimberley Stephenson also emphasized the advantageous relationships that can develop when a positive approach is used, “Rather than simply complaining that staff from other departments do not understand library needs, librarians should respectfully acknowledge that campus Web developers want to create a site that attracts users and consider how an attractive site that reflects the university’s brand can be beneficial in promoting library resources and services.” 9 However, earlier in the article she does acknowledge that the iterative and collaborative process between the library and their University Relations (UR) department was occasionally contentious and that the web services librarian notifies UR staff before making changes to the library homepage.10 Another common theme in the literature was the reasoning behind transitioning to a CMS. One commonly cited criterion was access control or workflow management, which allows site administrators to assign contributors editorial control over different sections of the site or approve changes before publishing.11 However, although this feature is considered a requirement by many libraries, it has its detractors. Kmetz and Bailey indicated that at Morehead State University, “approval chains have been viewed as somewhat stifling and potentially draconian, so they have not been activated.” 12 These studies greatly informed the questions used and development of the survey instrument for this study. METHOD In designing the survey instrument, questions were considered based on how they informed the objectives of the study. To simplify analysis, it was important to compile as comprehensive a list of CONTENT MANAGEMENT SYSTEMS: TRENDS IN ACADEMIC LIBRARIES | CONNELL 45 CMSs as possible. This list was created by pulling CMS names from the literature review, the Web4Lib discussion list, and the CMSmatrix website (www.cmsmatrix.org). In order to select institutions for distribution, the 2010 Carnegie Classification of Institutions of Higher Education basic classification lists were used.13 The author chose to focus on three broad classifications: 1. Research institutions consisting of the following Carnegie basic classifications: Research Universities (very high research activity), Research Universities (high research activity), and DRU: Doctoral/Research Universities. 2. 
Master’s institutions consisting of the following Carnegie basic classifications: Master's Colleges and Universities (larger programs), Master's Colleges and Universities (medium programs), Master's Colleges and Universities (smaller programs). 3. Baccalaureate institutions consisting of the following Carnegie basic classifications: Baccalaureate Colleges—Arts & Sciences and Baccalaureate Colleges—Diverse Fields. The basic classification lists were downloaded into Excel with each of the three categories in a different worksheet, and then each institution was assigned a number using the random number generator feature within Excel. The institutions were then sorted by those numbers creating a randomly ordered list within each classification. To determine sample size for a stratified random sampling, Ronald Powell’s “Table for Determining Sample Size from a given population” 14 (with a .05 degree of accuracy) was used. Each classification’s population was considered separately, and the appropriate sample size chosen from the table. The population size of each of the groups (total number of institutions within that Carnegie classification) and the corresponding sample sizes were • research: population = 297, sample size = 165; • master’s: population = 727, sample size = 248; • baccalaureate: population = 662, sample size = 242. The total number of institutions included in the sample size was 655. The author then went through the list of selected institutions and searched online to find their library webpages and find the person most likely responsible for the library’s website. During this process, there were some institutions, mostly for-profits, for which a library website could not be found. When this occurred, that institution was eliminated and the next institution on the list used in its place. In some cases, the person responsible for web content was not easily identifiable; in these cases an educated guess was made when possible, or else the director or a general library email address was used. The survey was made available online and distributed via e-mail to the 655 recipients on October 1, 2012. Reminders were sent on October 10 and October 18, and the survey was closed on October 26, 2012. Out of 655 recipients, 286 responses were received. Some of those responses http://www.cmsmatrix.org/ INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2013 46 had to be eliminated for various reasons. If two responses were received from one institution, the more complete response was used while the other response was discarded. Some responses included only an answer to the first question (name of institution or declination of that question to answer demographic questions) and no other responses; these were also eliminated. Once the invalid responses were removed, 265 remained, for a 40 percent response rate. Before conducting an analysis of the data, some cleanup and standardization of results was required. For example, a handful of respondents indicated they used a CMS and then indicated that their CMS was Dreamweaver or Adobe Contribute. These responses were recoded as non-CMS responses. Likewise, one respondent self-identified as a non-CMS user but then listed Drupal as his/her web management tool and this was recoded as a CMS response. Demographic Profile of Respondents For the purposes of gathering demographic data, respondents were offered two options. 
They could provide their institution’s name, which would be used solely to pair their responses with the appropriate Carnegie demographic categories (not to identify them or their institution), or they could choose to answer a separate set of questions regarding their size, public/private affiliation, and basic Carnegie classification. The largest response group by basic Carnegie classification was master’s institutions with 102 responses (38 percent), followed by baccalaureate institutions (94 responses or 35 percent) and research institutions (69 responses or 26 percent). This corresponds closely with the distribution percentages, which were 38 percent master’s (248 out of 655), 37 percent baccalaureate (242 out of 655), and 25 percent research (165 out of 655). Of the 265 responses, 95 (36 percent) came from academic librarians representing public institutions and 170 (64 percent) from private. Of the private institutions, the vast majority (166 responses or 98 percent) were not-for-profit, while 4 (2 percent) were for-profit. To define size, the Carnegie size and setting classification was used. Very small institutions are defined as less than 1,000 full-time equivalent (FTE) enrollment, small is 1,000–2,999 FTE, medium is 3,000–9,999 FTE, and large is at least 10,000 FTE. The largest group of responses came from small institutions (105 responses or 40 percent), followed by medium (67 responses or 25 percent), large (60 responses or 23 percent), and very small (33 responses or 12 percent).

RESULTS

The first question, asking for institutional identification (or alternative routing to the Carnegie classification questions), was the only question for which an answer was required. In addition, because of question logic, some people saw questions that others did not based on how they answered previous questions. Thus, the number of responses varies for each question. One of the objectives of this study was to identify whether there were relationships between institutional characteristics and CMS selection and management. The results that follow include both descriptive statistics and statistically significant inferential statistics discovered using chi-square and Fisher’s exact tests. Statistically significant results are labeled as such.

The responses to this survey show that most academic libraries are using a CMS to manage their main library website (169 out of 265 responses or 64 percent). Overall, CMS users expressed similar (although slightly greater) satisfaction levels with their method of web management (see table 1).

Table 1. Satisfaction by CMS Use

                                          Use a CMS to manage library website
User is highly satisfied or satisfied     Yes                      No
  Yes                                     79 responses or 54%      41 responses or 47%
  No                                      68 responses or 46%      46 responses or 53%
  Total                                   147 responses or 100%    87 responses or 100%

Non-CMS Users

Non-CMS users were asked what software or system they use to govern their site. By far, the most popular system mentioned among the 82 responses was Adobe Dreamweaver, with 24 (29 percent) users listing it as their only or primary system. Some people listed Dreamweaver as part of a list of tools used, for example “PHP / MySQL, Integrated Development Environments (php storm, coda), Dreamweaver, etc.,” and if all mentions of Dreamweaver are included, the number of users rises to 31 (38 percent). Some version of “hand coded” was the second most popular answer with 9 responses (11 percent), followed by Adobe Contribute with 7 (9 percent).
Many of the “other” responses were hard to classify and were excluded from analysis. Some examples include:

• FTP to the web
• Voyager Public Web Browser ezProxy
• Excel, e-mail, file folders on shared drives

Among the top three non-CMS web management systems, Dreamweaver users were most satisfied, selecting highly satisfied or satisfied in 15 out of 24 (63 percent) cases. Hand coders were highly satisfied or satisfied in 5 out of 9 cases (56 percent), and Adobe Contribute users were highly satisfied or satisfied in only 3 out of 7 (43 percent) cases.

Respondents not using a CMS were asked whether they were considering a move to a CMS within the next two years. Most (59 percent) said yes. Research libraries were much more likely to be planning such a move (81 percent) than master’s (50 percent) or baccalaureate (45 percent) libraries (see table 2). A chi-square test rejects the null hypothesis that the consideration of a move to a CMS is independent of basic Carnegie classification; this difference was significant at the p = 0.038 level.

Table 2. Non-CMS Users Considering a Move to a CMS within the Next Two Years, by Carnegie Classification*

         Baccalaureate          Master’s               Research               Total
No       11 responses or 55%    11 responses or 50%    4 responses or 19%     26 responses or 41%
Yes      9 responses or 45%     11 responses or 50%    17 responses or 81%    37 responses or 59%
Total    20 responses or 100%   22 responses or 100%   21 responses or 100%   63 responses or 100%

Chi-square = 6.526, df = 2, p = .038
*Excludes “not sure” responses

Non-CMS users were asked to provide comments related to topics covered in the survey, and here is a sampling of responses received:

• CMSs cost money that our college cannot count on being available on a yearly basis.
• The library doesn’t have overall responsibility for the website. University web services manages the entire site; I submit changes to them for inclusion and updates.
• We are so small that the time to learn and implement a CMS hardly seems worth it. So far this low-tech method has worked for us.
• The main university site was moved to a CMS in 2008. The library was not included in that move because of the number of pages. I hear rumors that we will be forced into the CMS that is under consideration for adoption now. The library has had zero input in the selection of the new CMS.

CMS Users

When respondents indicated their library used a CMS, they were routed to a series of CMS-related questions. The first question asked which CMS their library was using. Of the 153 responses, the most popular CMSs were Drupal (40); WordPress (15); LibGuides (14), which was defined within the survey as a CMS “for main library website, not just for guides”; Cascade Server (12); Ektron (6); and ModX and Plone (5 each). These users were also asked about their overall satisfaction with their systems. Among the top four CMSs, LibGuides users were the most satisfied, selecting highly satisfied or satisfied in 12 out of 12 (100 percent) cases. The remaining three systems’ satisfaction ratings (highly satisfied or satisfied) were as follows: WordPress (12 out of 15 cases or 80 percent), Drupal (26 out of 38 cases or 68 percent), and Cascade Server (3 out of 11 cases or 27 percent). When asked whether they would switch systems if given the opportunity, most (61 out of 109 cases or 56 percent) said no.
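The inferential statistics reported here can be checked directly from the published counts; for instance, the following short Python sketch (using SciPy, a library choice of ours rather than the author’s) reproduces the chi-square test reported for table 2.

# Reproducing the test reported for table 2 from its published counts.
# Columns: baccalaureate, master's, research; rows: "no" vs. "yes".
from scipy.stats import chi2_contingency

observed = [[11, 11, 4],   # not considering a move to a CMS
            [9, 11, 17]]   # considering a move within two years

chi2, p, df, _ = chi2_contingency(observed)
print(round(chi2, 3), df, round(p, 3))  # 6.526 2 0.038, matching the reported values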
Looking at the responses for the top four CMSs, the answers echo the satisfaction ratings. LibGuides users were least likely to want to switch (0 out of 7 cases or 0 percent), followed by WordPress (1 out of 5 cases or 17 percent), Drupal (8 out of 23 cases or 26 percent), and Cascade Server (3 out of 7 or 43 percent) users.

Respondents were asked whether their library uses the same CMS as their parent institution. Most (106 out of 169 cases or 63 percent) said yes. Libraries at large institutions (over 10,000 FTE) were much less likely (34 percent) than their smaller counterparts to share a CMS with their parent institution (see table 3). A chi-square test rejects the null hypothesis that sharing a CMS with a parent institution is independent of size: at a significance level of p = 0.001, libraries at smaller institutions are more likely to share a CMS with their parent.

Table 3. CMS Users Whose Libraries Use the Same CMS as Their Parent Institution, by Size

         Large                 Medium                Small                 Very Small            Total
No       23 responses (66%)    15 responses (33%)    19 responses (27%)    6 responses (35%)     63 responses (37%)
Yes      12 responses (34%)    31 responses (67%)    52 responses (73%)    11 responses (65%)    106 responses (63%)
Total    35 responses (100%)   46 responses (100%)   71 responses (100%)   17 responses (100%)   169 responses (100%)

Chi-square = 15.921, df = 3, p = .001

Not surprisingly, a similar correlation holds true when comparing shared CMSs and simplified basic Carnegie classification. Baccalaureate and master’s libraries were more likely to share CMSs with their institutions (69 percent and 71 percent respectively) than research libraries (42 percent) (see table 4). At a significance level of p = 0.004, a chi-square test rejects the null hypothesis that sharing a CMS with a parent institution is independent of basic Carnegie classification.

Table 4. CMS Users Whose Libraries Use the Same CMS as Their Parent Institution, by Carnegie Classification

         Baccalaureate         Master’s              Research              Total
No       19 responses (31%)    18 responses (29%)    26 responses (58%)    63 responses (37%)
Yes      43 responses (69%)    44 responses (71%)    19 responses (42%)    106 responses (63%)
Total    62 responses (100%)   62 responses (100%)   45 responses (100%)   169 responses (100%)

Chi-square = 11.057, df = 2, p = .004

When participants responded that their library shared a CMS with the parent institution, they were asked a follow-up question about whether the library made the transition with the parent institution. Most (80 out of 99 cases or 81 percent) said yes, the transition was made together. However, private institutions were more likely to have made the switch together (88 percent) than public (63 percent) (see table 5). A Fisher’s exact test rejects the null hypothesis that transition to a CMS is independent of institutional control: at a significance level of p = 0.010, private institutions are more likely than public to move to a CMS in concert.

Table 5. Users Whose Libraries and Parent Institutions Use the Same CMS: Transition by Public/Private Control*

                         Private                 Public                 Total
Switched independently   9 responses (13%)       10 responses (37%)     19 responses (19%)
Switched together        63 responses (88%)      17 responses (63%)     80 responses (81%)
Total                    72 responses (101%)**   27 responses (100%)    99 responses (100%)

Fisher’s exact test: p = .010
*Excludes responses where people indicated “other”
**Due to rounding, total is greater than 100%

Similarly, a relationship existed between transition to a CMS and basic Carnegie classification.
Baccalaureate institutions (93 percent) were more likely than master’s (80 percent), which were more likely than research institutions (53 percent), to make the transition together (see table 6). A chi-square test rejects the null hypothesis that the transition to a CMS is independent of basic Carnegie classification: at a significance level of p = 0.002, institutions that grant higher degrees are less likely to make the transition together.

Table 6. Users Whose Libraries and Parent Institutions Use the Same CMS: Transition by Carnegie Classification*

                         Baccalaureate          Master’s                Research              Total
Switched independently   3 responses (7%)       8 responses (21%)       8 responses (47%)     19 responses (19%)
Switched together        40 responses (93%)     31 responses (80%)      9 responses (53%)     80 responses (81%)
Total                    43 responses (100%)    39 responses (101%)**   17 responses (100%)   99 responses (100%)

Chi-square = 12.693, df = 2, p = .002
*Excludes responses where people indicated “other”
**Due to rounding, total is greater than 100%

This study indicates that for libraries that transitioned to a CMS with their parent institution, the transition was usually forced. Out of the 88 libraries that transitioned together and indicated whether they were given a choice, only 8 libraries (9 percent) had a say in whether to make that transition. And even though academic libraries were usually forced to transition with their institution, they did not usually have representation on campus-wide CMS selection committees. Only 25 percent (22 out of 87) of respondents indicated that their library had a seat at the table during CMS selection. When comparing CMS satisfaction ratings among libraries that were represented on CMS selection committees versus those that had no representation, it is not surprising that those with representation were more satisfied (13 out of 22 cases or 59 percent) than those without (21 out of 59 cases or 36 percent). The same holds true for those libraries given a choice whether to transition. Those given a choice were satisfied more often (6 out of 8 cases or 75 percent) than those forced to transition (21 out of 71 cases or 30 percent).

Respondents who said that they were not on the same CMS as their institution were asked why they chose a different system. Many of the responses indicated a desire for freedom from the controlling influence of the institution’s IT or marketing arms:

• We felt Drupal offered more flexibility for our needs than Cascade, which is what the University at large was using. I’ve heard more recently that the University may be considering switching to Drupal.
• University PR controls all aspects of the university CMS. We want more freedom.
• We are a service-oriented organization, as opposed to a marketing arm. We by necessity need to be different.

CMS users were asked to provide a list of three factors most important in their selection of their CMS and to rank their list in order of importance. The author standardized the responses, e.g., “price” was recorded as “cost.” The factors listed first, in order of frequency, were ease of use (15), flexibility (10), and cost (6). Ignoring the ranking, 38 respondents listed ease of use somewhere in their “top three,” while 23 listed cost and 16 listed flexibility.

Another objective of this study was to determine if there was a positive correlation between libraries with their own dedicated IT staff and those who chose open source CMSs.
Therefore, CMS users were asked if their library had its own dedicated IT staff, and 66 out of 143 libraries (46 percent) said yes. Then the CMSs used by respondents were translated into two categories, open source or proprietary systems (when a CMS listed was unknown, it was coded as a missing value), and a Fisher’s exact test was run against all cases that had values for both variables to see if a correlation existed. Although those with library IT had open source systems more frequently than those without, the difference was not significant (see table 7).

Table 7. Libraries with Own IT Personnel, by Open Source CMS

                      Library has own IT
CMS is open source    Yes                     No                     Total
  Yes                 37 responses (73%)      32 responses (57%)     69 responses (65%)
  No                  14 responses (28%)      24 responses (43%)     38 responses (36%)
  Total               51 responses (101%)*    56 responses (100%)    107 responses (101%)*

Fisher’s exact test: p = .109
*Due to rounding, total is greater than 100%

In another question, people were asked to self-identify if their organization uses an open source CMS, and if so, whether they have outsourced any of its implementation or design to an outside vendor. Most (61 out of 77 cases or 79 percent) said they had not outsourced implementation or design. One person commented, “No, I don’t recommend doing this. The cost is great, you lose the expertise once the consultant leaves, and the maintenance cost goes through the roof. Hire someone fulltime or move a current position to be the keeper of the system.”

One of the advantages of having a CMS is the ability to give multiple people, regardless of their web authoring skills, the opportunity to edit webpages. Therefore, CMS users were asked how many web content creators they have within their library. Out of 152 responses, the most frequent range cited was 2–5 authors (72 responses or 47 percent), followed by only one author (33 responses or 22 percent), 6–10 authors (20 responses or 13 percent), 21–50 authors (16 responses or 11 percent), 11–20 authors (6 responses or 4 percent), and over 50 authors (5 responses or 3 percent). Because this question was an open-ended response and answers varied greatly, including “Over 100 (over 20 are regular contributors)” and “1–3,” standardization was required. When a range or multiple numbers were provided, the largest number was used.

Respondents were asked whether their library uses a workflow management process requiring page authors to receive approval before publishing content. Of the 131 people who responded yes or no, most (88 responses or 67 percent) said no.

CMS users were asked to provide comments related to topics covered in the survey. Many comments mentioned issues of control (or lack thereof), while another common theme was concerns with specific CMSs. Here is a sampling of responses received:

• Having dedicated staff is a necessity. There was a time when these tools could be installed and used by a techie generalist. Those days are over. A professional content person and a professional CMS person are a must if you want your site to look like a professional site... I’m shocked at how many libraries switched to a CMS yet still have a site that looks and feels like it was created 10 years ago.
• Since the CMS was bred in-house by another university department, we do not have control over changing the design or layout. The last time I requested a change, they wanted to charge us.
• Our university marketing department, which includes the web team, is currently in the process of switching [CMSs]. We were not invited to be involved in the selection process for a new CMS, although they did receive my unsolicited advice. • We compared costs for open source and licensed systems, and we found the costs to be approximately equivalent based on the development work we would have needed in an Open Source environment. • The library was not part of the original selection process for the campus' first CMS because my position didn't exist at that time. Now that we have a dedicated web services position, the library is considered a "power user" in the CMS and we are often part of the campus wide discussions about the new CMS and strategic planning involving the campus website. • We currently do not have the preferred level of control over our library website; we fought for customization rights for our front page, and won on that front. However, departments on campus do not have permission to install or configure modules, which we hope will change in the future. • There’s a huge disconnect between IT /administration and the library regarding unique needs of the library in the context of web-based delivery of information. DISCUSSION Comparing the results of this study to previous studies indicates that CMS usage within academic libraries is rising. The 64 percent CMS adoption rate found in this survey, which used a more narrow definition of CMS than some previous studies cited in the literature review, is higher than adoption rates in any of said studies. As more libraries make the transition, it is important to know how different CMSs have been received among their peers. Although CMS users are slightly more satisfied than non-CMS users (54 percent vs. 47 percent), the tools used matter. So if a library using Dreamweaver to manage their site is given an option of moving with their institution to a CMS and that CMS is Cascade Server, they should strongly consider sticking with their current non-CMS method based on the respective satisfaction levels reported in this study (63 percent vs. 27 percent). Satisfaction levels are important, but should not be considered in a vacuum. For example, although LibGuides users reported very high satisfaction levels (100 percent were satisfied or very satisfied), users were mostly (11 out of 14 users or 79 percent) small or very small schools, while the remaining three (21percent) were medium schools. No large schools reported using LibGuides as their CMS. LibGuides may be wonderful for a smaller school without need of much INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2013 54 customization or, in some cases, access to technical expertise but may not be a good CMS solution for larger institutions. One of the largest issues raised by survey respondents was libraries’ control, or lack thereof, when moving to a campus-selected CMS. Given the complexity of academic libraries websites, library representation on campus-wide CMS selection committees is warranted. Not only are libraries more satisfied with the results when given a say in the selection, but libraries have special needs when it comes to website design that other campus units do not. Including library representation ensures those needs are met. Some of the respondents’ comments regarding lack of control over their sites are disturbing to libraries being forced or considering a move to a campus CMS. 
Clearly, having to pay another campus department to make changes to the library site is not an attractive option for most libraries. Nor should libraries have to fight for the right or ability to customize their home pages. Developing good working relationships with the decision makers may help prevent some of these problems, but likely not all. This study indicates that it is not uncommon for academic libraries to be forced into CMSs, regardless of the CMSs acceptability to the library environment. CONCLUSION The adoption of CMSs to manage academic libraries’ websites is increasing, but not all CMSs are created equal. When given input into switching website management tools, library staff have many factors to take into consideration. These include, but are not limited to, in-house technical expertise, desirability of open source solutions, satisfaction of peer libraries with considered systems, and library specific needs, such as workflow management and customization requirements. Ideally, libraries would always be partners at the table when campus-wide CMS decisions are being made, but this study shows that this does not happen in most cases. If a library suspects that it is likely to be required to move to a campus-selected system, its staff should be alert for news of impending changes so that they can work to be involved at the beginning of the process to be able to provide input. A transition to a bad CMS can have long-term negative effects on the library, its users, and staff. A library’s website is its virtual “branch” and vitally important to the functioning of the library. The management of such an important component of the library should not be left to chance. REFERENCES 1. Doug Goans, Guy Leach, and Teri M. Vogel, “Beyond HTML: Developing and Re-imagining Library Web Guides in a Content Management System,” Library Hi Tech 24, no. 1 (2006): 29–53, doi:10.1108/07378830610652095. 2. Ruth Sara Connell, “Survey of Web Developers in Academic Libraries,” The Journal of Academic Librarianship 34, no. 2 (March 2008): 121–129, doi:10.1016/j.acalib.2007.12.005. http://dx.doi.org/10.1016/j.acalib.2007.12.005 CONTENT MANAGEMENT SYSTEMS: TRENDS IN ACADEMIC LIBRARIES | CONNELL 55 3. Maira Bundza, Patricia Fravel Vander Meer, and Maria A. Perez-Stable, “Work of the Web Weavers: Web Development in Academic Libraries,” Journal of Web Librarianship 3, no. 3 (July 2009): 239–62. 4. David Comeaux and Axel Schmetzke, “Accessibility of Academic Library Web Sites in North America—Current Status and Trends (2002–2012).” Library Hi Tech 31, no. 1 (January 28, 2013): 2. 5. Daniel Verbit and Vickie L. Kline, “Libguides: A CMS for Busy Librarians,” Computers in Libraries 31, no. 6 (July 2011): 21–25. 6. Amy York, Holly Hebert, and J. Michael Lindsay, “Transforming the Library Website: You and the IT Crowd,” Tennessee Libraries 62, no. 3 (2012). 7. Bundza, Vender Meer, and Perez-Stable, “Work of the Web Weavers: Web Development in Academic Libraries.” 8. Tom Kmetz and Ray Bailey, “Migrating a Library’s Web Site to a Commercial CMS Within a Campus-wide Implementation,” Library Hi Tech 24, no. 1 (2006): 102–14, doi:10.1108/07378830610652130. 9. Kimberley Stephenson, “Sharing Control, Embracing Collaboration: Cross-Campus Partnerships for Library Website Design and Management,” Journal of Electronic Resources Librarianship 24, no. 2 (April 2012): 91–100. 10. Ibid. 11. Elizabeth L. Black, “Selecting a Web Content Management System for an Academic Library Website,” Information Technology & Libraries 30, no. 
4 (December 2011): 185–89; Andy Austin and Christopher Harris, “Welcome to a New Paradigm,” Library Technology Reports 44, no. 4 (June 2008): 5–7; Holly Yu , “Chapter 1: Library Web Content Management: Needs and Challenges,” in Content and Workflow Management for Library Web Sites: Case Studies, ed. Holly Yu (Hersey, PA: Information Science Publishing, 2005), 1–21; Wayne Powel and Chris Gill, “Web Content Management Systems in Higher Education,” Educause Quarterly 26, no. 2 (2003): 43– 50; Goans, Leach, and Vogel, “Beyond HTML.” 12. Kmetz and Bailey, “Migrating a Library’s Web Site.” 13. Carnegie Foundation for the Advancement of Teaching, 2010 Classification of Institutions of Higher Education, accessed February 4, 2013, http://classifications.carnegiefoundation.org/descriptions/basic.php. 14. Ronald R. Powell , Basic Research Methods for Librarians (Greenwood, 1997). http://classifications.carnegiefoundation.org/descriptions/basic.php 4633 ---- High-Performance Annotation Tagging over Solr Full-text Indexes Michele Artini, Claudio Atzori, Sandro La Bruzzo, Paolo Manghi, Marko Mikulicic, and Alessia Bardi INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2014 22 ABSTRACT In this work, we focus on the problem of annotation tagging over information spaces of objects stored in a full-text index. In such a scenario, data curators assign tags to objects with the purpose of classification, while generic end users will perceive tags as searchable and browsable object properties. To carry out their activities, data curators need annotation tagging tools that allow them to bulk tag or untag large sets of objects in temporary work sessions where they can virtually and in real time experiment with the effect of their actions before making the changes visible to end users. The implementation of these tools over full-text indexes is a challenge because bulk object updates in this context are far from being real-time and in critical cases may slow down index performance. We devised TagTick, a tool that offers to data curators a fully functional annotation tagging environment over the full-text index Apache Solr, regarded as a de facto standard in this area. TagTick consists of a TagTick Virtualizer module, which extends the API of Solr to support real-time, virtual, bulk-tagging operations, and a TagTick User Interface module, which offers end-user functionalities for annotation tagging. The tool scales optimally with the number and size of bulk tag operations without compromising the index performance. INTRODUCTION Tags are generally conceived as nonhierarchical terms (or keywords) assigned to an information object (e.g., a digital image, a document, a metadata record) in order to enrich its description beyond the one provided by object properties. The enrichment is intended to improve the way end users (or machines) can search, browse, evaluate, and select the objects they are looking for. Examples are qualificative terms, i.e. terms associating the object to a class (e.g., biology, computer science, literature) or qualitative terms, i.e. terms associating the object to a given measure of value (e.g., rank in a range, opinion).1 Approaches differ in the way tags are generated. 
In some cases users (or machines)2 freely and collaboratively produce tags,3 thereby generating so-called Michele Artini (michele.artini@isti.cnr), Claudio Atzori (claudio.atzori@isti.cnr.it), Sandro La Bruzzo (msandro.labruzzo@isti.cnr), Paolo Manghi (paolo.manghi@iti.cnr.it), and Mark Mikulicic (mmark.mikulicic@isti.cnr.it) are researchers at Istituto di Scienza e Tecnologie dell’Informazione “Alessandro Faedo,” Consiglio Nazionale delle Richerce, Pisa, Italy. Alessia Bardi (mallessia.bardi@for.unipit.it) is a researcher at the Dipartimento di Ingegneria dell’Informazione, Università di Pisa, Italy. mailto:michele.artini@isti.cnr.it mailto:claudio.atzori@isti.cnr.it mailto:msandro.labruzzo@isti.cnr mailto:paolo.manghi@iti.cnr.it mailto:mmark.mikulicic@isti.cnr.it mailto:mallessia.bardi@for.unipit.itmailto: HIGH-PERFORMANCE ANNOTATION TAGGING OVER SOLR FULL-TEXT INDEXES | ARTINI ET AL 23 folksonomies. The natural heterogeneity of folksonomies calls for solutions to harmonise and make more effective their usage, such as tag clouds.4 In other approaches users can pick tags from a given set of values (e.g., vocabulary, ontology, range) or else find hybrid solutions, where a degree of freedom is still permitted.5,6 A further differentiation is introduced by semantically enriched tags, which are tags contextualized by a label or prefix that provides an interpretation for the tag.7 For example, in the digital library world, the annotation of scientific article objects with subject tags could be done according to the tag values of the tag interpretations of ACM scientific disciplines and “Dewey Decimal Classification,” whose term ontologies are different.8 The action of tagging is commonly intended as the practice of end users or machines of assigning or removing tags to the objects of an information space. An information space is a digital space a user community populates with information objects for the purpose of enabling content sharing and providing integrated access to different but related collections of information objects.9 The effect of tagging information objects in an information space may be private, i.e., visible to the users who tagged the objects or to a group of users sharing the same right, or public, i.e., visible to all users.10 Many well-known websites allow end users to tag web resources. For example Delicious11 (http://delicious.com) allows users to tag web links with free and public keywords; Stack Overflow (http://stackoverflow.com), which lets users ask and answer questions about programming, allows tagging of question threads with free and public keywords; Gmail 12 (http://mail.gmail.com) allows users to tag emails—at the same time, tags are also transparently used to encode email folders. In the digital library context, the portal Europeana (http://www.europeana.eu) allows authenticated end users to tag metadata records with free keywords to create a private set of annotations. In this work we shall focus on annotation tagging—that is, tagging used as a manual data curation technique to classify (i.e., attach semantics to) the objects of an information space. 
In such a scenario, tags are defined as controlled vocabularies whose purpose is classification.13,14 Unlike semantic annotation scenarios, where semantic tags may be semiautomatically generated and assigned to objects,15 in annotation tagging authorized data curators are equipped with search tools to identify the sets of objects they believe should belong or not belong to a given category (identified by a tag), and to eventually perform the tagging or untagging actions required to apply the intended classification. In general, such operations may assign or remove tags to and from an arbitrarily large subset of objects of the Information Space. It is therefore hard to predict the quality and consistency of the combined effect of a number of such actions. As a consequence, data curators must rely on virtual tagging functionalities which allow them to bulk (un)tag sets of objects in temporary work sessions, where they can in real-time preview and experiment (do/undo) the effects of their actions before making the changes visible to end users. Examples of scenarios that may require annotation tagging can be found in many fields of application. This is the case, for example, in several data infrastructures funded by the European Commission FP7 program, which share the common goal of populating very large information spaces by aggregating textual metadata records collected from several data sources. Examples are the data http://delicious.com/ http://stackoverflow.com/ http://mail.gmail.com/ http://www.europeana.eu/ INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2014 24 infrastructures for DRIVER,16 Heritage of the People’s Europe (HOPE),17 European Film Gateway (EFG and EFG1914),18 OpenAIRE19 (http://www.openaire.eu), and Europeana. In such contexts, the aggregated records are potentially heterogeneous, not sharing common classification schemes, and annotation tagging becomes a powerful mean to make the Information Space more effectively consumable by end users. There at two significant challenges to be tackled in the realization of annotation tagging tools. First is the need to support bulk-tagging actions in almost real time so that data curators need not wait long for their actions to complete. Second, bulk-tagging actions need to be virtualized over the information space, so that data curators can verify the quality of their actions before committing them, and access to the information space is unaffected by such actions. Naturally, the feasibility and quality of annotation tagging tools strictly depends on the data management system adopted to index and search objects of the information space. In general, not to compromise information space availability, bulk-updates are based on offline, efficient strategies, which minimize the update’s delay,20 or virtualisation techniques, which perform the update in such a way that users have the impression this was completed.21 In this work, we target the specific problem of annotation tagging of information spaces whose objects are documents in a Solr full-text index (v3.6).22 Solr is an open-source Apache project delivering a full-text index whose instances are capable of scaling up to millions of records, can benefit from horizontal clustering, replica handling, and production-quality performance for concurrent queries and bulk updates. The index is widely adopted in the literature and often in contexts where annotation tagging is required, such as the aforementioned aggregative data infrastructures. 
The implementation of virtual and bulk-tagging facilities over Solr information spaces is a challenge, since bulk updates of Solr objects are fast, but far from being real-time when large sets of objects are involved. In general, independently of the configuration, a re-indexing of millions of objects may take up to some hours, while for real-time previews even minutes would not be acceptable. Moreover, in critical cases, update actions may also slow down index perfor- mance and compromise access to the information space. In this paper, we present TagTick, a tool that implements facilities for annotation tagging over Solr with no remarkable degradation of performances with respect to the original index. TagTick consists of two main modules: the TagTick Virtualizer, which implements functionalities for real- time bulk (un)tagging in the context of work sessions for Solr, and the TagTick User Interface, which implements user interfaces for data curators to create, operate and commit work sessions, so as to produce newly tagged information spaces. TagTick software can be demoed and downloaded from http://nemis.isti.cnr.it/product/tagtick-authoritative-tagging-apache-solr. ANNOTATION TAGGING Annotation tagging is a process operated by data curators whose aim is improving the end user’s search experience over an Information Space. Specifically, the activity consists in assigning searchable and browsable tags to objects in order to classify and logically structure the http://www.openaire.eu/ http://nemis.isti.cnr.it/product/tagtick-authoritative-tagging-apache-solr HIGH-PERFORMANCE ANNOTATION TAGGING OVER SOLR FULL-TEXT INDEXES | ARTINI ET AL 25 Information Space into further (and possibly overlapping) meta-classes of objects. Moreover, when ontologies published on the Web are used, for example ontologies available as linked data such as the GeoNames ontology (http://www.geonames.org/ontology/documentation.html) or the DBPedia ontology (http://dbpedia.org/Ontology), then tags are means to link objects in the information space to external resources. In this section, we shall describe the functional requirements of annotation tagging in order to introduce assumptions and nomenclature to be used in the remainder of the paper. Information Space: Objects, Classes, Tags, and Queries We define an information space as a set of objects of different classes C1 . . . Ck. Each class Ci has a structure (l1 : V1, . . . ,ln : Vn), where lj’s are object property labels and Vj are the types of the property values. Types can be value domains, such as strings, integers, dates, or controlled vocabularies of terms. In its general definition, annotation tagging has to do with semantically enriched tagging, where a tag consists of a pair (i, t), made of a tag interpretation i and a tag value t from a term ontology T; as an example of interpretation consider the ACM subject classification scheme (e.g., i = ACM), where T is the set of ACM terms. In this context, tagging is de-coupled from the Information Space and can be configured a- posteriori. Typically, given an Information Space, data curators set up the annotation tagging environment by: (i) defining the interpretation/ontology pairs to be used for classification, and (ii) assigning to each class C the interpretations to be used to tag its objects. As a result, class structures are enriched with a set of interpretations (i1:T1 . . . im:Tm), where ij are tag interpretation labels and Tj the relative ontologies. 
Unless differently specified, an object may be assigned multiple tag values for the same tag interpretation, e.g. scientific publication objects may cover different scientific ACM disciplines. Finally, the information space can reply to queries q formed according to the abstract syntax intable 1, where Op is a generic Boolean operator (dependent on the underlying data management system, e.g. “=,” “<,” “>”) and C∈{C1, . . . ,Ck}. Tag predicates (i = t) and class predicates (class = C) represent exact matches, which mean “the object is tagged with the tag (i, t)” and “the object belongs to class C.” q ∷=(q And q) | (q Or q) | (l Op v) | (i = t) | (class = C) | v | ε Table 1. Solr Query Language. http://www.geonames.org/ontology/documentation.html http://dbpedia.org/Ontology INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2014 26 Virtual and Real-time Tagging In annotation tagging data curators apply bulk (un)tagging actions with respect to a tag (i, t) over arbitrarily large sets of objects returned by queries q over the information space. Due to the potential impact that such operations may have over the information space, tools for annotation tagging should allow data curators to perform their actions in a protected environment called work session. In such an environment curators can test sequences of bulk (un)tagging actions and incrementally shape up an information space preview: they may view the history of such actions, undo some of them, add new actions, and pose queries to test the quality of their actions. To offer a usable annotation tagging tool, it is mandatory for such actions to be performed in (almost) real- time. For example, curators should not wait more than a few seconds to test the result of tagging 1 million objects, an action which they might undo immediately after. Moreover, such actions should not conflict (e.g., slow performance) with the activities of end users running queries on the information space. Finally, when data curators believe the preview has reached its maturity, they can commit the work session, i.e., materialise the preview in the information space, and make the changes visible to end users. APACHE SOLR AND ANNOTATION TAGGING As mentioned in the introduction, our focus is on annotation tagging for Apache Solr (v3.6). This section describes the main information space features and functionalities of the Solr full-text index search platform. In particular, it explains the issues arising when using its native APIs to im- plement bulk real-time tagging as described previously. Solr Information Spaces: Objects, Classes, Tags, and Queries Solr is one of the most popular full-text indexes. It is an Apache open source Java project that offers a scalable, high performance and cross-platform solution for efficient indexing and querying of information spaces made of millions of objects (documents in Solr jargon).23 A Solr index stores a set of objects, each consisting in a flat list of possibly replicated and unordered fields associated to a value. Each object is referable by a unique identifier generated by the index at indexing time. The information spaces described previously can be modelled straightforwardly in Solr. Each object in the index contains field-value pairs relative to the properties and tag interpretations of all classes they belong to. Moreover, we shall assume that all objects share one field named class whose values indicate the classes (e.g. C1, . . . ,Ck) to which the object belongs. 
Such an assumption does not restrict the application domain, since classes are typically encoded in Solr by a dedicated field.

The Solr API provides methods to search objects by general keywords, field values, field ranges, fuzzy terms, and other advanced search options, plus methods for the bulk addition and deletion of objects. In our study, we shall restrict ourselves to the search method query(q, qf), where q and qf are CQL queries referred to, respectively, as the “main query” and the “filter query”. In particular, in order to match the query language requirements described previously, we shall assume that q and qf are expressed according to the CQL subset matching the query language in table 1.

getDocset : RS → DS               returns the docset relative to a result set
intersectDocsets : DS × DS → DS   returns the intersection of two docsets
intersectSize : DS × DS → Integer returns the size of the intersection of two docsets
unifyDocsets : DS × DS → DS       returns the union of two docsets
andNotDocsets : DS × DS → DS      given two docsets ds1 and ds2, returns the docset {d | d ∈ ds1 ⋀ ¬ d ∈ ds2}
searchOnDocset : Q × DS → RS      executes a query q over a docset and returns the relative result set

Table 2. Solr Docset Management Low-Level Interface.

To describe the semantics of query(q, qf) it is important to make a distinction between the Solr notions of result set and docset. In Solr, the execution of a query returns a result set (i.e., QueryResponse in Solr jargon) that logically contains all objects matching the query. In practice, a result set is conceived to be returned at the time of execution to offer instant access to the query result, which is meanwhile computed and stored in memory in a low-level Solr data structure called a docset. Docsets are internal Solr data structures, which contain lists of object identifiers and allow for efficient operations such as union and intersection of very large sets of objects to optimize query execution. Table 2 illustrates some of the methods used internally by Solr to handle docsets. Method names have been chosen to be self-explanatory and therefore do not match the ones used in the libraries of Solr.

⟦query(q, qf)⟧Solr = {d | ID(d) ∈ ⟦q⟧DS}              if (qf = null)
                     searchOnDocset(q, ⟦qf⟧Cache(ϕ))   if (qf ≠ null)

⟦qf⟧Cache(ϕ) = ds                                      if (ϕ(qf) = ds)
               ⟦qf⟧Cache(ϕ[qf ← ⟦qf⟧DS])               if (ϕ(qf) = ⊥)

⟦(q1 And q2)⟧DS = ⟦q1⟧DS ∩ ⟦q2⟧DS
⟦(q1 Or q2)⟧DS = ⟦q1⟧DS ∪ ⟦q2⟧DS
⟦(l Op v)⟧DS = {ID(d) | d.l Op v}
⟦(i = t)⟧DS = {ID(d) | d.i = t}

Table 3. Semantic Functions.

Informally, query(q, qf) returns the result set of objects matching the query q intersected with the objects matching the filter query qf, i.e., its semantics is equivalent to that of the command query(q And qf, null). In practice, the usage of a filter query qf is intended to efficiently reduce the scope of q to the set of objects whose identifiers are in the docset of qf. To this aim, Solr keeps in memory a filter cache ϕ : Q → DS. The first time a filter query qf is received, Solr executes it and stores the relative docset ds in ϕ, where it can be accessed to optimize the execution of query(q, qf). Once the docset ϕ(qf) = ds is available, query(q, qf) invokes the low-level method searchOnDocset(q, ds) (see table 2). The method executes q to obtain its docset, efficiently intersects such docset with ds, and populates the result set relative to the query.
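As a point of reference, the following minimal sketch shows how a main query and a filter query are passed to a Solr 3.x core through its standard HTTP select handler; the core URL and field names are illustrative assumptions and are not part of TagTick.

# Minimal sketch: issuing a main query plus a filter query against Solr over HTTP.
# The core location and the field names ("class", "subject_acm") are assumptions.
import requests

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed core location

params = {
    "q": "napoleon",            # main query: free-text keywords
    "fq": "class:publication",  # filter query: its docset is cached by Solr
    "wt": "json",
}
response = requests.get(SOLR_SELECT, params=params).json()
for doc in response["response"]["docs"]:
    print(doc.get("id"), doc.get("subject_acm"))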
Due to the efficiency of docset intersection and in-memory data structures, query execution time is closely limited to the one necessary to execute q. Table 3 shows the semantic functions ⟦.⟧Solr :Q x Q →RS , ⟦.⟧DS :Q →DS, ⟦.⟧Cache :Q x ℘(Q x DS) → DS. The first yields the result set of query(q, qf); the second the docset relative to a query q (where d is an object); and the third resolves queries into docsets by means of a filter cache ϕ. Limits to Virtual and Real-Time Tagging in Solr Whilst Solr is a well-known and established solution for full-text indexing over very large information spaces, it poses challenges for higher-level applications willing to expose to users private, modifiable views of the same index. This is the case for annotation tagging tools, which must provide data curators with work sessions where they can update with tagging and untagging actions a logical view of the information space, while still providing end users with search facilities over the last committed Information Space. Since Solr API does not natively provide “view management” primitives, the only approach would be that of materializing tagging and untagging HIGH-PERFORMANCE ANNOTATION TAGGING OVER SOLR FULL-TEXT INDEXES | ARTINI ET AL 29 actions in the index while making sure that such changes are not visible to end users. Prefixing tags with work session identifiers, cloning of tagged objects, or keeping index replicas may be valuable techniques, but all fail to deliver the real-time requirement described previously. This is due to the fact that when very large sets of objects are involved the re-indexing phase is generally far from being real-time. In general, independently of the configuration, processing such requests may take up to some hours for millions of objects, while for real-time previews even minutes would not be acceptable. TAGTICK VIRTUALIZER: VIRTUAL REAL-TIME TAGGING FOR SOLR This section presents the TagTick Virtualizer module, as the solution devised to overcome the inability of Apache Solr to support out-of-the-box real-time virtual views over Information Spaces. The Virtualizer API, shown in table 4, supports methods for creating, deleting and committing work sessions, and, in the context of a work session: (1) performing tagging/untagging actions and (2) querying the information space modified by such actions. In the following we will describe both functional semantics and implementation of the API, given in terms of a formal symbolic notation. The semantics defines the expected behaviour of the API and is provided in terms of the semantics of Solr. The implementation defines the realization of the API methods in terms of the low-level docset management library of Solr. The right side of figure 1 illustrates the layering of functionalities required to implement the TagTick Virtualizer module. As shown, the realization of the module required exposing the Solr low-level docset library through an API. Figure 1. TagTick Virtualizer: The Architecture. TagTick Virtualizer API: the intended semantics The commands createSession() creates a new session s, intended as a sequence of (un)tagging INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2014 30 actions over an initial Information Space I. The command and deleteSession(s) removes the session s from the environment. We denote the virtual information space obtained by modifying I with the actions in s as I(s); note that: I(𝜖) = I. 
createSession()              creates and returns a work session s
deleteSession(s)             deletes a work session s
commitSession(s)             commits a work session s
action(A, rs, (i, t), s)     applies the action A with tag (i, t) to all objects in rs in s
virtQuery(q, s)              executes q over the Information Space I(s)

Table 4. TagTick Virtualizer API: The Methods.

The command action(A, rs, (i, t), s), depending on the value of A being tag or untag, applies the relative action for the tag (i, t) to all objects in rs and in the context of the session s. (Un)tagging actions occur in the context of a session s, hence update the scope of the Information Space I(s). The construction of such rs takes place in the annotation tagging tool user interface and may require several queries before all objects to be bulk (un)tagged are collected. Annotation tagging tools may for example provide web-basket mechanisms to support curators in this process. The command commitSession(s) makes the virtual Information Space I(s) persistent, i.e., materializes the bulk updates collected in session s. Once this operation is completed, the session s is deleted.

The command virtQuery(q, s) executes a virtual search whose semantics is that of the Solr method query(q, null) executed over I(s). More formally, let’s extend the semantic function ⟦.⟧Solr to include the information space scope of the execution, that is: ⟦query(q, qf)⟧Solr I is the semantics of query(q, qf) over a given Information Space I. Then, we can define:

⟦virtQuery(q, s)⟧TV = ⟦query(q, null)⟧Solr I(s)

TagTick Virtualizer API: The implementation

To provide its functionalities in real time, the TagTick Virtualizer avoids any form of update action into the index. The module emulates the application of bulk (un)tagging actions over the information space by exploiting Solr’s low-level library for docset management, whose methods are shown in table 2. The underlying intuition is based on two considerations: (1) the action action(A, rs, (i, t), s) can be encoded in memory as an association between the tag (i, t) and the objects in the docset ds relative to rs in the context of s; and (2) the subset of objects ds should be returned to the query (i = t) if executed over I in the scope of s (i.e., as if I was updated with such an action). By following this approach, the module may rewrite and execute calls of the form virtQuery(q And (i = t), s) into calls searchOnDocset(q, ds), thereby emulating the real-time execution of the query over the information space I(s). More generally, any query of the form q And qtag predicates, where qtag predicates is a query combining tag predicates relative to tags touched in the session, can be rewritten as searchOnDocset(q, ds). In such cases, ds is obtained by combining the docsets relative to tag predicates by means of the low-level methods intersectDocsets and unifyDocsets.

The TagTick Virtualizer module implements the aforementioned session cache by means of an in-memory map ρ = S × I × T → DS, which caches the tagging status of all active work sessions. To this aim, ρ maps triples (s, i, t) onto docsets ds that are defined as the set of objects tagged with the tag (i, t) in the context of s at the time of the request. The TagTick Virtualizer is stateless with regard to the specific tags and session identifiers it is called to handle; such information is typically held in applications using the module to take advantage of real-time, virtual tagging mechanisms.
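Purely as an illustration of this bookkeeping, the following self-contained Python sketch models the map ρ with plain sets standing in for Solr docsets; the function and variable names are ours and do not reflect the actual implementation of the module.

# Illustrative sketch of the session cache ρ: (session, interpretation, term) -> docset.
from collections import defaultdict

rho = defaultdict(set)  # rho[(s, i, t)] is the set of object ids tagged (i, t) in session s

def tag(session, interpretation, term, result_set_ids):
    """Virtually tag every object in result_set_ids with (interpretation, term)."""
    rho[(session, interpretation, term)] |= set(result_set_ids)

def untag(session, interpretation, term, result_set_ids):
    """Virtually remove the tag (interpretation, term) from every object in result_set_ids."""
    rho[(session, interpretation, term)] -= set(result_set_ids)

# A query such as  q And (ACM = "H.3.7")  issued in session "s1" would then be answered
# by running q over the index restricted to this in-memory docset:
tag("s1", "ACM", "H.3.7", ["doc-12", "doc-99"])
print(rho[("s1", "ACM", "H.3.7")])  # {'doc-12', 'doc-99'}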
Tagging and untagging actions

The method action(A, rs, (i, t), s) has the effect of changing the status ρ to reflect the action of tagging or untagging the objects in the result set rs with the tag (i, t) in the session s. Table 5 describes the effect of the command over the status ρ in terms of the semantic function ⟦.⟧M : C × ℘(S × I × T) → ℘(S × I × T) that takes a command C and a status ρ and returns the status ρ affected by C. In order to optimize the memory heap, ρ is populated following a lazy approach, according to which a new entry for the key (s, i, t) is created when the first tagging or untagging action with respect to the tag (i, t) is performed in the scope of s. When the user adds or removes a tag (i, t) for the first time in the session s (case ρ(s, i, t) = ⊥), the value of the entry ρ(s, i, t) is initialized to the docset relative to the query i = t:

ds = getDocset(⟦query((i = t), null)⟧Solr I)

The function init(ρ, s, i, t) returns such new ρ, over which the tag or untag action is eventually executed. If the action involves a tag (i, t) for which an entry ρ(s, i, t) = ds exists (case ρ(s, i, t) ≠ ⊥), the commands return the new ρ obtained by adding or removing the docset getDocset(rs) to or from ds. Such actions are performed in memory with minimal execution time.

⟦action(A, rs, (i, t), s)⟧M(ρ) = updateTag(ρ, rs, (i, t), s)                     if (A = tag And ρ(s, i, t) ≠ ⊥)
                                 updateUntag(ρ, rs, (i, t), s)                   if (A = untag And ρ(s, i, t) ≠ ⊥)
                                 ⟦action(A, rs, (i, t), s)⟧M(init(ρ, s, i, t))   if (ρ(s, i, t) = ⊥)

init(ρ, s, i, t) = ρ[ρ(s, i, t) ← getDocset(⟦query((i = t), null)⟧Solr)]
updateTag(ρ, rs, (i, t), s) = ρ[ρ(s, i, t) ← ρ(s, i, t) ∪ getDocset(rs)]
updateUntag(ρ, rs, (i, t), s) = ρ[ρ(s, i, t) ← ρ(s, i, t) ∖ getDocset(rs)]

Table 5. Semantics of tag/untag commands.

Queries over a Virtual Information Space

As mentioned above, the command virtQuery(q, s) is implemented by executing the low-level method searchOnDocset(q′, ds). Informally, q′ is the subpart of q whose predicates are not affected by actions in s, while ds is the subset of objects matching tag predicates affected by actions in s, to be calculated by means of the map ρ. To make this a real statement, two main issues must be addressed. The first one is syntactic: how to extract from q the subquery q′ and the subquery to be filtered by ρ to generate ds. The second issue is semantic: the misalignment between the objects in the original Information Space I, where searchOnDocset is executed, and the ones in I(s), to be virtually queried over and returned by virtQuery.

Syntactic issue: To obtain q′ and ds from q, the TagTick Virtualizer module includes a Query Rewriter module that is in charge of rewriting q as a query:

q′ And qtags in session   (1)

Both queries are compliant with the query grammar in table 1, but the second is a query that groups all tag predicates in q which are affected by s. The reason for this restriction is that the method searchOnDocset(q′, ds) performs an intersection between the docset ds and the docset obtained from the execution of q′. In principle, qtags in session may contain arbitrary combinations of tag predicates (i = t) combined with And and Or operators. To get a better understanding, refer to the examples in table 6, where we assumed to have two tag interpretations A with terms {a1, a2} and B with terms {b1, b2}, where ρ(s, A, a1) and ρ(s, B, b1) are defined in ρ; note that keyword searches, e.g., “napoleon,” are not run over tag values.
The first two queries can be executed, while the last one is invalid. Indeed there is no way to factor out the tag predicate (A = a1) so that it can be separated and joined with the rest of the query using an And operator. Clearly, the ability of the Query Rewriter module to rewrite the query independently of its complexity may be crucial to increase the usability level of TagTick Virtualizer. In its current implementation, the TagTick Virtualizer assumes that q is provided to virtQuery as already satisfying the expected query structure (1). As we shall see in the next section, this assumption is very reasonable in the realization of our annotation tagging tool TagTick and, more generally, in the definition of tools for annotation tagging. Indeed, such tools typically allow data curators to run Google-like free-keyword queries to be refined by a set of tags selected from a list. Such queries fall within our assumption and also match the average requirements of this application domain.

q = "napoleon" And (A = a1 Or B = b1)
    where: q′ = "napoleon"
           qtags in session = (A = a1 Or B = b1)

q = (A = a2 Or "napoleon") And (A = a1 Or B = b1)
    where: q′ = (A = a2 Or "napoleon")
           qtags in session = (A = a1 Or B = b1)

q = (A = a1 Or B = b2) And "napoleon"

Table 6. Query rewriting.

Semantic issue: The command searchOnDocset(q′, ds) does not match the expected semantics of virtQuery(q, s). The reason is that searchOnDocset is executed over the original information space I and objects in the returned result set may not reflect the new tagging imposed by actions in s. For example, consider an untagging action for the tag (i, t) and the result set rs in s. Although the objects in rs would never be returned for a query virtQuery((i = t), s), they could be returned for queries regarding other properties, and in this case they would still display the tag (i, t). To solve this problem, the function patchResultset : RS → RS in table 7 intercepts the result set returned by searchOnDocset and “patches” its objects, by properly removing or adding tags according to the actions in s. To this aim, the function exploits the low-level function intersectSize, which efficiently computes and returns the size of the intersection between two docsets. For each object d in a given result set rs, the function verifies if d belongs to the docsets ρ(s, i, t) relative to the tags touched by the session s: if this is the case (intersectSize returns 1), the object should be enriched with the tag (add(d, (i, t))), otherwise the tag should be removed from the object (remove(d, (i, t))).
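Purely as an illustration of this patching step, a minimal Python sketch might look as follows, assuming docsets are modelled as plain sets of object identifiers and result-set objects as dictionaries; the names are ours and do not reflect the module’s actual (Java) code.

# Illustrative sketch: re-aligning session-touched tags on each returned object,
# so that results reflect I(s) rather than the original index I.
def patch_result_set(result_set, rho, session):
    """Add or remove session-touched tags on every object of the result set."""
    for doc in result_set:
        for (s, interp, term), docset in rho.items():
            if s != session:
                continue
            tags = doc.setdefault(interp, [])
            if doc["id"] in docset:          # object carries the tag in I(s)
                if term not in tags:
                    tags.append(term)
            elif term in tags:               # object lost the tag in the session
                tags.remove(term)
    return result_set

docs = [{"id": "doc-7", "ACM": ["H.3.7"]}, {"id": "doc-9", "ACM": []}]
rho = {("s1", "ACM", "H.3.7"): {"doc-9"}}    # in session s1 only doc-9 keeps the tag
print(patch_result_set(docs, rho, "s1"))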
As in the case of standard queries, the semantic issue affects browse queries when a group-by is applied over a tag interpretation i touched in the current work session. Indeed, the relative stats would be calculated over the information space I rather than the intended I(s). To solve this issue, when a browse query requests stats over a tag interpretation i, the relative triples (i, t, k(i, t)) are patched as follows:

1. If (i, t, k(i, t)) is such that ρ(s, i, t) = ⊥, i.e., the tag was not affected by the session, then k(i, t) is left unchanged;
2. If (i, t, k(i, t)) is such that ρ(s, i, t) = ds, then k(i, t) = intersectSize(ds, getDocset(rs)). The operation returns the number of objects currently tagged with (i, t) which are also present in the result set rs.

Query execution: The implementation of virtQuery can therefore be defined as

⟦virtQuery(q, s)⟧TV = patchResultset(searchOnDocset(q′, ds), ρ, s)

where q is rewritten in terms of q′ and qtags in session by the Query Rewriter module, and ds is the docset obtained by applying the function ⟦.⟧V : Q × S × ℘(S × I × T) → DS defined in Table 8 to qtags in session. The function, given a query of tag predicates, a session identifier, and the status map ρ, returns the docset of objects satisfying the query in the session’s scope.

⟦q1 Or q2⟧V(s, ρ) = unifyDocsets(⟦q1⟧V(s, ρ), ⟦q2⟧V(s, ρ))
⟦q1 And q2⟧V(s, ρ) = intersectDocsets(⟦q1⟧V(s, ρ), ⟦q2⟧V(s, ρ))
⟦(i = t)⟧V(s, ρ) = ρ(s, i, t)

Table 8. Evaluation of qtags in session in session s.

The definition of ρ, the Query Rewriter module, the semantics of the commands action and virtQuery, the definition of searchOnDocset, and the function ⟦.⟧V guarantee the validity of the following claim, crucial for the correctness of the TagTick Virtualizer:

Claim (Search correctness). Given an information space I, a map ρ, and a session s, for any query q such that
1. q = q′ And qtags in session
2. ds = ⟦qtags in session⟧V(s, ρ)
we can claim that ⟦virtQuery(q, s)⟧TV = ⟦query(q, null)⟧Solr I(s); hence the implementation of the command virtQuery matches its expected semantics.

Making a Virtual Information Space Persistent

The commitSession(s) command is responsible for updating the initial information space I with the changes applied in s, i.e., adding and removing tags on objects in I according to the actions in s. To this aim, the module relies on the map ρ, which associates each tag (i, t) with the set of objects virtually tagged by (i, t) in s, and on the low-level function andNotDocsets. By properly matching the set of objects tagged by (i, t) in I and I(s), the function derives the sets of objects to tag and untag in I. Overall, the execution of commitSession(s) consists of the following steps (a sketch of this procedure appears after the list):

1. Identifying the set of tags affected by tagging or untagging actions in the session s: changedTags(s) = {(i, t) | ρ(s, i, t) ≠ ⊥}
2. For each (i, t) ∈ changedTags(s):
   a) fetching the result set relative to all objects in I with tag (i = t): rs = query((i = t), null);
   b) keeping in memory the relative docset ds = getDocset(rs);
   c) calculating in memory the set of objects in I to be untagged by (i = t): toBeUntagged = andNotDocsets(ds, ρ(s, i, t));
   d) calculating in memory the set of objects in I to be tagged with (i = t): toBeTagged = andNotDocsets(ρ(s, i, t), ds);
   e) updating the index to tag and untag all objects in the two sets; and
   f) removing the session s.
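The following sketch, under the same simplifying assumptions as the previous one, illustrates the evaluation of qtags in session over ρ (Table 8) and the commit procedure just listed. The nested-tuple query representation and the index.query/tag/untag stubs are hypothetical; they only mirror the low-level functions named in the text.

```python
# Sketch of the evaluation of q-tags-in-session (Table 8) and of commitSession(s).
# A query is a nested tuple, e.g. ("Or", ("A", "a1"), ("B", "b1")); a leaf is a
# pair (interpretation, term). rho maps (session, i, t) to a docset (a set).

def eval_qtags(q, s, rho):
    """Return the docset satisfying a query of tag predicates in session s."""
    if q[0] == "Or":
        return eval_qtags(q[1], s, rho) | eval_qtags(q[2], s, rho)   # unifyDocsets
    if q[0] == "And":
        return eval_qtags(q[1], s, rho) & eval_qtags(q[2], s, rho)   # intersectDocsets
    i, t = q                                                         # predicate (i = t)
    return rho[(s, i, t)]

def commit_session(s, rho, index):
    """Materialize the virtual tagging of session s into the information space I."""
    changed = [(i, t) for (sess, i, t) in rho if sess == s]          # changedTags(s)
    for i, t in changed:
        ds = set(index.query(i, t))      # docset of objects tagged (i = t) in I
        virtual = rho[(s, i, t)]         # docset of objects tagged (i = t) in I(s)
        to_be_untagged = ds - virtual    # andNotDocsets(ds, rho(s, i, t))
        to_be_tagged = virtual - ds      # andNotDocsets(rho(s, i, t), ds)
        for d in to_be_untagged:
            index.untag(d, i, t)
        for d in to_be_tagged:
            index.tag(d, i, t)
    for key in [k for k in rho if k[0] == s]:
        del rho[key]                     # remove the session
```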
The TagTick Virtualizer module is also responsible for managing conflicts on commits and avoiding index inconsistencies. To this aim, only the first commit action is executed, and once the relative actions are materialized into the index, all other sessions are invalidated, i.e., deleted.

TagTick User Interface: Annotation Tagging for Solr

The TagTick User Interface module implements the functionalities presented previously over a Solr index equipped with the TagTick Virtualizer module described in the section on Solr and annotation tagging (see figure 2). The user interface offers authenticated data curators an annotation tagging environment where they can open work sessions, do and undo sequences of (un)tagging actions, and eventually commit the session into the current Solr information space. When data curators log out from the tool, the module stores their pending work sessions and the relative (un)tagging actions on disk. Such sessions are restored at the next access to the interface, allowing data curators to continue their work.

Figure 2. TagTick: User Interface.

The TagTick User Interface is a general-purpose module that can be configured to adapt to the classes and to the structure of objects residing in the index. To this aim, the module acquires this information from XML configuration files where data curators can specify:

1. The names of the different classes, the values used to encode such classes in the index, and the index field used to contain such values;
2. The list of tag interpretations together with the relative ontologies: in the current implementation, ontologies are flat sets of terms, which can optionally be populated by curators during the tagging step; and
3. The intended use of interpretations: the association between classes and interpretations.

Once instantiated, the TagTick User Interface allows users to search for objects of all classes by means of free keywords and to refine such searches by class and by the tags relative to such class. This combination of predicates, which matches the query structure q = q′ And qtags in session expected by the TagTick Virtualizer, is then executed by the module and the results presented in the interface. Users can then add or remove tags on the objects; the interface makes sure that the right interpretations are used for the given class.

As an example, we shall consider the real-case instantiation of TagTick in the context of the HOPE project, whose aim is to deliver a data infrastructure capable of aggregating metadata records describing multimedia objects relative to labour history and located across several data sources.24 Such objects are collected, cleaned, and enriched to form an information space stored in a Solr index. The index stores two main classes of objects: descriptive units and digital resources. Descriptive unit objects contain properties describing cultural heritage objects (e.g., a pin). Digital resource objects instead describe the digital material representing the cultural heritage objects (e.g., the pictures of a pin).
TagTick is currently used in the HOPE project to classify the aggregated objects according to two tag interpretations: “historical themes,” to tag descriptive units with an ontology of terms describing historical periods, and “export mode,” to tag digital resources with an ontology that describes the different social sites (e.g., YouTube, Facebook, Flickr) from which the resource must be made available. In particular, figure 3 illustrates the HOPE TagTick user interface. In the screenshot, a new tag, “Communism . . . ,” of the tag interpretation “historical themes” is being added to a set of descriptive units obtained by a query.

The TagTick User Interface offers access to the history of actions, in order to visualize their sequence and possibly undo their effects. Figure 4 shows the history of actions that led to the actual tag virtualization in the current work session. Curators can only roll back the last action they performed. This is because virtual tagging actions may depend on each other; e.g., an action may be based on a query that includes tag predicates whose tag has been affected by previous actions. Other approaches may infer the interdependencies between the queries behind the tagging actions and expose dependency-based undo options.

Figure 3. TagTick User Interface: Bulk Tagging Action.

Figure 4. TagTick User Interface: Managing History of Actions.

STRESS TESTS

The motivations behind the realization of TagTick are to be found in the annotation tagging requirements of bulk and real-time tagging. In general, the indexing speed of Solr highly depends on the underlying hardware, on the number of threads used for feeding, on the average size of the objects and their property values, and on the kind of text analysis to be adopted.25 However, even assuming the most favorable scenario, bulk indexing in Solr is comparatively slow with respect to other technologies, such as relational databases,26 and far from being real-time. In this section, we present the results of stress tests conceived to provide concrete measures of query performance (i.e., the real-time effect), of the scalability of the tool, and of how many tagging actions can be handled in the same session. The idea of the tests is to re-create worst-case scenarios and give evidence of the ability of TagTick to cope and scale in terms of response time and memory consumption.

The experiments were run on a machine with an Intel(R) Xeon(R) CPU E5630 @ 2.53 GHz processor (4 cores), 4 GB of memory, and 100 GB of available disk (used at around 52 percent). The machine runs an Ubuntu 10.04.2 LTS operating system, with a Java Virtual Machine configured as -Xmx1800m -XX:MaxPermSize=512m. In simpler terms, a medium-low server for a production index. The index was fed with 10 million randomly generated objects with the following structure:

[identifier: String, title: String, description: String, publisher: String, URL: String, creator: String, date: Date, country: String, subject: Terms]

The tag interpretation subject can be assigned values from an ontology Terms of scientific subjects, such as “Agricultural biotechnology,” “Automation,” “Biofuels,” “Biotechnology,” “Business aspects.” The objects are initially generated without tags.
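As an illustration of the test data, the sketch below generates objects with the structure listed above. The field value generators and the sample of the Terms ontology are placeholders; they are not the actual data used in the reported tests.

```python
# Illustrative generator for randomly created test objects. Field names follow
# the structure listed above; the value generators and the TERMS sample are
# placeholders, not the data set actually used in the stress tests.
import random
import string
from datetime import date, timedelta

TERMS = ["Agricultural biotechnology", "Automation", "Biofuels",
         "Biotechnology", "Business aspects"]

def random_text(n=12):
    return "".join(random.choice(string.ascii_lowercase) for _ in range(n))

def random_object(n):
    return {
        "identifier": f"obj-{n:08d}",
        "title": random_text(),
        "description": random_text(60),
        "publisher": random_text(),
        "URL": f"http://example.org/objects/{n}",
        "creator": random_text(),
        "date": str(date(2000, 1, 1) + timedelta(days=random.randint(0, 5000))),
        "country": random_text(2).upper(),
        "subject": [],   # the objects are initially generated without tags
    }

# Feeding ten million such objects, e.g.:
# documents = (random_object(n) for n in range(10_000_000))
```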
Each test defines a new session s with K tagging actions of the form action(tag, virtQuery(identifier <> ID, null), t, s), where ID is a random identifier and t is a random tag (subject, term). In practice, the action adds the tag t to all objects in the index, thereby generating docsets of size 10 million. Once the K actions are executed, the test returns the following measures:

1. The size of the heap space required to store K tags in memory.
2. The minimal, average, and maximum time required to reply to two kinds of stress queries to the index (calculated out of 100 queries):
   a. The query identifier <> ID And(i,t)∈s (i = t): the query returns the objects in the index that feature all tags touched by the session.
   b. The query identifier <> ID Or(i,t)∈s (i = t): the query returns the objects in the index that feature at least one of the tags assigned in the session.
   In both cases, since tagging actions were applied to all objects in the index, the result will contain the full index. However, in one case the response will be calculated by intersecting docsets, while in the other case by unifying them. Note that by selecting a random identifier value (ID), the test makes sure that low-level Solr optimization by caching is not triggered, as this would compromise the validity of the test.
3. The minimal, average, and maximum time required to reply to browse queries that involve all tags used in the session (calculated out of 100 queries).
4. The time required to reconstruct the session in memory whenever the data curator logs into TagTick.

The results presented in figure 5 show that the average time for the execution of search and browse queries always remains under 2 seconds, which we can consider under the “real-time” threshold from the point of view of the users. User tests have been conducted in the context of the HOPE project, where curators were positively impressed by the tool. HOPE curators can today apply sequences of tagging operations over millions of aggregated records by means of a few clicks. Moreover, independently of the number of tagging operations, queries over the tagged records take about 2 seconds to complete.

The execution time shows a major increase from 0 tags to 1 tag. This behavior is expected because when there is 1 tag in the session, the 10 million records must be “patched.” From 1 tag onwards the execution time increases as well, but not at the same rate as in the previous case. This means that in the average case, patching 10 million records with 100 tags does not cost much more than patching them with 1 tag.

Figure 5. Stress Test for TagTick Search and Browse Functionality.

The results in figure 6 show that the amount of memory used does not exceed the limits expected on reasonable servers running a production system. The time required to reconstruct the sessions is generally long, starting from 20 seconds for 50 tags up to 1.5 minutes for 200 tags. On the other hand, this is a one-time operation, required only when logging in to the tool.

Figure 6. Stress Test for Heap Size Growth and Session Restore Time.

CONCLUSIONS

In this paper, we presented TagTick, a tool devised to enable annotation tagging functionalities over Solr instances.
The tool allows a data curator to safely apply and test bulk tagging and untagging actions over the index in almost real time and without compromising the activities of end users searching the index at the same time. This is possible thanks to the TagTick Virtualizer module, which implements a layer over Solr that enables real-time and virtual tagging by keeping in memory the inverted list of objects associated with an (un)tagging action. The layer is capable of parsing user queries to intercept the usage of tags kept in memory and, in this case, of manipulating the query response to deliver the set of objects expected after tagging.

Future developments may include more complex query parsing to handle rewriting of a larger set of queries beyond the Google-like queries currently handled by the tool. Another interesting challenge is tag propagation. Curators may be interested in having the action of (un)tagging an object propagated to objects that are somehow related to it. Handling this problem requires including relationships between classes of objects in the information space model and extending the TagTick Virtualizer module to specify and manage propagation policies.

ACKNOWLEDGEMENTS

The work presented in this paper has been partially funded by the European Commission FP7 eContentplus-2009 Best Practice Networks project HOPE (Heritage of the People’s Europe, http://www.peoplesheritage.eu), grant agreement 250549.

REFERENCES

1. Arkaitz Zubiaga, Christian Körner, and Markus Strohmaier, “Tags vs Shelves: From Social Tagging to Social Classification,” in Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia (New York: ACM, 2011), 93–102, http://dx.doi.org/10.1145/1995966.1995981.
2. Meng Wang et al., “Assistive Tagging: A Survey of Multimedia Tagging with Human-Computer Joint Exploration,” ACM Computing Surveys 44, no. 4 (September 2012): 25:1–24, http://dx.doi.org/10.1145/2333112.2333120.
3. Lin Chen et al., “Tag-Based Web Photo Retrieval Improved by Batch Mode Re-tagging,” in 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2010), 3440–46, http://dx.doi.org/10.1109/CVPR.2010.5539988.
4. Emanuele Quintarelli, Andrea Resmini, and Luca Rosati, “Information Architecture: Facetag: Integrating Bottom-Up and Top-Down Classification in a Social Tagging System,” Bulletin of the American Society for Information Science & Technology 33, no. 5 (2007): 10–15, http://dx.doi.org/10.1002/bult.2007.1720330506.
5. Stijn Christiaens, “Metadata Mechanisms: From Ontology to Folksonomy . . . and Back,” in On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops, Lecture Notes in Computer Science (Berlin/Heidelberg: Springer-Verlag, 2006).
6. M. Mahoui et al., “Collaborative Tagging of Art Digital Libraries: Who Should Be Tagging?” in Theory and Practice of Digital Libraries, ed. Panayiotis Zaphiris et al., Lecture Notes in Computer Science, vol. 7489 (Berlin/Heidelberg: Springer, 2012), 162–72, http://dx.doi.org/10.1007/978-3-642-33290-6_18.
7. Alexandre Passant and Philippe Laublet, “Meaning Of A Tag: A Collaborative Approach to Bridge the Gap Between Tagging and Linked Data,” in Proceedings of the Linked Data on the Web (LDOW2008) Workshop at WWW2008, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.6915.
8.
Michael Khoo et al., “Towards Digital Repository Interoperability: The Document Indexing and Semantic Tagging Interface for Libraries (DISTIL),” in Theory and Practice of Digital Libraries, ed. Panayiotis Zaphiris et al., Lecture Notes in Computer Science, vol. 7489 (Berlin/Heidelberg: Springer, 2012), 439–44, http://dx.doi.org/10.1007/978-3-642-33290-6_49.
9. Leonardo Candela et al., “Setting the Foundations of Digital Libraries: The DELOS Manifesto,” D-Lib Magazine 13, no. 3/4 (March/April 2007), http://dx.doi.org/10.1045/march2007-castelli.
10. Jennifer Trant, “Studying Social Tagging and Folksonomy: A Review and Framework,” Journal of Digital Information (January 2009), http://hdl.handle.net/10150/105375.
11. Cameron Marlow et al., “HT06, Tagging Paper, Taxonomy, Flickr, Academic Article, To Read,” in Proceedings of the Seventeenth Conference on Hypertext and Hypermedia (New York: ACM, 2006), 31–40, http://dx.doi.org/10.1145/1149941.1149949.
12. Andrea Civan et al., “Better to Organize Personal Information by Folders or By Tags? The Devil Is in the Details,” Proceedings of the American Society for Information Science and Technology 45, no. 1 (2008): 1–13, http://dx.doi.org/10.1002/meet.2008.1450450214.
13. Marianne Lykke et al., “Tagging Behaviour with Support from Controlled Vocabulary,” in Facets of Knowledge Organization, ed. Alan Gilchrist and Judi Vernau (Bingley, UK: Emerald Group, 2012), 41–50.
14. Guus Schreiber et al., “Semantic Annotation and Search of Cultural-Heritage Collections: The MultimediaN E-Culture Demonstrator,” Web Semantics: Science, Services and Agents on the World Wide Web 6, no. 4 (2008): 243–49, http://dx.doi.org/10.1016/j.websem.2008.08.001.
15. Diana Maynard and Mark A. Greenwood, “Large Scale Semantic Annotation, Indexing and Search at the National Archives,” in Proceedings of LREC, vol. 12 (2012).
16. Martin Feijen, “DRIVER: Building the Network for Accessing Digital Repositories Across Europe,” Ariadne 53 (October 2007), http://www.ariadne.ac.uk/issue53/feijen-et-al/.
17. Heritage of the People’s Europe (HOPE), http://www.peoplesheritage.eu/.
18. European Film Gateway Project, http://www.europeanfilmgateway.eu.
19. Paolo Manghi et al., “OpenAIREplus: The European Scholarly Communication Data Infrastructure,” D-Lib Magazine 18, no. 9–10 (September 2012), http://dx.doi.org/10.1045/september2012-manghi.
20. Panagiotis Antonopoulos et al., “Efficient Updates for Web-Scale Indexes over the Cloud,” in 2012 IEEE 28th International Conference on Data Engineering Workshops (ICDEW) (April 2012), 135–42, http://dx.doi.org/10.1109/ICDEW.2012.51.
21. Chun Chen et al., “TI: An Efficient Indexing Mechanism for Real-Time Search on Tweets,” in Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (New York: ACM, 2011), 649–60, http://dx.doi.org/10.1145/1989323.1989391.
22. Rafal Kuc, Apache Solr 4 Cookbook (Birmingham, UK: Packt, 2013).
23.
David Smiley and Eric Pugh, Apache Solr 3 Enterprise Search Server (Birmingham, UK: Packt, 2011).
24. The HOPE Portal: The Social History Portal, http://www.socialhistoryportal.org/timeline-map-collections.
25. This assumes operating a stand-alone instance of Solr, hence not relying on Solr sharding techniques with parallel feeding.
26. WhyUseSolr—Solr Wiki, http://wiki.apache.org/solr/WhyUseSolr.

4635 ---- Usability Test Results for Encore in an Academic Library

Megan Johnson

ABSTRACT

This case study gives the results of a usability study for the discovery tool Encore Synergy, an Innovative Interfaces product, launched at Appalachian State University Belk Library & Information Commons in January 2013. Nine of the thirteen participants in the study rated the discovery tool as more user friendly, according to a SUS (System Usability Scale) score, than the library’s tabbed search layout, which separated the articles and catalog search. All of the study’s participants were in favor of switching the interface to the new “one box” search. Several glitches in the implementation were noted and reported to the vendor. The study results have helped develop Belk Library training materials and curricula. The study will also serve as a benchmark for further usability testing of Encore and Appalachian State Library’s website. This article will be of interest to libraries using Encore Discovery Service, investigating discovery tools, or performing usability studies of other discovery services.

INTRODUCTION

Appalachian State University’s Belk Library & Information Commons is constantly striving to make access to library resources seamless and simple for patrons to use. The library’s technology services team has conducted usability studies since 2004 to inform decision making for iterative improvements. The most recent versions (since 2008) of the library’s website have featured a tabbed layout for the main search box. This tabbed layout has gone through several iterations and a move to a new content management system (Drupal). During fall semester 2012, the library website’s tabs were: Books & Media, Articles, Google Scholar, and Site Search (see figure 1).

Some issues with this layout, documented in earlier usability studies and through anecdotal experience, will be familiar to other libraries that have tested a tabbed website interface. User access issues include the belief of many patrons that the “articles” tab looked for all articles the library had access to. In reality, the “articles” tab searched seven EBSCO databases; Belk Library has access to over 400 databases. Another problem noted with the tabbed layout was that patrons often started typing in the articles box even when they knew they were looking for a book or DVD. This is understandable, since when most of us see a search box we just start typing; we do not read all the information on the page.
Megan Johnson (johnsnm@appstate.edu) is E-Learning and Outreach Librarian, Belk Library and Information Commons, Appalachian State University, Boone, NC.

Figure 1. Appalachian State University Belk Library website tabbed layout search, December 2012.

A third documented user issue is confusion over finding an article citation. This is a rather complex problem, since it has been demonstrated through assessment of student learning that many students cannot identify the parts of a citation, so this usability issue goes beyond the patron being able to navigate the library’s interface; it is partly a lack of information literacy skills. However, even sophisticated users can have difficulty in determining whether the library owns a particular journal article. This is an ongoing interface problem for Belk Library and many other academic libraries. Google Scholar (GS) often works well for users with a journal citation, since on campus they can often simply copy and paste a citation to see if the library has access, and, if so, the full text is often available in a click or two. However, if there are no results found using GS, patrons are still not certain whether the library owns the item.

BACKGROUND

In 2010, the library formed a task force to research the emerging market of discovery services. The task force examined Summon, EBSCO Discovery Service, Primo, and Encore Synergy and found the products, at that time, to still be immature and lacking value. In April 2012, the library reexamined the discovery market and conducted a small benchmarking usability study (the results are discussed in the methodology section and summarized in appendix A). The library felt enough improvements had been made to Innovative Interfaces’ Encore Synergy product to justify purchasing this discovery service. An Encore Synergy Implementation Working Group was formed, and several subcommittees were created, including end-user preferences, setup & access, training, and marketing. To help inform the decision of these subcommittees, the author conducted a usability study in December 2012, which was based on, and expanded upon, the April 2012 study.

The goal of this study was to test users’ experience and satisfaction with the current tabbed layout, in contrast to the “one box” Encore interface. The library had committed to implementing Encore Synergy, but there were options for the layout of the search box on the library’s homepage. If users expressed a strong preference for tabs, the library could choose to keep a tabbed layout, with access to the articles part of Encore in one tab, the catalog part in another, and tabs for other options like Google Scholar and a search of the library’s website. A second goal of the study was to benchmark the user experience for the implementation of Encore Synergy so that, over time, improvements could be made to promote seamless access to Appalachian State University Library’s resources. A third goal of this study was to document problems users encountered and report them to Innovative.

Figure 2. Appalachian State University Belk Library website Encore Search, January 2013.

LITERATURE REVIEW

There have been several recent reviews of the literature on library discovery services. Thomsett-Scott and Reese conclude that discovery tools are a mixed blessing.1
Users can easily search across broad areas of library resources, and limiting by facets is helpful. Downsides include loss of individual database specificity and users’ unwillingness to look beyond the first page of results. Longstanding library interface problems, such as patrons’ lack of understanding of holdings statements and of knowing when it is appropriate to search in a discipline-specific database, are not solved by discovery tools.2

In a recent overview of discovery services, Hoeppner lists four vendors whose products have both a discovery layer and a central index: EBSCO’s Discovery Service (EDS); Ex Libris’ Primo Central Index; Serials Solutions’ Summon; and OCLC’s WorldCat Local (WCL).3 Encore does not currently offer a central index or pre-harvested metadata for articles, so although Encore has some of the features of a discovery service, such as facets and connections to full text, it is important for libraries considering implementing Encore to understand that the part of Encore that searches for articles is a federated search. When Appalachian purchased Encore, not all the librarians and staff involved in the decision making were fully aware of how this would affect the user experience. Further discussion of this appears in the “glitches revealed” section.

Fagan et al. discuss James Madison University’s implementation of EBSCO Discovery Service and their customizations of the tool. They review the literature of discovery tools in several areas, including articles that discuss selection processes, features, and academic libraries’ decision processes following selection. They conclude that the “literature illustrates a current need for more usability studies related to discovery tools.”4

The literature most relevant to this study consists of case studies documenting a library’s experience implementing a discovery service and task-based usability studies of discovery services. Thomas and Buck5 sought to determine with a task-based usability study whether users were as successful performing common catalog-related tasks in WorldCat Local (WCL) as they were in the library’s current catalog, Innovative Interfaces’ WebPAC. The study helped inform the library’s decision, at that time, not to implement WCL. Becher and Schmidt6 discuss American University’s comparison of WCL and AquaBrowser (two discovery layers), which were implemented locally. The study focused on user preferences based on students’ “normal searching patterns”7 rather than completion of a list of tasks. Their study revealed undergraduates generally preferred WCL, while upperclassmen and graduate students tended to like AquaBrowser better. Becher and Schmidt discuss the research comparing assigned tasks versus user-defined searches and report that a blend of these techniques can help researchers understand user behavior better.8

This article reports on a task-based study in which the last question asks the participant to research something they had looked for within the past semester, and the results section indicates that the most meaningful feedback came from watching users research a topic they had a personal interest in. Having assigned tasks also can be very useful.
For example, an early problem noted with discovery services was poor search results for specific searches on known items, such as the book “The Old Man and the Sea.” Assigned tasks also give the user a chance to explore a system for a few searches, so when they search for a topic of personal interest, it is not their first experience with the new system. Blending assigned tasks with user-chosen tasks proved helpful in this study’s outcomes.

Encore Synergy has not yet been the subject of a formally published task-based usability study. Allison reports on an analysis of Google Analytics statistics at the University of Nebraska–Lincoln after Encore was implemented.9 The article concludes that Encore increases the user’s exposure to all the library’s holdings, describes some of the challenges UNL faced, and gives recommendations for future usability studies to evaluate where additional improvements should be made. The article also states UNL plans to conduct future usability studies.

Although there are not yet formal published task-based studies on Encore, at least one blogger from Southern New Hampshire University documented their implementation of the service. Singley reported in 2011, “Encore Synergy does live up to its promise in presenting a familiar, user-friendly search environment.”10 She points out, “To perform detailed article searches, users still need to link out to individual databases.” This study confirms that users do not understand that articles are not fully indexed and integrated; articles remain, in Encore’s terminology, in “database portfolios.” See the results section, task 2, for a fuller discussion of this topic.

METHOD

This study included a total of 13 participants: four faculty members, six students recruited through a posting on the library’s website offering participants a bookstore voucher, and three student employees (these students work in the library’s mailroom and received no special training on the library’s website). For the purposes of this study, the input of undergraduate students, the largest target population of potential novice users, was of most interest. Table 1 lists demographic details of each student’s or faculty member’s college and, for students, their year.

This was a task-based study, where users were asked to find a known book item and follow two scenarios to find journal articles. The following four questions/tasks were handed to the users on a sheet of paper:

1. Find a copy of the book The Old Man and the Sea.
2. In your psychology class, your professor has assigned you a 5-page paper on the topic of eating disorders and teens. Find a scholarly article (or peer-reviewed) that explores the relation between anorexia and self-esteem.
Does “peer reviewed” mean the same as “scholarly article”? 4. What does the “refine by tag” block the right mean to you? 5. If you had to advise the library to either stay with a tabbed layout, or move to the one search box, what would you recommend? Participants were recorded using Techsmith’s screen-casting software Camtasia, which allows the user’s face to be recorded along with their actions on the computer screen. This allows the observer to not rely solely on notes or recall. If the user encounters a problem with the interface, having the session recorded makes it simple to create (or recreate) a clip to show the vendor. In the course of this study, several clips were sent to Innovative Interfaces, and they were responsive to many of the issues revealed. Further discussion is in the “glitches revealed” section. Seven of the subjects first used the library site’s tabbed layout (which was then the live site) as seen in figure 1. After they completed the tasks, participants filled in a System Usability Scale (SUS) form. The users then completed the same tasks on the development server using Encore Synergy. Participants next filled out a SUS form to reflect their impression of the new interface. Encore is locally branded as APPsearch and the terms are used interchangeably in this study. The six other subjects started with the APPsearch interface on a development server, completed a SUS form, and then did the same tasks using the library’s tabbed interface. The time it took to conduct the studies was ranged from fifteen to forty minutes per participant, depending on how verbal the subject was, and how much they wanted to share about their impressions and ideas for improvement. Jakob Nielson has been quoted as saying you only need to test with five users: “After the fifth user, you are wasting your time by observing the same findings repeatedly but not learning much new.”11 He argues for doing tests with a small number of users, making iterative improvements, and then retesting. This is certainly a valid and ideal approach if you have full control of the design. In the case of a vendor-controlled product, there are serious limitations to what the INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2013 65 librarians can iteratively improve. The most librarians can do is suggest changes to the vendor, based on the results of studies and observations. When evaluating discovery services in the spring of 2012, Appalachian State Libraries conducted a four person task based study (see Appendix A), which used University of Nebraska at Lincoln’s implementation of Encore as a test site to benchmark our students’ initial reaction to the product in comparison to the library’s current tabbed layout. In this small study, the average SUS score for the library’s current search box layout was 62, and for UNL’s implementation of Encore, it was 49. This helped inform the decision of Belk Library, at that time, not to purchase Encore (or any other discovery service), since students did not appear to prefer them. This paper reports on a study conducted in December 2012 that showed a marked improvement in users’ gauge of satisfaction with Encore. Several factors could contribute to the improvement in SUS scores. First is the larger sample size of 13 compared to the earlier study with four participants. 
Another factor is that in the April study, participants were using an external site they had no familiarity with, and a first experience with a new interface is not a reliable gauge of how someone will come to use the tool over time. This study was also more robust in that it added the task of asking the user to search for something they had researched recently, and the follow-up questions were more detailed. Overall it appears that, in this case, having more than four participants and a more robust design gave a better representation of user experience.

The System Usability Scale (SUS)

The System Usability Scale has been widely used in usability studies since its development in 1996. Many libraries use this tool in reporting usability results.12,13 It is simple to administer and score, and the results are easy to understand.14 SUS is an industry standard with references in over 600 publications.15 An “above average” score is 68. Scoring a scale involves a formula in which, for odd-numbered items, one is subtracted from the user response, and for even-numbered items, the user response is subtracted from five. The converted responses are added up and then multiplied by 2.5. This makes the answers easily grasped on the familiar scale of 0–100 (a short sketch of this arithmetic appears after the Task 1 summary below). Due to the scoring method, it is possible that results are expressed with decimals.16 A sample SUS scale is included in Appendix D.

RESULTS

The average SUS score for the 13 users for Encore was 71.5, and for the tabbed layout the average SUS score was 68. This small sample set indicates there was a user preference for the discovery service interface. In a relatively small study like this, these results do not imply a scientifically valid statistical measurement. As used in this study, the SUS scores are simply a way to benchmark how “usable” the participants rated the two interfaces. When asked the subjective follow-up question, “If you had to advise the library to either stay with a tabbed layout, or move to the one search box, what would you recommend?” 100 percent of the participants recommended the library change to APPsearch (although four users actually rated the tabbed layout with a higher SUS score).

Participant | SUS: Encore | SUS: Tabbed layout | Year and major or college | APPsearch first
Student A | 90 | 70 | Senior/Social Work/Female | No
Student B | 95 | 57.5 | Freshman/Undeclared/Male | Yes
Student C | 82.5 | 57.5 | Junior/English/Male | Yes
Student D | 37.5 | 92 | Sophomore/Actuarial Science/Female | Yes
Student E | 65 | 82.5 | Junior/Psychology/Female | Yes
Student F | 65 | 77.5 | Senior/Sociology/Female | No
Student G | 67.5 | 75 | Junior/Music Therapy/Female | No
Student H | 90 | 82.5 | Senior/Dance/Female | No
Student I | 60 | 32.5 | Senior/Political Science/Female | No
Faculty A | 40 | 87.5 | Family & Consumer Science/Female | Yes
Faculty B | 80 | 60 | English/Male | No
Faculty C | 60 | 55 | Education/Male | No
Faculty D | 97.5 | 57.5 | English/Male | Yes
Average | 71.5 | 68 | |

Table 1. Demographic details and individual and average SUS scores.

DISCUSSION

Task 1: “Find a copy of the book The Old Man and the Sea.”

All thirteen users had faster success using Encore. When using Encore, this “known item” is in the top three results. Encore definitely performed better than the classic catalog in saving the time of the user.
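The SUS arithmetic described above can be summarized in a short sketch; the responses list and the example values below are hypothetical and are shown only to make the conversion concrete.

```python
# Minimal sketch of the SUS scoring arithmetic described above. `responses`
# holds the ten 1-5 answers from one participant's form (hypothetical values).

def sus_score(responses):
    """Convert ten 1-5 Likert responses into a 0-100 SUS score."""
    if len(responses) != 10:
        raise ValueError("SUS uses exactly ten items")
    total = 0
    for item_number, answer in enumerate(responses, start=1):
        if item_number % 2 == 1:        # odd-numbered (positively worded) items
            total += answer - 1
        else:                           # even-numbered (negatively worded) items
            total += 5 - answer
    return total * 2.5

# Example: print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # -> 85.0
```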
In approaching task 1 from the tabbed layout interface, four out of thirteen users clicked on the Books & Media tab, changed the drop-down search option to “title,” and were (relatively) quickly successful. The remaining nine, who switched to the Books & Media tab and used the default keyword search for “the old man and the sea,” had to scan the results (using this search method, the book is the seventh result in the classic catalog), which took two users almost 50 seconds. This length of time for an “average user” to find a well-known book is not considered acceptable by the technology services team at Appalachian State University.

When using the Encore interface, the follow-up question for this task was, “Would you know where to find this book in the library?” Nine out of 13 users did not know where the book would be or how to find it. Three faculty members and Student D could pick out the call number and felt they could locate the book in the stacks.

Figure 3. Detail of the screen of results for searching for “The Old Man and the Sea.”

The classic catalog that most participants were familiar with has a “map it” feature (from the third-party vendor StackMap), and Encore did not have that feature incorporated yet. Since this study was completed, the “map it” feature has been added to the item record in APPsearch. Further research can determine whether students will have a higher level of confidence in their ability to locate a book in the stacks when using Encore. Figure 3 shows the search as it appeared in December 2012, and figure 4 has the “map it” feature implemented and pointed out with a red arrow. Related to this task of searching for a known book, Student B commented that in Encore the icons were very helpful in picking out media type.

Figure 4. Book item record in Encore. The red arrow indicates the “Map it” feature, an add-on to the catalog from the vendor StackMap. Browse results are on the right and only pull from the catalog results.

When using the tabbed layout interface (see figure 1), three students typed the title of the book into the “articles” tab first, and it took them a few moments to figure out why they had a problem with the results. They were able to figure it out and redo the search in the “correct” Books & Media tab, but Student D commented, “I do that every time!” This is evidence that the average user does not closely examine a search box; they simply start typing.

Task 2: “In your psychology class, your professor has assigned you a five-page paper on the topic of eating disorders and teens. Find a scholarly article (or peer-reviewed) that explores the relation between anorexia and self-esteem.”

This question revealed, among other things, that seven out of the nine students did not fully understand that the terms scholarly article and peer-reviewed article are meant to be synonyms in this context. When asked the follow-up question, “What does ‘peer reviewed’ mean to you?” Student B said, “My peers would have rated it as good on the topic.” This is the kind of feedback that librarians and vendors need to be aware of in meeting students’ expectations. Users have become accustomed to online ratings of hotels and restaurants by their peers, so the terminology academia uses may need to shift. Further discussion on this is in the “changes suggested” section below.

Figure 5. Typical results for task two.

Figure 5 shows a typical user result for task 2.
The follow-up question asked users, “What does the refine by tag box on the right mean to you?” Student G reported they looked like Internet ads. Other users replied with variations of, “you can click on them to get more articles and stuff.” In fact, the “refine by tag” box in the upper right column of the screen contains only indexed terms from the subject headings of the catalog. This refines the current search results to those with the specific subject term the user clicked on. In this study, no user clicked on these tags.

For libraries considering purchasing and implementing Encore, a choice of skins is available, and it is possible to choose a skin where these boxes do not appear. In addition to information from Innovative Interfaces, libraries can check a guide maintained by a librarian at Saginaw Valley State University17 to see examples of Encore Synergy sites and links showing how different skins (cobalt, pearl, or citrus) affect appearance. Appalachian uses the “pearl” skin.

Figure 6. Detail of screenshot in Figure 5.

Figure 6 is a detail of the results shown in the screenshot of an average search for task 2. The red arrows indicate where a user can click to see just article results. The yellow arrow indicates where the advanced search button is. Six out of thirteen users clicked advanced search after the initial search results. Clicking on the advanced search button brought users to the screen pictured in figure 7.

Figure 7. Encore’s advanced search screen.

Figure 7 shows Encore’s advanced search screen. This search is not designed to search articles; it only searches the catalog. This aspect of advanced search was not clear to any of the participants in this study. See further discussion of this issue in the “glitches revealed” section.

Figure 8. The “database portfolio” for Arts & Humanities.

Figure 8 shows typical results for task 2 limited just to articles. The folders on the left are basically silos of grouped databases. Innovative calls this feature “database portfolios.” In this screenshot, the results of the search are narrowed to articles within the “database portfolio” of Arts & Humanities. Clicking on an individual database returns results from that database and moves the user to the database’s native interface. For example, in figure 8, clicking on Art Full Text would put the user into that database and retrieve 13 results.

While conducting task 2, Faculty Member A stressed she felt it was very important that students learn to use discipline-specific databases, and stated she would not teach a “one box” approach. She felt the tabbed layout was much easier than APPsearch and rated the tabbed layout in her SUS score with an 87.5 versus the 40 she gave Encore. She also wrote on the SUS scoring sheet, “APPsearch is very slow. There is too much to review.” She also said that the small niche showing how to switch results between “Books & More” and “Articles” was “far too subtle.” She recommended bold tabs or colors. This kind of suggestion librarians can forward to the vendor, but we cannot locally tweak this layout on a development server to test if it improves the user experience.

Figure 9. Closeup of switch for “Books & More” and “Articles” options.
Task 3: “You are studying modern Chinese history and your professor has assigned you a paper on foreign relations. Find a journal article that discusses relations between China and the US.”

Most users did not have much difficulty finding an article using Encore, though three users did not immediately see a way to limit only to articles. Of the nine users who did narrow the results to articles, five used facets to further narrow results. No users moved beyond the first page of results.

Search strategy was also interesting. All thirteen users appeared to expect the search box to work like Google. If there were no results, most users went to the advanced search and reused the same terms on different lines of the Boolean search box. Once again, no users intuitively understood that “advanced search” would not effectively search for articles. The concept of changing search terms was not a common strategy in this test group. If very few results came up, none of the users clicked on the “did you mean” suggestion or used the corrections in spelling or changes in terms supplied by Encore.

During this task, two faculty members commented on load time. They said students would not wait; results had to be instant. But when working with students, when the author asked how they felt when load time was slow, students almost all said it was fine or not a problem. They could “see it was working.” One student said, “Oh, I’d just flip over to Facebook and let the search run.” So perhaps librarians should not assume we fully understand student user expectations. It is also worth noting that, for the participant, this is a low-stakes usability study, not crunch time, so attitudes may be different if load time is slow for an assignment due in a few hours.

Task 4: “What is a topic you have written about this year? Search for materials on this topic.”

This question elicited the most helpful user feedback, since participants had recently conducted research using the library’s interface and could compare ease of use on a subject they were familiar with. A few specific examples follow.

Student A, in response to the task to research something she had written about this semester, looked for “elder abuse.” She was a senior who had taken a research methods class and written a major paper on this topic, and she used the tabbed layout first. She was familiar with using the facets in EBSCO to narrow by date and to limit to scholarly articles. When she was using APPsearch on the topic of elder abuse, Encore held her facets “full text” and “peer reviewed” from the previous search on China and US foreign relations. An example of Encore “holding a search” is demonstrated in figures 10 and 11 below. Student A was not bothered by Encore holding the limits she had put on a previous search. She noticed the limits and then went on to narrow further within the database portfolio of “health,” which limited the results to the database CINAHL first. She was happy with being able to limit by folder to her discipline. She said the folders would help her sort through the results.

The topic Student G had researched within the last semester was “occupational therapy for students with disabilities,” such as cerebral palsy. She understood through experience that it would be easiest to narrow results by searching for “occupational therapy” and then adding a specific disability. Student G was the user who made the most use of facets on the left.
She liked Encore’s use of icons for different types of materials. Student B also commented on “how easy the icons made it.”

Faculty Member B, in looking for a topic he had been researching recently in APPsearch, typed in “Writing Across the Curriculum glossary of terms” and got no results on this search. He said, “mmm, well that wasn’t helpful, so to me, that means I’d go through here,” and he clicked on the Google search box in the browser bar. He next tried removing “glossary of terms” from his search, and the load time was slow on articles, so he gave up after ten seconds, clicked on “advanced search,” and tried putting “glossary of terms” in the second line. This led to another dead end. He said, “I’m just surprised Appalachian doesn’t have anything on it.” The author asked if he had any other ideas about how to approach finding materials on his topic from the library’s homepage, and he said no, he would just try Google (in other words, navigating to the group of databases for education was not a strategy that occurred to him).

Faculty Member D had been doing research on a relatively obscure historical event and was able to find results using Encore. When asked if he had seen the articles before, he said, “Yes, I’ve found these, but it is great it’s all in one search!”

Glitches revealed

It is of concern for the user experience that the advanced search of Encore does not search articles; it only searches the catalog. This was not clear to any participant in this study. As noted earlier, Encore’s article search is a federated search. This affects load time for article results and also puts the article results into silos, or, to use Encore’s terminology, “database portfolios.” Encore’s information on their website definitely markets the site as a discovery tool, saying it “integrates federated search, as well as enriched content—like first chapters—and harvested data… Encore also blends discovery with the social web.”18 It is important for libraries considering the purchase of Encore to understand that while it does have many features of a discovery service, it does not currently have a central index with pre-harvested metadata for articles. If Innovative Interfaces is going to continue to offer an advanced search box, it needs to be made explicitly clear that the advanced search is not effective for searching for articles, or Innovative Interfaces needs to make advanced search work with articles by creating a central index.

To cite a specific example from this study, when Student E was using APPsearch, for all the tasks, after she ran a search she clicked on the advanced search option. The author asked her, “So if there is an advanced search, you’re going to use it?” The student replied, “yeah, they are more accurate.”

Another aspect of Encore that users do not intuitively grasp is that when looking at the results for an article search, the first page of results comes from a quick search of a limited number of databases (see figure 8). The users in this study did understand that clicking on the folders will narrow by discipline, but they did not appear to grasp that the results in the database portfolios are not included in the first results shown. When users click on an article result, they are taken to the native interface (such as PsycINFO) to view the article. Users seemed unfazed when they went into a new interface, but it is doubtful they understand they are entering a subset of APPsearch.
If users try to add terms or do a new search in the native database, they may get relevant results or may totally strike out, depending on the chosen database’s relevance to their research interest.

Figure 10. Changing a search in Encore.

Another problem that was documented was that after users ran a search, if they changed the text in the “Search” box, the results for articles did not change. Figure 6 demonstrates the results from task 2 of this study, which asks users to find information on anorexia and self-esteem. The third task asks the user to find information on China and foreign relations. Figure 10 demonstrates the results for the anorexia search, with the term “china” in the search box, just before the user clicks enter or the orange arrow for a new search.

Figure 11. Search results for changed search.

Figure 11 shows that the search for the new term, “China,” has worked in the catalog, but the results for articles are still about anorexia. In this implementation of Encore, there is no “new search” button (except on the advanced search page, where there is a “reset search” button; see figure 7), and refreshing the browser had no effect on this problem. This issue was screencast19 and sent to the vendor. Happily, as of April 2013, Innovative Interfaces appears to have resolved this underlying problem.

One purpose of this study was to determine if users had a strong preference for tabs, since the library could choose to implement Encore with tabs (one for access to articles, one for the catalog, and other tab options like Google Scholar). This study indicated users did not like tabs in general; they much preferred a “one box” solution on first encounter.

A major concern raised was the users’ response to the question, “How much of the library’s holdings do you think APPsearch/Articles Quick Search is looking across?” Twelve out of thirteen users believed that when they were searching for articles from the Quick Search for articles tabbed layout, they were searching all the library databases. The one exception was a faculty member in the English department, who understood that the articles tab searched a small subset of the available resources (seven EBSCO databases out of the 400 databases the library subscribes to). All thirteen users believed APPsearch (Encore) was searching “everything the library owned.” The discovery service searches far more resources than other federated searches the library has had access to in the past, but it is still only searching 50 out of 400 databases. It is interesting that in the Fagan et al. study of EBSCO’s Discovery Service, only one out of ten users in that study believed the quick search would search “all” the library’s resources.20 A glance at James Madison University’s library homepage21 suggests wording that may reduce user confusion.

Figure 12. Screenshot of James Madison Library Homepage, accessed December 18, 2012.

Figure 13. Original Encore interface as implemented in January 2013.

Given the results that 100 percent of the users believed that APPsearch looked at all databases the library has access to, the library made changes to the wording in the search box (see figure 14). Future tests can determine if this has any positive effect on the understanding of what APPsearch includes.

Figure 14. Encore search box after this usability study was completed.
The arrow highlights additions to the page as a result of this study. Other wording changes were suggested based on the finding that only seven of the nine students fully understood that "peer reviewed" would limit results to scholarly articles. A suggestion was made to Innovative Interfaces to change the wording to "Scholarly (Peer Reviewed)," and they did so in early January. Although Innovative's response on this issue was swift, and may help students, changing the wording does not address the underlying information literacy issue of what students understand about these terms.

Interestingly, Encore does not include any "help" pages. Appalachian's liaison with Encore has asked about this and been told by Encore tech support that Innovative feels the product is so intuitive that users will not need any help. Belk Library has developed a short video tutorial for users, and local help pages are available from the library's homepage, but according to Innovative, a link to these resources cannot be added to the top right area of the Encore screen (where help is commonly located in web interfaces). Although it is acknowledged that few users actually read "help" pages, it seems like a leap of faith to think a motivated searcher will understand things like the "database portfolios" (see Figure 9) without any instruction at all. After implementation, the librarians at Appalachian conducted internally developed training for instructors teaching APPsearch, and all agreed that understanding what is being searched and how best to perform a task such as an advanced article search is not "totally intuitive," even for librarians.

Finally, some interesting search strategy patterns were revealed. On the second and third questions in the script (both having to do with finding articles), five of the thirteen participants used the strategy of entering one term and then, after the search ran, adding terms to narrow results using the advanced search box. Although this is a small sample set, it was a common enough search strategy to make the author believe this is not an unusual approach. It is important for librarians and for vendors to understand how users approach search interfaces so we can meet expectations.

Further Research

The findings of this study suggest librarians will need to continue to work with vendors to improve discovery interfaces to meet users' expectations. The context of what is being searched, and when, is not clear to beginning users of Encore. One aspect of this test was that it was the participants' first encounter with a new interface, and even Student D, who was unenthusiastic about the new interface (she called the results page "messy," and her SUS score was 37.5 for Encore versus 92 for the tabbed layout), said that she could learn to use the system given time. Further usability tests can include users who have had time to explore the new system. A specific task that will be of interest in follow-up studies is whether students have better luck knowing where to find an item in the stacks with the addition of the "map it" feature. Locally, librarian perception is that part of the problem with this results display is simply visual spacing. The call number is not set apart or spaced so that it stands out as important information (see figure 5 for a screenshot).
Another question to follow up on will be to repeat the question, "How much of the library's holdings do you think APPsearch is looking across?" All thirteen users in this study believed APPsearch was searching "everything the library owned." Based on this finding, the library made small adjustments to the initial search box (see figures 14 and 15 as illustration). It will be of interest to measure whether this tweak has any impact.

SUMMARY

All users in this study recommended that the library move to Encore's "one box" discovery service instead of using a tabbed layout. Helping users figure out when they should move to using discipline-specific databases will most likely be a long-term challenge for Belk Library, and for other academic libraries using discovery services, but this will probably trouble librarians more than our users. The most important change Innovative Interfaces could make to their discovery service is to create a central index for articles, which would improve load time and allow an advanced search feature for articles to work efficiently. Because of this study, Innovative Interfaces made a wording change in search results for articles to include the word "scholarly" when describing peer-reviewed journal articles in Belk Library's local implementation. Appalachian State University libraries will continue to conduct usability studies and tailor instruction and e-learning resources to help users navigate Encore and other library resources. Overall, it is expected that users, especially freshmen and sophomores, will like the new interface but will not be able to figure out how to improve search results, particularly for articles. Belk Library & Information Commons' instruction team is working on help pages and tutorials, and will incorporate the use of Encore into the library's curricula.

REFERENCES

1. Thomsett-Scott, Beth, and Patricia E. Reese. "Academic Libraries and Discovery Tools: A Survey of the Literature." College & Undergraduate Libraries 19 (2012): 123–43.
2. Ibid., 138.
3. Hoeppner, Athena. "The Ins and Outs of Evaluating Web-Scale Discovery Services." Computers in Libraries 32, no. 3 (2012), http://www.infotoday.com/cilmag/apr12/Hoeppner-Web-Scale-Discovery-Services.shtml (accessed March 18, 2013).
4. Fagan, Jody Condit, Meris Mandernach, Carl S. Nelson, Jonathan R. Paulo, and Grover Saunders. "Usability Test Results for a Discovery Tool in an Academic Library." Information Technology & Libraries 31, no. 1 (2012): 83–112.
5. Thomas, Bob, and Stephanie Buck. "OCLC's WorldCat Local versus III's WebPAC." Library Hi Tech 28, no. 4 (2010): 648–71, doi:10.1108/07378831011096295.
6. Becher, Melissa, and Kari Schmidt. "Taking Discovery Systems for a Test Drive." Journal of Web Librarianship 5, no. 3 (2011): 199–219. Library, Information Science & Technology Abstracts with Full Text, EBSCOhost (accessed March 17, 2013).
7. Ibid., 202.
8. Ibid., 203.
9. Allison, Dee Ann. "Information Portals: The Next Generation Catalog." Journal of Web Librarianship 4, no. 1 (2010): 375–89, http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1240&context=libraryscience (accessed March 17, 2013).
10. Singley, Emily.
2011. "Encore Synergy 4.1: A Review." The Cloudy Librarian: Musings about Library Technologies, http://emilysingley.wordpress.com/2011/09/17/encore-synergy-4-1-a-review/ (accessed March 20, 2013).
11. Nielsen, Jakob. 2000. "Why You Only Need to Test with 5 Users," http://www.useit.com/alertbox/20000319.html (accessed December 18, 2012).
12. Fagan et al., 90.
13. Dixon, Lydia, Cheri Duncan, Jody Condit Fagan, Meris Mandernach, and Stefanie E. Warlick. 2010. "Finding Articles and Journals via Google Scholar, Journal Portals, and Link Resolvers: Usability Study Results." Reference & User Services Quarterly 50, no. 2: 170–81.
14. Bangor, Aaron, Philip T. Kortum, and James T. Miller. 2008. "An Empirical Evaluation of the System Usability Scale." International Journal of Human-Computer Interaction 24, no. 6: 574–94, doi:10.1080/10447310802205776.
15. Sauro, Jeff. 2011. "Measuring Usability with the System Usability Scale (SUS)," http://www.measuringusability.com/sus.php (accessed December 7, 2012).
16. Ibid.
17. Mellendorf, Scott. "Encore Synergy Sites." Zahnow Library, Saginaw Valley State University, http://librarysubjectguides.svsu.edu/content.php?pid=211211 (accessed March 23, 2013).
18. "Encore Overview," http://encoreforlibraries.com/overview/ (accessed March 21, 2013).
19. Johnson, Megan. Video recording made with Jing on January 30, 2013, http://www.screencast.com/users/Megsjohnson/folders/Jing/media/0ef8f186-47da-41cf-96cb-26920f71014b.
20. Fagan et al., 91.
21. James Madison University Libraries, http://www.lib.jmu.edu (accessed December 18, 2012).

APPENDIX A
Pre-Purchase Usability Benchmarking Test

In April 2012, before the library purchased Encore, the library conducted a small usability study to serve as a benchmark. The study outlined in this paper follows the same basic outline and adds a few questions. The purpose of the April study was to measure students' perceived success and satisfaction with the current search system for books and articles Appalachian uses, compared with the implementation of Encore discovery services at the University of Nebraska-Lincoln (UNL). The methodology was four undergraduates completing a set of tasks using each system. Two started with UNL, and two started at Appalachian's library homepage. In the April 2012 study, the participants were three freshmen and one junior, and all were female. All were student employees in the library's mailroom, and none had received special training on how to use the library interface. After the students completed the tasks, they rated their experience using the System Usability Scale (SUS). In the summary conclusion of that study, the average SUS score for the library's current search box layout was 62, and for UNL's Encore search it was 49.
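For readers unfamiliar with how the SUS numbers reported above (and in Appendix D below) are derived, the scale is scored in a standard way: each odd-numbered, positively worded item contributes its rating minus 1, each even-numbered, negatively worded item contributes 5 minus its rating, and the 0-40 total is multiplied by 2.5 to yield a 0-100 score. The TypeScript sketch below is illustrative only and is not part of the original study; the function name and the sample ratings are invented for the example.

```typescript
// Minimal SUS scoring sketch. The ten ratings are assumed to be in
// questionnaire order (item 1 first), each on the 1-5 scale from Appendix D.
function susScore(ratings: number[]): number {
  if (ratings.length !== 10) {
    throw new Error("SUS requires exactly ten item ratings");
  }
  const rawTotal = ratings.reduce((total, rating, i) => {
    // Odd-numbered items (index 0, 2, ...) are positively worded: rating - 1.
    // Even-numbered items (index 1, 3, ...) are negatively worded: 5 - rating.
    const contribution = i % 2 === 0 ? rating - 1 : 5 - rating;
    return total + contribution;
  }, 0);
  // Scale the 0-40 raw total to the familiar 0-100 range.
  return rawTotal * 2.5;
}

// Example: a respondent who answers 4 on every positive item and 2 on every
// negative item earns a raw total of 30, i.e. a SUS score of 75.
console.log(susScore([4, 2, 4, 2, 4, 2, 4, 2, 4, 2])); // 75
```

On this scale, the benchmark averages of 62 and 49 reported above correspond to raw totals of roughly 25 and 20 out of 40.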
Even though none of the students was particularly familiar with the library's current interface, it might be assumed that part of the higher score for Appalachian's site was simply familiarity. Student comments from the small April benchmarking study included the following. The junior student said the UNL site had "too much going on" and Appalachian was "easier to use; more specific in my searches, not as confusing as compared to UNL site." Another student (a freshman) said she had "never used the library not knowing if she needed a book or an article." In other words, she knows what format she is searching for and doesn't perceive a big benefit to having them grouped. This same student also indicated she had no real preference between Appalachian's site and UNL's. She believed students would need to take time to learn either and that UNL is a "good starting place."

APPENDIX B
Instructions for Conducting the Test

Notes: Use Firefox for the browser, set to "private browsing" so that no searches are held in the cache (search terms do not pop into the search box from the last subject's search). In the bookmark toolbar, the only two bookmarks available should be "dev" (which goes to the development server) and "lib" (which goes to the library's homepage). Instruct users to begin each search from the correct starting place. Identify students and faculty by letter (Student A, Faculty A, etc.).

Script

Hi, ___________. My name is ___________, and I'm going to be walking you through this session today. Before we begin, I have some information for you, and I'm going to read it to make sure that I cover everything. You probably already have a good idea of why we asked you here, but let me go over it again briefly. We're asking students and faculty to try using our library's home page to conduct four searches, and then we will ask you a few other questions. We will then have you do the same searches on a new interface. (Note: half the participants start at the development site, the other half start at the current site.) After each set of tasks is finished, you will fill out a standard usability scale to rate your experience. This session should take about twenty minutes.

The first thing I want to make clear is that we're testing the interface, not you. You can't do anything wrong here. Do you have any questions so far?

OK. Before we look at the site, I'd like to ask you just a few quick questions. What year are you in college? What are you majoring in? Roughly how many hours a week altogether--just a ballpark estimate--would you say you spend using the library website?

OK, great. Hand the user the task sheet. Do not read the instructions to the participant; allow them to read the directions for themselves. Allow the user to proceed until they hit a wall or become frustrated. Verbally encourage them to talk aloud about their experience.

Written instructions for participants:

Find a copy of the book The Old Man and the Sea.

In your psychology class, your professor has assigned you a 5-page paper on the topic of eating disorders and teens. Find a scholarly (peer-reviewed) article that explores the relation between anorexia and self-esteem.

You are studying modern Chinese history and your professor has assigned you a paper on foreign relations. Find a journal article that discusses relations between China and the US.

What is a topic you have written about this year?
Search for materials on this topic.

APPENDIX C
Follow-up Questions for Participants (or ask as the subject is working)

After the first task (find a copy of the book The Old Man and the Sea), when the user finds the book in APPsearch, ask, "Would you know where to find this book in the library?"

How much of the library's holdings do you think APPsearch/Articles Quick Search is looking across?

Does "Peer Reviewed" mean the same as "scholarly article"?

What does the "refine by tag" block on the right mean to you?

If you had to advise the library to either stay with a tabbed layout, or move to the one search box, what would you recommend?

Do you have any questions for me, now that we're done?

Thank subject for participating.

APPENDIX D
Sample System Usability Scale (SUS)

Each statement is rated from 1 (strongly disagree) to 5 (strongly agree).

1. I think that I would like to use this system frequently. 1 2 3 4 5
2. I found the system unnecessarily complex. 1 2 3 4 5
3. I thought the system was easy to use. 1 2 3 4 5
4. I think that I would need the support of a technical person to be able to use this system. 1 2 3 4 5
5. I found the various functions in this system were well integrated. 1 2 3 4 5
6. I thought there was too much inconsistency in this system. 1 2 3 4 5
7. I would imagine that most people would learn to use this system very quickly. 1 2 3 4 5
8. I found the system very cumbersome to use. 1 2 3 4 5
9. I felt very confident using the system. 1 2 3 4 5
10. I needed to learn a lot of things before I could get going with this system. 1 2 3 4 5

Comments:

4636 ---- That Was Then, This Is Now: Replacing the Mobile-Optimized Site with Responsive Design

Hannah Gascho Rempel and Laurie Bridges

INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2013

ABSTRACT

As mobile technologies continue to evolve, libraries seek sustainable ways to keep up with these changes and to best serve our users. Previous library mobile usability research has examined tasks users predict they might be likely to perform, but little is known about what users actually do on a mobile-optimized library site. This research used a combination of survey method and web analytics to examine what tasks users actually carry out on a library mobile site. The results indicate that users perform an array of passive and active tasks and do not want content choices to be limited on mobile devices. Responsive design is described as a long-term solution for addressing both designers' and users' needs.

INTRODUCTION

Technology is in a constant state of flux. As librarians well know, emerging technology can quickly become outdated in a few short years. In 2010 BlackBerry phones were at their peak, but now their mobile devices account for a little more than 5 percent of the market share and Android dominates the top spot, with approximately 52 percent of the market share.1 As smartphone use and design have continued to proliferate and advance, users have become accustomed to quicker load times for webpages and are now well acquainted with how to navigate the web from their phones. As the mobile phone market changes and evolves, usability experts continuously update, test, and revise standards for the mobile web. At Oregon State University (OSU) we recently set out to improve our mobile site. This required updating our knowledge about patron use of library mobile sites, both through reviewing the literature and by conducting our own primary research.
What we found surprised us and challenged us to reconsider what we had previously assumed about patrons' mobile habits. In this article we will describe past research on library mobile website usability, our own research on how our mobile library site is used, and why we ended up deciding to use responsive design as the guiding principle for our redesign.

Hannah Gascho Rempel (hannah.rempel@oregonstate.edu) is Science Librarian and Graduate Student Services Coordinator, Oregon State University, Corvallis, Oregon. Laurie Bridges (laurie.bridges@oregonstate.edu), a LITA member, is Instruction and Emerging Services Librarian, Oregon State University, Corvallis, Oregon.

Background: That was Then

In early 2010 we co-authored an article with our then-programmer about the mobile landscape in libraries. We were among the first to propose that libraries should develop separate websites and catalog interfaces optimized for mobile devices.2 Based on the widespread implementation of mobile-optimized library websites since then, it appears this proposal was both relevant and timely. Our 2010 recommendations were based on usability studies, library reports, and technology trends from 2007 through 2009. Research and literature at the time pointed to the need for considering the mobile context, for example, the "attention span" of mobile users as they search for information on the go.3 We noted the advice of Jakob Nielsen, who indicated that "if mobile use is important to your Internet strategy, it's smart to build a dedicated mobile site." Although that particular webpage is no longer available, the essence of Nielsen's thinking can be found in a Mobile Usability Update posted on September 26, 2011, which states, "A dedicated mobile site is a must," in the introductory paragraph.4

The iPhone was released in the United States in late 2007, and in 2008 the proliferation of dedicated mobile sites began. In December 2008, our mobile team focused on developing a site for the two most popular device types at the time, "smartphones" and "feature phones." The differences between the two types of phones were numerous, and there were drawbacks to the feature phones. However, we felt it was important to have a site that rendered well on feature phones because at that time feature phones dominated the market, with only 28 percent of mobile phone users in the United States owning a smartphone.5 Our initial site design in 2008 and 2009 focused on our primary users, members of the university community. The first phase of our mobile website, released in March 2009, included static pages like library hours, contact information, frequently asked questions, and directions.6 The second phase, released in September 2009, included a mobile catalog interface (designed in-house), a staff directory, and a computer availability map. In February 2010 the site averaged one hundred unique users a day. Mobile site analytics showed that the most viewed pages were computer availability, catalog, and hours.

Background: This is Now

Recent research and case studies show a shift in the mobile context. A comprehensive study by Alan Aldrich in 2010 examined the mobile websites of large research universities and their libraries in the United States and Canada.
Aldrich notes, "Users seem to want access to information just as if they were using a fully web-capable desktop or laptop computer." He ponders the possibility that patron expectations and desires may be evolving as smartphones begin to dominate the mobile landscape.7 In 2011, Jakob Nielsen noted that because most people do not use the web on their feature phones and most companies do not support feature phones in their web design process, he would no longer be testing feature phones in his usability studies.8 Our experience matches Nielsen's on this point; approximately 10 percent of daily users accessing our library's mobile site came from feature phones in 2010; in 2012, that number had dwindled to less than 1 percent.

In 2012, Nielsen began considering the proliferation of mobile devices beyond phones, when he wrote, "High-end sites will need 3 mobile designs to target phones, mid-sized tablets, and big tablets."9 Note the slight change in the phrase "dedicated mobile site" (from the 2011 Mobile Usability Update) to "mobile designs." Nielsen goes on to suggest responsive design, which we will address in more detail later in this article, as a solution to this design problem.

As we prepared for another redesign of our mobile site, we knew we needed a current snapshot of our users and how they actually use the mobile version of the library's website before starting the redesign. This may sound like a simple decision; however, it is a step that the library literature has not documented well. Researchers have focused on what library users predict they might need rather than analyzing their actual behaviors.

Mobile Site User Experiences

Mobile library website development has been influenced by other, nonlibrary mobile sites that have placed heavy emphasis on developing sites for users who are on the go.10 Earlier studies examining general mobile browsing and searching habits found that mobile users' most popular activities were reading news, weather, or sports articles, looking for information using search engines, and checking email.11 When libraries used these studies to inform mobile site development, the result was a streamlined version of the full library site.

It is instructive to consider studies like those done by Coursaris and Kim, who performed a meta-analysis of more than one hundred mobile usability studies. They demonstrated both the extreme breadth of mobile usability research (which has examined everything from how users perform tasks on their mobile devices while walking on a treadmill, to how users navigate mobile maps, to how they select restaurants) and the niche-specific nature of many of these studies, whose results may not be transferable to other contexts.12 When looking at mobile usability studies specifically within the higher education sector, a field more closely related to libraries, research focuses primarily on the use of mobile phones for enhancing student learning through specific activities or sites.13 Less research examines how mobile portal or university homepages are used. One exception is the Iowa Course Online (ICON) Mobile Device Survey, which was administered in 2010 and asked students what aspects of ICON (a course management system) they used on their mobile devices.14 Students were frequent users of this site; three-quarters of respondents used the site at least 1–3 times per week.
The top three selected tasks were grades, "content" (a category including PDFs and Microsoft Word documents), and schedules. As defined by Kaikkonen, users' top tasks included a mix of "passive" content that requires no additional interaction (e.g., grades, weather) and "active" content, which requires further searching, reading, or location-specific information from the user (e.g., searching in web browsers, mapping, or looking through PDFs).15 The combination of passive and active content more closely matches the types of tasks potentially required by mobile library website users.

Research on library mobile site use or usability primarily has focused on users' speculations about what they might like from a mobile library site before the site was constructed, or on a handful of users' experiences navigating an existing site while accomplishing researcher-assigned tasks.16 No known research exists that demonstrates what tasks users actually perform on an existing mobile library website in real time. For example, focus groups conducted in 2009 at Kent State University library before the creation of their mobile site led to the conclusion that "students want the site for 'quick research' not to 'sit down and write a term paper on my phone,'" and that students did not want as many research choices, such as all of the databases the library subscribes to, made available on their mobile device (or perhaps even on the full website).17 A survey at the Open University in the United Kingdom given prior to deployment of the mobile site extrapolated that students would want to access library hours, a map of the library, contact information, the library catalog, and their borrowing record from their mobile device.18 In addition, a survey of the student body at Utah State University in 2011 intended to help inform their mobile site design found that students might want to access the mobile catalog, retrieve articles, and reserve study rooms.19 These results helped determine the future development of these academic library mobile websites and were interpreted to demonstrate that particular tasks might be better suited for the mobile environment. However, they also demonstrate that student users might want to engage in a variety of both active and passive tasks, such as searching the library catalog and checking the library's hours.

As part of a growing recognition that it is time to reevaluate mobile library sites, Bohyun Kim reviewed eight library mobile sites and presented her analysis at the American Library Association Annual Conference in 2012.20 Kim compared screenshots of 2010 and 2012 homepages and found a greater emphasis on search in the 2012 versions. Kim also highlighted constraints and assumptions that are no longer true in the mobile environment, such as mobile devices' slow networks, a focus on information on the go, and mobile sites with fewer features and content. Kim's analysis signals a shift in how libraries, as well as the broader mobile environment, are envisioning the content they provide on their mobile sites. Because websites designed for the mobile context are still relatively new, an important part of this shift in the design of libraries' mobile sites should include an investigation into what types of tasks users are actually performing on these sites.
Using web analytics software, we are able to learn which mobile webpages are the most visited on OSU's mobile site, and we know the path users took from the first hit to when they exited the site. However, what we do not know is what users' intentions are in visiting the site, what types of searches they enter, or whether they are able to accomplish their search goals. The objectives of this study were to gather a list of tasks users attempt to accomplish when visiting the library mobile site and to understand the difficulties users encounter when they try to access information on OSU's mobile site. A more in-depth understanding of how our mobile website is used will help us to provide an improved interface, especially as we work on a site redesign.

METHODS

This study used an online survey instrument to gain a better understanding of what tasks mobile site visitors are trying to perform when they visit our library's mobile site, to discover whether they were able to accomplish their tasks, and to gather any other general impressions, suggestions, or feedback these users had about our mobile library site. This survey (approved by OSU's institutional review board) was available on the Qualtrics survey platform for twelve weeks, from November 2012 to January 2013. The survey was accessible via a link on the mobile version of our library's homepage, meaning that only users who used the mobile-optimized version of our website had access to the survey. The survey was open to anyone who used the library's mobile website, not just OSU affiliates. The survey was completed by 115 participants. A $2 gift certificate to a coffee shop was distributed to participants upon completion of the survey.

Because mobile site use can cover a complex range of scenarios, and because we were more interested in learning what this range of scenarios was for our mobile site, we did not use a closed-task scenario with preset tasks. We asked real users who were currently browsing the mobile site to choose from a list of tasks that best described what they were searching for on the library's mobile site (the survey also included open-ended questions and a "something else" option). If they were looking for a book or planned to conduct research on a topic, we used display logic in our survey to further probe their answer and ask about the parameters of their search. If they indicated they had previously used the mobile site to search for books or other research materials, we used the same method used with the book search to ask if they were able to find these materials. If they were looking for articles, we then asked if they read the articles on their mobile device. In addition, we asked if there was anything they wished they could do on the site and for any other general feedback on the mobile site. Finally, we collected some demographic information about the participants: their OSU affiliation and the frequency with which they use the library's mobile site.

The survey data was analyzed using Qualtrics' cross-tab functionality and Microsoft Excel to observe trends and potential differences by user groups. Open-ended responses were examined for common themes. To help provide some counterbalance to our survey data, a combination of Urchin and Google Analytics statistics was analyzed for two of the months the survey was available.
Urchin statistics were gathered for the mobile version of the website, and Google Analytics statistics of our Drupal-based pages were gathered for mobile users of the full version of the library's website. We tabulated average daily visitors, specific page views, and the type of browser used with these analytical tools.

FINDINGS

Online Survey—Closed-Ended Questions

An advantage of administering a survey versus simply using web analytics to assess a mobile site is that more granular information, such as demographics, can be gathered. Of the 115 online survey respondents, 74 identified themselves as undergraduate students, 19 were graduate students, 8 were faculty members, 3 were community members, 2 were alumni, 2 were staff members, and 1 chose the "other" field and self-identified as a parent (see figure 1). Not all respondents answered every question. Because undergraduates make up the overwhelming majority of the campus body, it makes sense that 64 percent of respondents identified themselves as undergraduates. However, the demographic responses also illustrate that multiple user groups access the library's mobile site.

Figure 1. Demographic distribution of survey respondents (N = 109).

The survey participants were asked how often they had previously used the library's mobile site. Sixty-nine respondents (60 percent) were accessing the site for the first time. No respondents used the mobile site daily, 1 respondent visited 2–3 times per week, 5 respondents (4 percent) visited once a week, 11 respondents (9.5 percent) visited 2–3 times per month, 7 respondents (6 percent) visited once a month, and 16 (14 percent) visited less than once a month (see figure 2). The majority of respondents had not previously used the library's mobile site. One possible reason for this is that the data was collected primarily during fall term, a time when there are many new students on campus. An alternative explanation is that people who use the mobile site often are highly task-oriented and did not want to be distracted by taking a survey. The fact that the majority of respondents had not previously used the library's mobile site affected responses to later questions in the survey, which asked participants to remember previous experiences and satisfaction with the site.

Figure 2. Frequency with which survey respondents used the library's mobile site (N = 109).

One of our main research goals was to determine our users' intention for visiting the library's mobile site. Respondents could choose as many items as were applicable from a list of reasons for visiting the library's mobile site; as a result, the data is presented as the percent of total responses. Respondents could also choose "something else" and enter additional reasons for visiting the mobile site. These "other" reasons were grouped and the groupings are reported in the list of reasons for visiting the site. The top reason respondents visited the site was to view the library's hours (47 percent). The next two most frequent reasons for visiting the site were research related, with 25 percent intending to look for a book and 21 percent intending to do research on a topic.
The fourth and fifth most common choices were associated with using the library building, with 13 percent looking for study room reservations and 10 percent looking for the availability of computers in the library. Because the library's current mobile site has been optimized for tasks perceived to be most important for mobile users, and because some of the features available via the full-site version of our ILS (integrated library system) are not available via the library's mobile site, not all of the tasks respondents wanted to accomplish on the mobile site were actually available on the mobile site. These items include study room reservations (13 percent of responses); "My Account" features, such as the ability to check due dates, make renewals, and place holds (6 percent of responses); and interlibrary loan (1 percent of responses). Finally, some features that had been considered ideal for the mobile context because of their location-sensitive, time-saving, or hedonic functionality—such as looking for directions (1 percent of responses), finding a quick way to contact a librarian with a question (2 percent of responses), and viewing the webcam for the coffee shop line (3 percent of responses)—were rarely selected as a reason for visiting the mobile site (see figure 3).

Figure 3. Respondents' reasons for visiting the library's mobile site by percent of responses. Respondents could choose more than one response.

To determine if different user groups approach the library's mobile site differently, we compared the reasons for visiting the mobile site across user groups. When looking at the top five reasons respondents visited the mobile site, only a few differences appeared based on user group (because of the small sample size, a statistical analysis determining significance cannot be done, but results may indicate avenues for future research). Graduate students were somewhat more likely to visit the library's mobile site to look for research on a topic, as well as to look for study room reservations. However, undergraduate students were more likely than graduate students to be interested in the availability of computers in the library (see table 1).

What Are You Searching For During This Visit to the Library Mobile Site?
User group | Library hours | A book | Research on a topic | Study room reservations | Computer availability | n
Undergraduate | 48.7 | 21.6 | 14.9 | 14.9 | 12.2 | 74
Graduate student | 47.4 | 31.6 | 31.6 | 21.1 | 5.3 | 19
Faculty member | 12.5 | 25 | 37.5 | 0 | 0 | 8
Community member | 100 | 33.3 | 33.3 | 0 | 0 | 3
Staff member | 0 | 0 | 50 | 0 | 0 | 2
Alumni | 50 | 50 | 0 | 0 | 0 | 2
Other | 0 | 0 | 0 | 0 | 0 | 1

Table 1. Percentage of respondents' reasons for visiting the library's mobile site by user group for the top five most-selected tasks. (Respondents could choose more than one response.)
Because the online survey was available over a twelve-week period, which included portions of fall term, winter break, and winter term, we could look at a breakdown of use by time of term. Specifically, we wanted to see if there was a difference in use during the middle of the term versus finals week and intersession. During the latter period, we anticipated users would not be using the library's mobile site for research purposes; however, when comparing these two different usage periods for the top five reasons respondents visited the site, we found respondents' tasks tended to be quite similar regardless of whether or not the term was in session. The only two differences were (1) during intersession respondents tended to be more likely to search for a book using the mobile site and (2) during the term, more respondents were looking for a way to make study room reservations. While the high number of respondents looking for library hours dominates these results, it does appear that respondents were still using the mobile site to conduct research-related tasks even during intersession (see figure 4).

Figure 4. Percentage of respondents' reasons for visiting the library's mobile site during term vs. intersession for the top five most-selected tasks. Respondents could choose more than one response.

Online Survey—Open-Ended Questions

As described earlier, the second and third most frequently cited reasons for visiting the library's mobile site were research related: looking for a book and doing research on a topic. Survey participants indicating they had come to the site looking for a book were prompted to enter the search they intended to use. Of the twenty-eight respondents who indicated they were looking for a book, twenty-five provided the search words they planned to use (see table 2). Respondents reported a wide range of search types, from known-item titles or authors such as Moby Dick or Ian Fleming, to broad topic areas such as Women Studies, to focused keywords like "high performance computing." All of these searches fit into the active task category of mobile use.

Accelerated c++
Autism Spectrum in Children
Beauty pageant
Beer and Circus
Course Reserves
Four fish
Googlr [sic]
ian fleming
moby dick
Oregon taxes
Pomerania
seafood quality
Semiconductor
Sir thomas Malory
The Power of Now
High performance computing
Hope is an imperative
I wanted to look up textbooks for my winter term classes
Look up reference number to see if I can find out online if reserve is available
Name of the book and author if I know it
Title
VIN #
Web design
Women studies
Writing the successful thesis and dissertation

Table 2. Respondents' book searches on their mobile device.

If a survey participant indicated they had come to the site to conduct research on a topic, we prompted them to provide more detailed information about their search.
Of the twenty-three respondents who indicated they were conducting research on a topic, twenty provided the search words they planned to use (see table 3). It is apparent from the responses that at least five respondents misunderstood our question, and instead of entering keywords for a search, they entered the databases or search engines they planned to use to conduct their research, such as 1Search (our library's iteration of Serials Solutions' Summon), ERIC, PsycInfo, and perhaps Google; although, when considering the search term Google, it is possible the respondent was going to attempt to conduct research on the Google company itself. The remaining search terms reflect in-depth concepts that would either retrieve many results (e.g., procurement and contract processes), or that while retrieving fewer results would give the researcher more than just a single simple document to consult (e.g., ethnobotany Oregon). As with the book searches, these research topics represent more active tasks that move the mobile user beyond the earlier perceptions of mobile use that predicted tasks centered on quick, entertaining, or location-specific information.21

Unit 731
Social justice
Taxes
Ecological anthropology
shared governance college
Ncaa history
World War I
1search
EBSCO Medline search
Eric search
Procurement and contract processes
seafood quality
e-journals to search for a specific subscription
Psychinfo
Ethnobotany Oregon
Google
Dye properties and peak wavelengths
Databases in search of company info for Applebee's
Thesis writing
Beauty pageants
The Science Teacher Journal

Table 3. Respondents' topic searches for nonbook research sources.

Web Analytics Results

In addition to collecting survey responses, we monitored our web analytics during the survey period to see how the site usage matched with our survey respondents' stated activities. The mobile version of the site averaged 124 daily visitors between November 5, 2012, and January 5, 2013. The top three pages viewed were the computer availability map (37 percent), the mobile homepage (25 percent), and the research page (3 percent). The mobile homepage also displays the library hours. Because mobile users do not just use mobile-optimized sites, we also looked at mobile use of the Drupal-designed pages on the full version of our website. Of the top twenty pages viewed, eleven were content pages and nine were navigational pages. Based on page views, the top three content pages were study rooms (8 percent), hours (5 percent), and research databases (5 percent). The top three navigational pages based on page views were the homepage (40 percent), which also displays the library hours, the Find It page (8 percent), which lists links to content like databases, the catalog, and ejournals, and the In the Library page (7 percent), which provides links to information about things a patron might use in the library, such as study rooms or computers. The web analytics do not exactly mirror the tasks reported by the survey respondents and reflect an even greater emphasis on practical tasks like using a computer or a study room in the library.
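The page-view percentages in the paragraph above are simply each page's share of total page views over the analysis window. As a hedged illustration (the export format, page paths, and numbers below are invented for the example and are not the study's actual data), a tabulation of that kind can be sketched as follows.

```typescript
// Hypothetical shape of an exported analytics row; the real Urchin and Google
// Analytics exports used in the study are not reproduced in this article.
interface PageViewRow {
  page: string;
  views: number;
}

// Rank pages by their share of total page views, the calculation behind
// figures such as "computer availability map (37 percent)".
function pageViewShares(rows: PageViewRow[]): { page: string; share: number }[] {
  const total = rows.reduce((sum, row) => sum + row.views, 0);
  return rows
    .map((row) => ({ page: row.page, share: (row.views / total) * 100 }))
    .sort((a, b) => b.share - a.share);
}

// Example with made-up numbers (not the study's data):
const sample: PageViewRow[] = [
  { page: "/m/computer-availability", views: 3700 },
  { page: "/m/", views: 2500 },
  { page: "/m/hours", views: 3500 },
  { page: "/m/research", views: 300 },
];
for (const { page, share } of pageViewShares(sample)) {
  console.log(`${page}: ${share.toFixed(1)}%`);
}
```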
It became apparent from our survey that our participants were not only attempting simple searches appropriate for a stripped- down mobile-optimized site but were also attempting active tasks, like conducting more complex searches we formerly would have expected them to do only on the full site. We needed a website that no longer restricted the activities our users could do based on our outdated assumptions of their use of our website. Responsive Design As a solution to our problem of how to provide a consistent, nonrestricted experience to all of our users regardless of how they were accessing our site, we turned to the concept of responsive design. Responsive design was conceived as recently as 2010,22 but adoption is growing rapidly because it offers a more scalable solution for designers, allowing them to move away from designing different websites for every platform, and instead designing sites that scale differently in different contexts (for example, an iPhone vs. an iPad). Responsive design is a more dynamic strategy, requiring as web designer Ethan Marcotte states, “fluid grids, flexible images, and media queries.”23 However, Marcotte goes on to argue that responsive design “requires a different way of thinking. Rather than quarantining our content into disparate, device-specific experiences, we can use media queries to progressively enhance our work within different viewing contexts.” The following images illustrate the reflow of a responsive design layout for desktop and tablet views. Figure 5. Desktop view of responsive OSU Libraries webpage. THAT WAS THEN, THIS IS NOW: REPLACING THE MOBILE-OPTIMIZED SITE WITH RESPONSIVE DESIGN | REMPEL AND BRIDGES 21 Figure 6. Tablet view of responsive OSU Libraries webpage. This summer, our web designer redesigned both the full site and mobile site using responsive design. At OSU, we have decided that we will no longer have a separate mobile site. Instead, Drupal responsive design modules and themes allow our site to be viewed optimally, independent of screen size, as “one web.”24 Using a responsive design allowed us to choose a one-column layout for our mobile site and a three column layout for our full site. Using Drupal modules and themes, the three columns from the full site reflow into one column on the mobile device. Menu bars collapse, numerous static pictures become one image box with rotating pictures, and paragraphs are simply linkable titles. These design decisions allow users to perform both active and passive tasks depending on their needs, regardless of context. Responsive design provides clear advantages for designers, for example, they no longer need to maintain separate versions of low-use pages, such as a directions page. In addition, while responsively designed websites are not automatically accessible, it is simpler for designers to create a single iteration of a site that meets accessibility guidelines, and which can then scale appropriately to other contexts. However, there are also advantages for the user. Responsive design ensures that users will encounter a predictable interface and experience across all of the platforms from which they access the library’s website. Moreover, in our local context, we have had a policy of not providing links to websites not optimized for mobile devices, such as databases, from our mobile site. As we switched to a responsively designed site, we moved away from this policy, thereby providing more research choices to our mobile users. 
There are also some drawbacks to using responsive design in our context. Some might consider linking out to resources not optimized for mobile devices a drawback rather than an advantage. In addition, our redesign only involved applying responsive principles to our library sites that are developed in Drupal. Sites such as the study room reservation system, My Account, and the library's catalog draw upon other vendors' tools and therefore are not under our design control. However, access to these sites will no longer be circumscribed for mobile users, but may involve less intuitive navigation depending on their device.

CONCLUSION

Our goal in this study was to gain a more in-depth understanding of how our mobile website is used to guide a redesign of our mobile interface. As a result of examining current trends in web design and development, and of analyzing data that demonstrated not only what our users currently do on our mobile site but also what they intend to do and what gaps they perceive in our service, we have chosen to integrate our full and mobile sites into "one web" through the use of responsive design. Gathering qualitative information from our mobile site users has provided us with more realistic tasks that we can use in further usability testing. Finally, we are again reminded of the continual evolution of our users' needs and the expanding possibilities that are available as information becomes increasingly mobile.

REFERENCES

1. Jeff Clabaugh, "BlackBerry U.S. Market Share Falls to 5.4 percent; Google's Android Remains on Top," Washington Business Journal, April 4, 2013, http://www.bizjournals.com/washington/news/2013/04/04/blackberry-us-market-share-falls-to.html.
2. Laurie Bridges, Hannah Gascho Rempel, and Kimberly Griggs, "Making the Case for a Mobile Library Web Site," Reference Services Review 38, no. 2 (2010): 309–20, doi:10.1108/00907321011045061.
3. Anne Kaikkonen, "Full or Tailored Mobile Web—Where and How Do People Browse on Their Mobiles?" in Proceedings of the International Conference on Mobile Technology, Applications, and Systems, Mobility '08 (New York: ACM, 2008), 28:1–28:8, doi:10.1145/1506270.1506307.
4. Jakob Nielsen, "Mobile Usability Update (Jakob Nielsen's Alertbox)," September 26, 2011, http://www.useit.com/alertbox/mobile-usability.html.
5. "Feature Phones Comprise Overwhelming Majority of Mobile Phone Sales in Q2 2009," NPD Group, 2009, https://www.npd.com/wps/portal/npd/us/news/press-releases/pr_090819/.
6. Kim Griggs, Laurie M. Bridges, and Hannah Gascho Rempel, "Library/Mobile: Tips on Designing and Developing Mobile Web Sites," Code4Lib Journal 8 (November 11, 2009), http://journal.code4lib.org/articles/2055.
7. Alan Aldrich, "Universities and Libraries Move to the Mobile Web," Educause Review Online, June 24, 2010, http://www.educause.edu/ero/article/universities-and-libraries-move-mobile-web.
8. Nielsen, "Mobile Usability Update (Jakob Nielsen's Alertbox)."
9. Jakob Nielsen, "Mobile Site vs. Full Site (Jakob Nielsen's Alertbox)," April 10, 2012, http://www.useit.com/alertbox/mobile-vs-full-sites.html.
10. Keren Mills, M-Libraries: Information Use on the Move (Cambridge, UK: Arcadia Programme, 2009), http://www.dspace.cam.ac.uk/handle/1810/221923; Bridges, Rempel, and Griggs, "Making the Case for a Mobile Library Web Site."
11. Anne Kaikkonen, "Full or Tailored Mobile Web?"
12. Constantinos K. Coursaris and Dan J. Kim, "A Meta-Analytical Review of Empirical Mobile Usability Studies," Journal of Usability Studies 6, no. 3 (May 2011): 117–71.
13. Emrah Baki Basoglu and Omur Akdemir, "A Comparison of Undergraduate Students' English Vocabulary Learning: Using Mobile Phones and Flash Cards," Turkish Online Journal of Educational Technology—TOJET 9, no. 3 (July 1, 2010): 1–7; Suzan Duygu Eristi et al., "The Use of Mobile Technologies in Multimedia-Supported Learning Environments," Turkish Online Journal of Distance Education 12, no. 3 (July 1, 2011): 130–41; Stephanie Cobb et al., "Using Mobile Phones to Increase Classroom Interaction," Journal of Educational Multimedia and Hypermedia 19, no. 2 (April 1, 2010): 147–57; Shelley Kinash, Jeffrey Brand, and Trishita Mathew, "Challenging Mobile Learning Discourse Through Research: Student Perceptions of 'Blackboard Mobile Learn' and 'iPads,'" Australasian Journal of Educational Technology 28, no. 4 (January 1, 2012): 639–55.
14. University of Iowa, ICON Mobile Device Survey, n.d., https://icon.uiowa.edu/support/statistics/ICON%20Mobile%20Device%20Survey.pdf.
15. Anne Kaikkonen, "Full or Tailored Mobile Web?"
16. Kimberly D. Pendell and Michael S. Bowman, "Usability Study of a Library's Mobile Website: An Example from Portland State University," Information Technology & Libraries 31, no. 2 (2012): 45–62, doi: 10.6017/ital.v21i2.1913.
17. Jamie Seeholzer and Joseph Salem, "Library on the Go: A Focus Group Study of the Mobile Web and the Academic Library," College & Research Libraries 72, no. 1 (January 2011): 9–20.
18. Mills, M-Libraries.
19. Angela Dresselhaus and Flora Shrode, "Mobile Technologies & Academics: Do Students Use Mobile Technologies in Their Academic Lives and Are Librarians Ready to Meet This Challenge?" Information Technology & Libraries 31, no. 2 (2012): 82–101, doi: 10.6017/ital.v31i2.2166.
20.
Bohyun Kim, "It's Time to Look at Your Library's Mobile Website Again!" (presented at the American Library Association Annual Conference, Anaheim, CA, June 24, 2012), http://www.slideshare.net/bohyunkim/its-time-to-look-at-your-librarys-mobile-website-again.
21. Bridges, Rempel, and Griggs, "Making the Case for a Mobile Library Web Site."
22. Ethan Marcotte, "Responsive Web Design," A List Apart, May 25, 2010, http://alistapart.com/article/responsive-web-design.
23. Ibid.
24. Jeff Burnz, "Responsive Design," Drupal, February 14, 2013, http://drupal.org/node/1322126.

4656 ---- SIMON FRASER UNIVERSITY COMPUTER PRODUCED MAP CATALOGUE

Brian PHILLIPS: Head Social Sciences Librarian, and Gary ROGERS: Programmer-Analyst, Computer Centre, Simon Fraser University, Burnaby, British Columbia

An IBM 360/50 computer and magnetic tape are used in a new university library to produce a map catalogue by area and up to six subjects for each map. Cataloguing is by non-professional staff using the Library of Congress "G" schedule. Author, title, and publisher are in variable length fields, and codes are seldom used for input or interpretation. Machine searches by area, subjects, author, publisher, scale, projection, date and language can be carried out.

Simon Fraser University in Burnaby, British Columbia, opened in September 1965 to 2,500 students. The Library's book collection was small and the map collection yet to be started. Today there are 6,000 students, approximately 350,000 volumes and 25,000 sheet maps. When graduate work was offered in geography the map collection had to be expanded rapidly. Only a small staff was available and it was essential that any map catalogue be largely maintained by trained non-professional assistants. The circulation, acquisitions, and serials systems were automated and there was of course no sacred 3"x5" card file to be replaced. An IBM 1401 (now a 360/50) was in the Library and the University Librarian encouraged experiment. Some form of automated book catalogue was clearly indicated and work began in 1966 to develop one.

Automated or semi-automated methods for cataloguing and producing map lists have been in use for over twenty years. Very little, however, appeared in print on the subject until the 1960's. Since that time there have been a number of articles on proposed systems and experimental projects, though only a few describe operating systems. The U.S. Army Map Service Library has used punched cards since 1945 (1). At the time of investigation, this system was not fully automated, making use only of electric accounting machines rather than a computer. Other automated catalogues, such as those for the San Juan project (2) and for McMaster University (3), restricted the amount of information possible by using only one punched card. These systems required codes and tables for both input and interpretation. The literature revealed other approaches, and several, such as indexing by co-ordinates (4), or using a hierarchical classification, were considered. In the former, each sheet is indexed by its centroid in latitude and longitude.
A hierarchical system, such as that suggested by Donahue and Hedges (5) or that used by McMaster University, permits a detailed subdivision of area and/or subject. There is, however, no agreement on standards, with each library developing a classification to meet its own needs.

Visits were made to the University of California at Santa Cruz and Illinois State University at Normal, Illinois, to see two systems that were being automated. Both had used the universally recognized classification of the Library of Congress. The California system was first outlined by Carlos Hagen (6) in a proposal to automate the map library at the Los Angeles campus, and implemented with some changes at Santa Cruz by Stanley Stevens. William Easton at Illinois State has described his work in cataloguing the collection there (7). The use of codes in both cases meant a number of revisions, as new projections, publishers or other information were required. Because of format, library staff must be called upon to interpret much of the information.

MATERIALS AND METHOD

In the Simon Fraser map catalogue the Library of Congress classification "G" schedule (8) is adopted for computer use. In it each major natural or political unit is assigned a block of four numbers. The schedule starts with the world and hemispheres, then sweeps through North and South America, to Europe, Asia, Africa, and finally Oceania. Adjacent areas are thus grouped together numerically. The classification similarly groups related subjects. A single letter is used for the broad subject and an alphanumeric code for subdivisions.

In an automated system each area name must have a unique number if it is to appear in the printout under that name. To this end it has been necessary to make variations in the Library of Congress "G" schedule. Indo-China (G8010-G8014), for example, must be split to provide separate numbers for Laos, Cambodia and Vietnam. The subject classification must also be divided to provide an alphanumeric code for each subject that is grouped under one general number in the schedule.

As is common in map libraries, the main entry is area rather than author, which is of secondary importance. The author (engraver, cartographer, etc.) is entered on the coding form and appears in the description. In the imprint, publisher is given first, followed by place of publication. These three elements are in variable length fields.

Information from the maps is entered on a coding sheet (Figure 1) by a library assistant. Difficult sheets are entered by a librarian, who checks all sheets. As indicated on the flow chart (Figure 2), the coding sheets are sent to the library keypunching section, where a deck is made for each record. The number of cards for any particular map depends upon the quantity of information required to describe it. The cards are then sent to the Computing Centre, where they are written onto magnetic tape and used to update the current master files.

A preliminary survey determined the average length of a map record to be 350 characters, while the maximum approached the region of 700. In order to maximize the use of tape space, it was decided that four of the fields would be variable in length.
These are: 1) main entry (area and title) (215 characters maximum); 2) publisher (129 characters maximum); 3) author (129 characters maximum); and 4) notes (215 characters maximum). Access to these data elements is made possible by storing their character counts in fields preceding the variable portion of the record.

Two master files are kept and updated each time a run is required. These are the area master (by L.C. class number) and the subject master. The area master contains all maps and is used to produce the classified list and the alphabetical list. The subject master contains only those maps which have been assigned an L.C. subject code. If a map has more than one subject it appears on both the list and the tape file as many times as it has subjects.

Changes and deletions are entered into the system along with additions. Status codes signal the three: A (addition), C (change), D (deletion). Change and deletion records are complete decks. The records are changed or deleted by comparing the call number on the area master and the call number and subject code on the subject master. Call number and subject code are the only fields that cannot be thus changed. Their change is accomplished by replacing the old record with the revised record. As the only unique identifying number for each punched card would be the call number (maximum of 24 spaces), a six-digit I.D. number is assigned. It is repeated for each of the five decks. The maximum number of cards used in any deck is four (main entry and notes), though up to ninety-nine could be used if necessary.

Fig. 1. Map Coding Sheet.

Fig. 2. Work Flow Chart.

Fig. 3. Layout for Area and Subject Masters and Update Tapes (fixed fields for status, identification number, area number, subject codes, date of publication, sheet number, copy, scale, projection, language, form, size and location, followed by the length counts and the variable-length title, publisher, author and notes fields; maximum block size 2,340, maximum record size 780).

The equipment used is all IBM. The cards are punched on an 029 keypunch and verified on an 059. A computer model 360/50 is now being used, though equipment of this capacity is not necessary. During the development of the project an IBM 1401 and later a 360/40 were used. Printing is done on an IBM 1403 at 1100 lines per minute.
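The way these length counts give access to the variable portion of a record can be shown in a short sketch. It follows the four variable fields and maximum lengths given above, but the three-digit count width and the rest of the encoding are assumptions made for the illustration, not details taken from the Simon Fraser programmes.

    # The four variable fields and their maximum lengths, as described in the text.
    VARIABLE_FIELDS = [("main_entry", 215), ("publisher", 129), ("author", 129), ("notes", 215)]

    def pack_variable_portion(record):
        """Store each field preceded by its character count (three digits assumed here)."""
        parts = []
        for name, maximum in VARIABLE_FIELDS:
            value = record.get(name, "")[:maximum]
            parts.append(f"{len(value):03d}{value}")
        return "".join(parts)

    def unpack_variable_portion(text):
        """Recover the fields by reading each stored count before the data it describes."""
        fields, pos = {}, 0
        for name, _maximum in VARIABLE_FIELDS:
            count = int(text[pos:pos + 3])
            fields[name] = text[pos + 3:pos + 3 + count]
            pos += 3 + count
        return fields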
The Programmes and Their Functions

The following nine programmes, which were originally written in Autocoder, are now in PL/I. This is a relatively new high-level language for the IBM 360 system. To have maximum efficiency from this language large core storage is necessary, though it can be used, with restrictions, on a 32K core storage computer. With the use of other programming languages the system could run efficiently on any computer.

LM001: This programme puts the card decks (from keypunching) onto tape in card image.

LM005: This programme creates and explodes each group of records on the card image tape with the same identification number to produce a subject update tape and an area update tape (Figure 3). At the same time, each record is edited; if an error is found, the record is rejected. In order for a record to be valid, the following conditions must exist: 1) numerical identification number; 2) valid card type, i.e. 1, 2, 3, 4, or 5 (see Figure 1); 3) no duplicate cards for the same map; 4) card codes successive; 5) area being 'G' followed by four numeric digits; 6) numerical date; 7) if scale absent, 'Z' (not printed); 8) general information card and title card for each map.

LM010: In this programme, the area master is updated with the area tape. An error message is printed and the record rejected if there is already an addition on the master file or if there is a change or deletion having no corresponding record on the master file. Also the area number is checked against a table to see if it is valid. If it is invalid, an error message is printed but the record will appear in the master file.

LM015: This programme lists the alphabetical geographical master.

LM025: This programme lists the area master geographically.

LM030: This programme updates the subject master. A message is printed and the record rejected if there is an addition which already exists on the master file or if there is a change or deletion which has no corresponding record on the master file. Also, the subject code is checked against a table to see if it is valid. If the subject code is invalid, an error message is printed but the record will appear in the master file.

LM035: This programme lists the subject master. At the same time, a tape is produced with each subject heading and the page number it appears on.

LM036: This programme lists a table of contents for the master subject list.

LM037: This programme lists an index for the master subject list.

THE CATALOGUE

The catalogue is a book catalogue produced in three sections, unburst and top-bound in loose-leaf binders. The first is the classified or shelf list section, which brings maps of adjacent areas together. Within each L.C. number or area, maps are by subject code. In L.C. order, general maps are followed by those with subject emphasis, then by those showing political divisions, ending with cities. Area names and numbers are in bold type. All pages are numbered and there is a table of contents giving area name and L.C. equivalent. There is also a list of subjects with code numbers. Section two is the same list in alphabetical order by area name (Figure 4). Section three (Figure 5) is the subject listing. Maps are arranged by L.C. subject code rather than alphabetically, which gives the advantage of grouping related subjects together.
Within each group maps are in class number (i.e. area) order. In the format for this section the L.C. alphanumeric code is given first, with the subject name in bold type for major groups and in regular type for the subdivisions. An alphabetical index of subjects refers the user to the page where his subject begins.

Fig. 4. Alphabetical Area List (sample entry): ANTIGUA ... G5047-G5050. G5047 1959 ANTIGUA, WEST INDIES (ANTIGUA ISLAND). 1:25,000; TRANSVERSE MERCATOR. GREAT BRITAIN. DIRECTORATE OF OVERSEAS SURVEYS, LONDON, 1962. SET OF 2 MAPS.

Fig. 5. Classified Subject List (sample entries): J80 INDUSTRIAL AGRICULTURAL PRODUCTS. G8481 J80 1959 RHODESIA AND NYASALAND - TOBACCO (TOBACCO PRODUCTION ... RHODESIA AND NYASALAND). 1:3 MILLION. RHODESIA AND NYASALAND. DIRECTOR OF FEDERAL SURVEYS, SALISBURY, 1961. FEDERAL ATLAS MAP NO 20. K FORESTS AND FORESTRY. K10 FORESTRY IN GENERAL.

The call number in all three lists includes only the major subject of the map, but since a map may cover several, up to five additional or "minor" ones may be included when cataloguing. A single sheet may therefore appear under several headings in the subject section. This method is also used to catalogue a single sheet containing several separate maps.

EVALUATION

Although some modifications may yet be made to the system, the catalogue has proven highly successful and possesses a number of advantages over existing manual and automated systems. Its clear format and lack of symbols make it easy to use. It is issued each trimester in three copies, of which one is kept in the Library. One copy is sent to the Geography Department, and one to the History Department.

The work form is simple enough to be used by skilled non-professional help and, as all punched cards are verified, there are fewer errors than with card typing. Some errors do occur, but in almost all instances the record is automatically rejected and corrections made. Filing errors are non-existent.

Few codes are needed for input and only the L.C. number, form and location are not readily understood in the printed catalogue. Although language, scale, projection and subject are entered in code or short form, they appear in full on the lists. The codes used for form, language, and projection are very simple and reference to them is seldom necessary. Main entry, author, and imprint are in variable length fields, allowing complete information to be given without codes or abbreviations. As main entry is the area name and imprint is by publisher rather than place, a gazetteer-index, and a list by publisher, as well as an author list or index, can be produced when required.

Although not envisaged as an important element, the provision of a punched card for notes has been most valuable. If a map is withdrawn from a journal, or has an accompanying brochure filed elsewhere, this is stated. Any further explanation necessary for an understanding of the map is also given.

Since all elements on the first card are in fixed fields it is possible to obtain lists on demand by subject, date, scale, language, projection, etc. Although the extent of the Simon Fraser collection makes this impractical now, its potential for preparing bibliographies and machine searching is apparent.

An analysis shows that initial costs were not excessive. The programming time of two months was the largest single item at $2,400.00. Computer time and forms to produce the three listings totalled $110.00.
The projected cost based on the present size of the collection is $280.00 per year, a figure which will increase as the collection grows. Keypunching and verifying time is approximately 2½ minutes per map. While this is of course a cost factor, it is done at slack time by the Cataloguing Department, whose operators are paid from $360.00 to $400.00 per month. In a manual system, an additional clerk at $3,564.00 would have been needed to type and file cards, and furniture for the cards would also have been required.

The disadvantages are now more evident than upon receipt of the first lists in June 1968. Use of the classified section has been slight except by the Library staff and it will be issued only once each year. The alphabetical area section is the most heavily used, but the arrangement of entries by L.C. code under each area is confusing. As the number of maps increases from one page to many the user finds it increasingly difficult to locate a thematic map. The third section, by subject, helps overcome this problem, but here again the list is by L.C. code, not alphabetical within each subject. Topographic series and sets were catalogued with one entry, so the number of records was considerably less than the 25,000 sheets in the collection. Archival and facsimile maps acquired since the system was designed have presented problems. The Librarian and Library Assistant were new to map work; consequently the number of errors was high, and corrections and patching-up were time consuming and therefore costly.

CONCLUSION

Despite the less than perfect product, however, the results are worthwhile. First time users experience some difficulty with the classified arrangements but only a simple explanation is needed, and thereafter students are able to identify and locate most maps with little reference to staff. The Geography Department, and to a lesser extent the History Department, do make use of their copies of the catalogue. Telephone enquiries for holdings are minimal and some faculty have asked that they be given their own copies.

The Simon Fraser system is not expensive to operate, the catalogue could be issued more frequently at little extra cost, and the system uses a widely accepted classification scheme that is updated periodically. The programmes employed could be adapted by other libraries with few, if any, modifications and the system could be run on any computer. There will be more sophisticated map catalogues, such as that of the Library of Congress, using MARC II format, and others which will take greater advantage of computer capabilities. Extensive and costly research, however, will be needed to develop these systems. The Simon Fraser system is operating now, was developed in a very short time, and has had a successful first year of use.

REFERENCES

1. Murphy, Mary: "Will Automation Work for Maps?" Special Libraries, 54 (November 1963), 563-567.
2. Thomas, Kenneth A., Sr.: "The San Juan Island Project: Cataloguing Maps by Mechanized Techniques," Special Libraries Association, Geography and Map Division Bulletin, 54 (December 1963), 8-12.
3. Donkin, Kate; Goodchild, Michael: "An Automated Approach to Map Utility," The Cartographer, 4 (June 1967), 39-45.
4. Stallings, David Lloyd: Automated Map Reference Retrieval. A thesis submitted in partial fulfilment of the requirements for the degree of Master of Arts (Seattle, University of Washington, 1966), p. 71.
5. Donahue, Joseph C.; Hedges, Charles P.: "CARES - A Proposed Cartographic Retrieval System," American Documentation Institute Proceedings (1964), 137-140.
6. Hagen, Carlos B.: "An Information Retrieval System for Maps," UNESCO Library Bulletin, 20 (January-February 1966), 30-35.
7. Easton, William W.: "Automating the Illinois State University Map Library," Special Libraries Association, Geography and Map Division Bulletin, 61 (March 1967), 3-9.
8. United States. Library of Congress. Subject Cataloging Division: Classification, Class G: Geography, Anthropology, Folklore, Manners and Customs; 3rd ed. (Washington: Superintendent of Documents, 1954), p. 502.

4657 ----

LIBRARY COMPUTERIZATION IN THE UNITED KINGDOM

Frederick G. KILGOUR: Director, Ohio College Library Center, Columbus, Ohio

Library automation in the United Kingdom has evolved rapidly in the past three years. Imaginative, innovative development has produced novel techniques, some of which have yet to be put into practice in the United States. Of greatest importance is the growing cadre of highly effective librarians engaged in development.

When the Brasenose Conference in Oxford convened in June 1966, there were represented only two operational library computerization projects from the United Kingdom: W. R. Maidment, Britain's pioneer in library computerization, had introduced his bookform catalog at the London Borough of Camden Library in April 1965 (1); and M. V. Line and his colleagues at the University Library, Newcastle-upon-Tyne, had introduced an automated acquisitions system just a year later (2). During the three years following the summer of 1966, British librarians moved rapidly into computerization and have made novel contributions which their American colleagues would do well to adopt. In the spring of 1969 there were more than a couple of dozen major applications operating routinely, with perhaps another score being actively developed. The most striking development in the United Kingdom is computerization in public libraries, whose librarians are considerably more active than their colleagues in the United States; at least nine public libraries have computerization projects that are operational or under active development, and as already mentioned, it was a public library that led the way.

The sources for this paper are published literature and an all-too-brief visit to the United Kingdom in April 1969 to see and hear of those activities not yet reported. The principal literature source is Program: News of Computers in Libraries, now in its third volume. R. T. Kimber, of The Queen's University School of Library Studies at Belfast, edits Program, which he first published as a gratis newsletter in March 1966. Kimber has published the only reviews of library computerization that have contained adequate information on activities in the United Kingdom; the first appeared in Program (3), and the second, an expansion of the first, is in his recently published book (4). Program became an immediate success, and beginning with the first issue of Volume 2 in April 1968, it became available on a subscription basis. A year later, Aslib assumed its publication, with Kimber still as editor, and Program will undoubtedly continue to be the major source of published information about library computerization in the United Kingdom.
Information & Library Science Abstracts, formerly Library Science Abstracts, is the one other major source of pub- lished information about British library automation. It abstracts articles appearing in other journals and report literature as well. Most library computerization in the United Kingdom has been a genu- ine advance of technology, in that computerization has introduced new methods of producing existing products or products that had existed in the past, such as bookform catalogs. To be sure, relatively more British libraries than United States libraries have maintained catalogs in book- form, but the pioneer project at Camden produced a bookform catalog to take the place of card catalogs. The time has come, however, when it is fruitful to think of products with new characteristics or of entirely new products unknown to libraries heretofore. British librarians have al- ready begun to think in these terms. One example (and others will be reported later in this paper) is the pioneering W. R. Maidment, who feels that the problem of application of computers to produce existing products has been solved intellectually. Maidment is giving serious thought to de- velopment of management information techniques, and automatic collec- tion of data to be used by librarians, sociologists and others as a data base for research. Such research could produce many findings, including knowledge of effectiveness of formal education programs as revealed by subsequent public library usage, as well as better understanding of the social dynamics of public libraries within their communities. CATALOGS Although users searching by subject may find more material by going directly to classified shelves than by any other subject access ( 5,6), the library catalog is nevertheless a major and indispensible tool for making books available to users. Taken together, descriptive cataloging, subject indexing, and subject classification constitute the bridge over which the user must travel to obtain books from a library. In libraries that are user- oriented it can be expected that the greatest gain will be achieved by computerization of the cataloging process. Moreover, acquisitions activities, 118 Journal of Library Automation Vol. 2/3 September, 1969 as well as circulation procedures, are essentially based on, and must be interlocked with, cataloging products. It is, therefore, of much interest that the first routine British computeri- zation was of the catalog at the Camden Borough Library ( 1). Impetus for this event . occurred several years earlier when the London metropoli- tan Boroughs of Hampstead, Holbom and St. Pancras were combined to become the Borough of Camden. The problem thereby generated was how to combine catalogs of three public library systems so that users of the new system could take advantage of the increased number of books available to them. Maidment decided to cope first with the future, and introduced a bookform union catalog in 1965 listing new acquisition in all Camden libraries and giving their locations. Of course, users have consulted both the bookform catalog and older card catalogs, but with the passage of each year, the card catalogs become less useful. H. K. Gordon Bearman, who directs the West Sussex County Library from its lovely new headquarters building in the charming little city of Chichester, is another imaginative pioneering public librarian. 
Bearman has keenly evaluated the potential contribution of computerization to public libraries, and has amusingly assessed the opposition of some to such advances (7). The West Sussex County Library possesses more than a score of branches, for which Bearman has introduced a computerized bookform union catalog (8). In April 1969 this computerized catalog contained nearly 23,000 entries.

The library at the University of Essex produces computerized accession lists, departmental catalogs, and special listings for its science books (9). At least four libraries are putting out computerized alphabetical subject indexes to their classification schemes or to their classified catalogs: the Library of the Atomic Weapons Research Establishment (AWRE) at Aldermaston; The City University Library (10), London, formerly the Northampton Technical College; the Loughborough University of Technology (11); and the Dorset County Library (12), which may be the first library in the United Kingdom to use a computer, for it issued a computerized catalog of sets of plays in 1964.

One of the most exciting cataloging computerization projects in the United Kingdom is the British National Bibliography MARC project under the extraordinarily skillful leadership of R. E. Coward (13,14). The BNB MARC record is entirely compatible with MARC II, and Coward has introduced worthwhile improvements to it. For example, he uses indicator positions to record the number of initial characters to be omitted when an entry possessing an initial article is to be sorted alphabetically. In April 1966, the British National Bibliography was using its MARC records in its process for production of cards for sale to British libraries. BNB intends to use the same records for production of the British National Bibliography. In addition, BNB is fostering a pilot project, quite like the MARC pilot project, among a score of British libraries. F. H. Ayres (15) has published perceptive suggestions for use of BNB MARC tapes for selection, acquisitions, and cataloging. Although Coward was able to take full advantage of work done at the Library of Congress, it is enormously to his credit that he did take that advantage, and that he has moved so far ahead so rapidly. Since British book production somewhat exceeds American, Coward has doubled the size of the pool of machine readable cataloging records available at the present time.

The Bodleian Library at Oxford will be an important early user of BNB MARC tapes. Robert Shackleton, who became Bodley's Librarian shortly before the Brasenose Conference, has worked wonders at that ancient and honorable institution, and his principal wonder is Peter Brown, who became Keeper of the Catalogues late in 1966. Brown is one of the few members of classical librarianship who has trained himself in depth in the programming and operation of computers. Oxford possesses no fewer than 129 separate libraries acquiring current imprints and has no instrument that remotely resembles a union catalog. Hence, each user must guess which library out of 129 is most likely to have the book he wishes to use - a guessing game of which Oxonians notoriously tire. Brown has developed a system for bookform catalog production which will place a union catalog of Oxford's holdings in each of its libraries.

CONVERSION

The Bodleian is also the scene of the most ambitious of retrospective conversion projects.
Involving 1,250,000 entries, it is by far the largest conversion project in either the United States or the United Kingdom. The entries being converted constitute the Bodley's so-called "1920 Catalogue," which includes the Bodley's holdings for imprints of 1920 and earlier. For some years the manuscript bookform slip catalog that houses these entries has been in advancing stages of deterioration, and indeed since 1930, entries have been revised in anticipation of printing the catalog. To reprint the catalog would require keyboarding the entries to prepare manuscript copy for the printer, who in turn would keyboard the entries again in setting type. There would be only one product from this process, namely a printed catalog. Bodleian officials wisely decided to do a single keyboarding that would convert the entries to machine readable form from which a multiplicity of products could be had, including a printed catalog. Brown has worked out details of schedules and procedures whereby conversion will take place during the next five years. A contractor employing optical character recognition techniques performs the actual conversion, but the contractor does not edit, code, or proofread the entries, although he is responsible for accurate conversion. Brown has skillfully developed techniques to diminish the number of keystrokes required in conversion, and what with labor costs being lower in the United Kingdom than in the United States, the contractual cost of 4.17 pence per record is certainly low enough to attract work from outside the United Kingdom. The most significant part of this operation is, however, the identification by computer program of the individual elements of information in the text. This puts into practice the concepts of John Jolliffe of the British Museum on the conversion of catalog data (16); it was Jolliffe who programmed Oxford's KDF 9 computer to convert the text coming on tapes from the contractor to true machine records that are compatible with the MARC II format. Despite the fact that these entries contain no subject heading tracings, they will constitute the first major source of retrospective machine readable cataloging records.

The West Sussex County Library in Chichester and the University Library at Newcastle-on-Tyne have already converted their catalogs to machine readable form, the former having done somewhat less, and the latter somewhat more, than 200,000 entries. At Chichester, former library employees did the job on a piece-work basis; at Newcastle the Computer Laboratory employed a special group (17).

The large number of records produced by these conversion projects forces urgent consideration of files designed to house huge numbers of entries. Approaches to solutions of this problem have begun at the level of individual records or of file design as a whole. Nigel S. M. Cox, at Newcastle, one of Britain's most widely known library computer people and co-author of the best-selling The Computer and the Library (it has been translated even into Japanese), has developed a generalized file-handling system (18) based on individual records. Cox has demonstrated that his system is hospitable to demographic records as well as bibliographic records. His file handling will surely play a role in future library computerization.

CIRCULATION

Britain's first computerized circulation system went into operation in October 1966 (19) at the University of Southampton.
Books contain eighty-column punched book cards which are passed through a Friden Collectadata together with a machine readable borrower's identification card. Punched paper tape is produced that is input to the computer system. The principal output is a nightly listing of charges having records in abbreviated form, with a print-out of the complete records being produced once a week. The Southampton circulation system works well, and obviously the staff finds it easy to use. Borrowers also enjoy the system; when the Collectadata is down, as it occasionally is, circulation volume also goes down, for borrowers avoid filling out charge cards manually.

F. H. Ayres and his colleagues at the AWRE Library at Aldermaston are a productive group in research and development. Aldermaston has a partially computerized circulation system wherein the computer segment of the system maintains control features of the circulation record file, but the master record is maintained manually (20).

It is understood that the library of the Atomic Energy Research Establishment at Harwell is developing an on-line circulation system, but the West Sussex County Library is the only British library to have on-line access to a circulation file (8,21). The circulation system in Chichester is both experimental and operational. The punch-paper-tape reading devices at the circulation desk and in the discharge room were specially designed by Elliott Computers for experimental application for library purposes. However, it appears that the experimental period is ending, and that a production model is about to be marketed by Automated Library Systems Ltd. The experimental equipment at Chichester was to be replaced during summer 1969, and six further installations introduced at the major regional branch libraries during the next two years. The on-line circulation records are housed on an IBM 2321 data cell in the Computer Centre in an adjacent County Council building. There is an IBM 2740 terminal in the Library from which special inquiries are put to the file. For example, overdue notices are sent out by computer using the same records to which inquiries can be made, but there are sometimes lag periods, particularly over weekends, so that an overdue notice may be sent on a book already returned. When the borrower reports that he has already returned the book, the file is queried from the terminal. Processing of these special and time-consuming tasks is thereby greatly facilitated. On-line circulation files are a rarity, and the West Sussex County Library and the County Computer Centre are to be congratulated on their achievement.

ACQUISITIONS

The already mentioned acquisition system at Newcastle (2,22) has been in continuous and successful operation for over three years. Although the system does not handle large numbers of orders, there being only slightly more than a thousand active orders and four thousand inactive orders in the file at any one time, there is no reason to think that it could not cope with a larger volume. Output from the computer consists of purchase orders, the order file, claim notices, a fund commitment register, and an occasional list of orders by dealer. The City University Library has computerized its book fund accounting (23), its general library accounts, and its inventory of over 350 categories of furniture and equipment (24). The last procedure is unique.
AMCOS (Aldermaston Mechanized Cataloging and Order System) appears to be the British pioneer integrated acquisitions and cataloging system (25). The IBM 870 Document Writing System originally used for output became overburdened after it had produced the second bookform title catalog with classed subject and author indexes. A title listing is employed in the main catalog because the Aldermaston group found in a separate study (26) that users as they approached the catalog possessed less than seventy-five percent accurate author information, while their information about titles was over ninety percent correct.

SERIALS

The University of Liverpool Library (27) and The City University Library (28) produce periodicals holding lists by computer. At Liverpool the list is restricted to scientific serials but contains 7,600 entries of holdings in 28 libraries, not all of which are university libraries. With each entry are holding information, the name of the library or libraries possessing the title, and the call number in each library. Similarly, The City University Library list contains holdings information and frequency of appearance for each title. The computer program at City University also puts out a list of titles for which issues may be expected during the coming month as well as of all titles having irregular issues. However, this procedure for checking in issues did not prove to be wholly satisfactory and is not currently in use.

The Library of the Atomic Energy Research Establishment at Harwell also puts out a union holdings list for the several sections of the Library (29). In addition, the Harwell programs, which run on an IBM 360/65 and are written in FORTRAN IV, produce for review annual lists of current subscriptions taken by each library; they also produce annual lists of periodicals by the subscription agencies supplying them.

Dews (30,31) has described computer production of the Union List of Periodicals in Institute of Education Libraries. This union list first appeared about 1950, was republished annually, then biennially, as the magnitude of effort to revise it increased. Both the manipulation and typesetting programs employ the Newcastle file handling system.

ASSESSMENT

The most gratifying development in library computerization in the United Kingdom during the last three years has been the rapid expansion of the number of individuals who have made themselves competent in the field. Among the British participants at the Brasenose Conference were barely a half-dozen who had had first-hand experience in library computerization. The group has increased considerably more than tenfold and has brought the quality of British library computerization to a level surpassed by none. Continuing advances depend on the calibre of those advancing; the competence of the present cadre assures exciting future developments.

Perhaps the most distinguishing characteristic of library computerization in the United Kingdom as compared with that in North America is the relatively larger role played by public libraries. Indeed, it was the public libraries at Dorset and Camden that first used computers. American public librarians would do well to follow the lead of their British confreres. In general, Americans can learn from British imagination and accomplishment, can learn of exquisite refinements and major achievements.
British librarians, particularly of large British libraries, have not been a notoriously chummy group. It is, therefore, interesting to observe computerization bringing them together. The new style in solving problems made possible by the computer has suddenly made it clear that libraries heretofore deemed to have nothing in common now seem surprisingly alike. For example, the bookform union catalogs at the Camden and West Sussex Public Libraries and at the Oxford libraries can now be seen to be essentially the same solution to the same problem. Although library computerization in the United Kingdom is but half the age of that in the United States, the quality if not the quantity of British research, development, and operation has rapidly pulled abreast of, and in some areas surpassed, American activities.

REFERENCES

1. Maidment, W. R.: "The Computer Catalogue in Camden," Library World, 67 (Aug. 1965), 40.
2. Line, M. V.: "Automation of Acquisition Records and Routine in the University Library, Newcastle upon Tyne," Program, 1 (June 1966), 1-4.
3. Kimber, R. T.: "Computer Applications in the Fields of Library Housekeeping and Information Processing," Program, 1 (July 1967), 5-25.
4. Kimber, R. T.: Automation in Libraries (Oxford, Pergamon Press, 1968), pp. 118-133.
5. Bundy, Mary Lee: "Metropolitan Public Library Use," Wilson Library Bulletin (May 1967), 950-961.
6. Raisig, L. Miles; Smith, Meredith; Cuff, Renata; Kilgour, Frederick G.: "How Biomedical Investigators Use Library Books," Bulletin of the Medical Library Association, 54 (April 1966), 104-107.
7. Bearman, H. K. Gordon: "Automation and Librarianship - The Computer Era," Proceedings of the Public Libraries Conference, Brighton, 1968, pp. 50-54.
8. Bearman, H. K. Gordon: "Library Computerisation in West Sussex," Program, 2 (July 1968), 53-58.
9. Sommerlad, M. J.: "Development of a Machine-Readable Catalogue at the University of Essex," Program, 1 (Oct. 1967), 1-3.
10. Cowburn, L. M.; Enright, B. J.: "Computerized U.D.C. Subject Index in The City University Library," Program, 1 (Jan. 1968), 1-5.
11. Evans, A. J.; Wall, R. A.: "Library Mechanization Projects at Loughborough University of Technology," Program, 1 (July 1967), 1-4.
12. Carter, Kenneth: "Dorset County Library: Computers and Cataloguing," Program, 2 (July 1968), 59-67.
13. BNB MARC Documentation Service Publications, Nos. 1 and 2 (London, Council of the British National Bibliography, Ltd., 1968).
14. Coward, R. E.: "The United Kingdom MARC Record Service," in Cox, Nigel S. M.; Grose, Michael W.: Organization and Handling of Bibliographic Records by Computer (Hamden, Conn., Archon Books, 1967), pp. 105-115.
15. Ayres, F. H.: "Making the Most of MARC; Its Use for Selection, Acquisitions and Cataloguing," Program, 3 (April 1969), 30-37.
16. Jolliffe, J. W.: "The Tactics of Converting a Catalogue to Machine-Readable Form," Journal of Documentation, 24 (Sept. 1968), 149-158.
17. University of Newcastle upon Tyne: Catalogue Computerisation Project (September 1968).
18. Cox, Nigel S. M.; Dews, J. D.: "The Newcastle File Handling System," in op. cit. (note 13), pp. 1-20.
19. Woods, R. G.: "Use of an ICT 1907 Computer in Southampton University Library, Report No. 3," Program, 2 (April 1968), 30-33.
20. Ayres, F. H.; Cayless, C.
F.; German, Janice A.: "Some Applications of Mechanization in a Large Special Library," Journal of Documentation, 23 (March 1967), 34-44.
21. Kimber, R. T.: "An Operational Computerised Circulation System with On-Line Interrogation Capability," Program, 2 (Oct. 1968), 75-80.
22. Grose, M. W.; Jones, B.: "The Newcastle University Library Order System," in op. cit. (note 13), pp. 158-167.
23. Stevenson, C. L.; Cooper, J. A.: "A Computerised Accounts System at the City University Library," Program, 2 (April 1968), 15-29.
24. Enright, B. J.; Cooper, J. A.: "The Housekeeping of Housekeeping; A Library Furniture and Equipment Inventory Program," Program, 2 (Jan. 1969), 125-134.
25. Ayres, F. H.; German, Janice; Loukes, N.; Searle, R. H.: AMCOS (Aldermaston Mechanised Cataloguing and Ordering System). Part 1, Planning for the IBM 870 System; Part 2, Stage One Operational. Nos. 67/11, 68/10, Aug. 1967, Nov. 1968.
26. Ayres, F. H.; German, Janice; Loukes, N.; Searle, R. H.: "Author versus Title: A Comparative Survey of the Accuracy of the Information which the User Brings to the Library Catalogue," Journal of Documentation, 24 (Dec. 1968), 266-272.
27. Cheeseman, F.: "University of Liverpool Finding List of Scientific Medical and Technical Periodicals," Program, 1 (April 1967), 1-4.
28. Enright, B. J.: "An Experimental Periodicals Checking List," Program, 1 (Oct. 1967), 4-11.
29. Bishop, S. M.: "Periodical Records on Punched Cards at AERE Library, Harwell," Program, 3 (April 1969), 11-18.
30. Dews, J. D.: "The Union List of Periodicals in Institute of Education Libraries," in op. cit. (note 13), pp. 22-29.
31. Dews, J. D.; Smethurst, J. M.: The Institute of Education Union List of Periodicals Processing System (Newcastle upon Tyne, Oriel Press, 1969).

4658 ----

THE MARC SORT PROGRAM

John C. RATHER: Specialist in Technical Processes Research, and Jerry G. PENNINGTON: Information Systems Mathematician, Library of Congress, Washington, D.C.

Describes the characteristics, performance, and potential of SKED (Sort-Key Edit), a generalized computer program for creating sort keys for MARC II records at the user's option. SKED and a modification of the IBM S/360 DOS tape sort/merge program form the basis for a comprehensive program for arranging catalog entries by computer.

THE ROLE OF SORTING IN THE MARC SYSTEM

Many present and potential uses of cataloging data in machine readable form require that the input sequence of the records be altered before output. The production of book catalogs, bibliographical lists, and similar output products benefits from an efficient means for arranging the records in a more sophisticated way than mere alphabetical order or, even worse, the collating sequence of a particular computer. Internal files, such as special tape indexes, also may require sequencing by sort keys that differ from the actual character strings in the records.

The demonstration of the feasibility of filing catalog entries by computer hinges on successfully performing two tasks: 1) analyzing the requirements of particular filing arrangements; and 2) programming the computer to perform the required operations. Actually, the two tasks are interdependent, because the nature of the filing analysis is strongly influenced by the ability of the computer to perform certain types of operations. The requirements for filing arrangement were considered at the genesis of the MARC project (1) and they materially affected the characteristics of the MARC II format (2,3). Structuring the format of a machine record is only part of the solution to the problem, however.
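A small illustration of the underlying problem may be useful here: sorted on their raw character strings, headings fall into the machine's collating sequence, in which case and punctuation disturb the filing order. The sketch below only illustrates that difficulty; the crude normalization shown is an assumption for the example and is far simpler than the rules the MARC sort program applies.

    headings = ["Ball, John", "BALL, JOHN ARTHUR",
                "Charles II, King of Great Britain", "de la Mare, Walter"]

    # Sorting the raw strings follows character codes, so upper- and lower-case
    # forms and punctuation pull headings out of filing order.
    print(sorted(headings))

    def crude_key(heading):
        """A crude filing key: fold case and drop punctuation (far simpler than SKED)."""
        kept = [c for c in heading.upper() if c.isalnum() or c.isspace()]
        return " ".join("".join(kept).split())

    # Sorting on a derived key restores an order independent of those accidents.
    print(sorted(headings, key=crude_key))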
The first requirement for a program for library sorting is a set of generalized computer techniques for creating sort keys from MARC records at the user's option. These techniques will provide the foundation for further refinement of the sorting capability by developing algorithms to resolve specific problems in file arrangement. This article describes the characteristics, performance, and potential of a generalized program developed by the Information Systems Office and the Technical Processes Research Office of the Library of Congress.

The present approach to the computer sorting problem was based on the following assumptions:

1) The sort key must be generated on demand. For maximum flexibility and economy of storage, it should not be a permanent part of the machine readable record.
2) Data to be sorted must be processed (edited) for sorting by the machine. Input to a data field should be in the form required for cataloging purposes; it should not be contrived simply to satisfy the requirements of filing.
3) All data elements contributing to a sort key must be fully represented. To determine the position of an entry in a large file, the filing elements must be considered in turn until the discrimination point is reached. No element may be truncated to make room for another.
4) At least initially, a manufacturer's program should be used for sorting and merging the records with sort keys. Given the Library's present machine configuration, this means using the IBM S/360 DOS tape sort/merge program.

These assumptions shaped the course that was followed. The requirement that the sort key be generated on demand meant that a program had to be written to build sort keys specifically for records in the MARC II format. To allow maximum flexibility in specifying elements to be included in the sort key, the basic program was to be highly generalized, allowing any combination of fixed and variable field data to be included in the sort key. Since several data elements may have to be considered to determine the proper location of an item in a complex file, it is desirable to construct a single sort key containing as many characters of each element considered in turn as the length of the key will allow. Using a single sort key is more efficient than using separate keys for each element.

THE MARC II FORMAT

The MARC sort program was written to handle records in the processing format used by the Library of Congress. The differences between this format and the MARC II communications format (2,3,4) have been described by Avram and Droz (5). For the purposes of the present article, it is sufficient to give a brief outline of the structure of the format as it relates to computer sorting capability and to describe the salient features of the content designators that facilitate the creation of sort keys.

MARC records are undefined; that is, they vary in length, and information is not provided at the beginning of each record for use by software input/output macros. Since the manufacturer's program used for sorting MARC records cannot handle undefined records, preparation for sorting must include changing them from one type to the other. At the end of the sort/merge phase, they must be returned to an undefined state. The maximum physical length of a MARC record is 2048 bytes.
If a logical record (that is, the bibliographical data plus machine format data) requires more than 2048 bytes, it must be divided into two (or more) physical records. At present, the MARC sort program cannot handle continuation records of this type.

The basic structure of the format includes a leader, a record directory, and variable fields. Each variable field is identified by a unique tag in the directory. If necessary, the data in a field can be defined more precisely by indicators and subfield codes. They appear at the beginning of the field separated by delimiters. When no indicator is needed, the field begins with a delimiter. Tags, indicators, and subfield codes are used to specify what variable field data are to be included in the sort key, how the data are to be arranged, and what special treatment may be required. Although the full potential of these content designators has yet to be realized, they provide a basis for programming to achieve content-related filing arrangements; for example, placement of a single surname before other headings with the same character string.

CHARACTERISTICS OF THE MARC SORT PROGRAM

The MARC sort program has three components: 1) a sort-key edit program (SKED); 2) the sorting capability of the IBM S/360 DOS tape sort/merge program (TSRT); and 3) a merge routine written expressly for the MARC sort program.

The MARC sort program is activated by a set of control cards supplied by the user. These control cards specify the parameters to be observed in processing each record. Using this information, SKED reads each record, builds as many sort keys as are required to satisfy the parameters, duplicates the master record each time a different key is constructed, and records information about the sort key and the master record for possible later use. The output of SKED is an intermediate MARC sort file containing records with sort keys.

The second phase of the program involves TSRT, which also is controlled by parameter cards. The input is the intermediate MARC sort file. The TSRT program sorts the records according to their keys, using a standard collating sequence (that is, according to the order of the bit configurations of the characters in the keys). The output can take either or both of two forms: 1) MARC format, in which the sort key is stripped from each record and the format is returned to an undefined state; or 2) intermediate MARC sort format, which is identical with the input to the TSRT program (the sort key remains with the record).

The merge routine written especially for the MARC sort program provides the capability to merge two or more files produced previously by TSRT in the intermediate MARC format and to output files in either or both of the above formats. It is necessary to provide a separate program for the merge function, since the manufacturer-supplied sort/merge package does not have the capability to merge intermediate MARC records while producing MARC II output. Figure 1 shows a simplified flow chart of the program.

Fig. 1. MARC Sort Program Data Flow.
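Taken together, the three components behave as a staged pipeline, which the following sketch outlines. It is schematic only: the function names are invented for the illustration, and the actual programs operate on tape files driven by parameter cards rather than on in-core lists.

    def marc_sort(records, build_keys):
        """Schematic pipeline: build keyed copies (SKED), sort them (TSRT), strip the keys."""
        keyed = []
        for record in records:
            # The key-building step may yield several keys, and therefore several
            # copies of the record (e.g., one per added entry or subject).
            for key in build_keys(record):
                keyed.append((key, record))
        keyed.sort(key=lambda pair: pair[0])           # the sort/merge phase
        return [record for _key, record in keyed]      # keys removed, records back in MARC form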
In this process, it uses a table to translate the data to equalize upper- and lower-case letters, eliminate unwanted characters (e.g., dia- critics, punctuation), and to provide a more desirable collating sequence during the sort phase. If the parameters result in more than one sort. key for a record, the record is duplicated each time a new key is built. The sort key is attached to the front of the MARC record when both are written in the intermediate MARC sort file. This is a variable-length, blocked file with a maximum block length of 4612 bytes (minimum block- ing factor of 2). Figure 2 shows diagrammatically how a record looks after it has been processed by SKED. Communi- Block Record Sort Key Sort Leader cations Control Fixed Direc~ Variable Length Length Length Key Area Field Field tory Fields I I I I I I I I l K -----2 or more records blocked-----~ Fig. 2. Schematic Diagram of an Intermediate MARC Sort Record. Records in the master file that do not satisfy the parameters for a par- ticular processing cycle are written in an exception file which is in the same format as the original master file (that is, undefined). A utility pro- gram can be used to list the contents of the exception file. TSRT requires the specification of the number of control fields and certain related information about each such field. As many as twelve control fields (each with a maximum length of 256 bytes) can be accommo- dated by the program. The current implementation of the MARC sort program uses a 256-byte key starting in position 9 of each record. (The first 8 bytes are used for variable record information). Any change in the length 130 Journal of Library Automation Vol. 2/ 3 September, 1969 of the sort key must be reflected in the SKED source deck and on the control cards for TSRT. The specification of control fields shown on a TSRT control card must be changed as follows: If the length of the sort key is shortened, then the control field length specification must be reduced. If the sort key is lengthened, then the control field must be split into two or more control fields, as follows: key length Number of control fields = 256 (If the quotient is a fraction, use the next higher integer.) Parameters The control cards for a SKED processing cycle allow the user to specify the following options: 1) Type of Field. Both fixed and variable fields may be specified as parameters for a sort key. There is no restriction on the order in which they are given. 2) Specification of Fields. Fields may be specified in several ways: a) exact form: a specific tag for a variable field (e.g., 650) or the address of a fixed field (the only option for this type of field); b) generalized form: NXX, NNX, XNN, NXN, where any digit may be substituted for N (e.g., !XX ); and c ) as a range: NNN- XXX (e.g. 600-651) . 3) Selection of Data from a Field. The amount of data from a field to be processed can be determined in any of three ways: a) Specifying the variable field tag without specifying particular sub- field codes associated with it. This results in all data in the field being processed. b) Specifying the number of characters to be processed. This must be done for fixed fields even if all data are desired. With either type of field, the data will be truncated if the number specified is smaller than the number of characters in the field. c) Specifying the particular subfield codes associated with a variable field tag. This results in the sort key containing only the data from the specified subfields. 
For example, if the data in a 100 field were "Smith, John, 1910- ed.", failure to include subfield "e" (the designator for a relator like "ed.") in the specification of subfields would result in its being excluded from the sort key.

4) Alternate Selection. Two or more parameters may be specified for the same position in the sort key with the provision that only the first to be found will be used. For example, if 240 (uniform title) and 245 (bibliographical title) are specified as alternate selections in that order and both occur in a record, preference is given to 240 and only it is used in the sort key.

5) Multiple Parametric Levels. For efficient processing, mutually exclusive parameters can be listed in the instructions for the same processing cycle. The program affords a means of distinguishing between primary parameters that must always appear in the sort key and secondary parameters that cannot be combined with one another. The user also has the option of specifying that a sort key is to be generated using only the primary parameters. For example, if a book catalog were to contain main entries, added entries, and subjects, the tags for added entries and subjects would be specified as secondary parameters and the tags for the main entry, title, and imprint date as primary parameters. The sort key built for each added entry and each subject entry would always include the main entry (if present), title, and imprint date. This option can be by-passed if, for example, only a subject catalog is desired.

6) Sequence of Subfield Codes. The subfield codes for a variable field may be specified not only to control the data to be included in the sort key but also to determine the order in which it appears. The following example shows how this works:

    Record:          100, subfields a-d: Charles / II / King of Great Britain, / 1630-1685
    SKED parameter:  100 acbd
    Sort key:        CHARLES KING OF GREAT BRITAIN II 1630 1685

7) Separator. The user must specify the character that will separate each data element in the sort key, but he has a choice of the character to be used. When the required characters from a given data element have been moved to the sort key, the selected separator is inserted to mark the end of the element. The separator is one of a set of specially contrived characters called superblanks that sort lower than any other character, including a blank. Use of the superblank permits the combination of different data elements in the same sort key because it prevents unlike data elements from sorting against one another, as shown below:

    BALL JOHN • ARTHUR THE ...
    BALL JOHN ARTHUR • CHESS

Without the superblank (shown here for convenience as a bullet) the second sort key would be placed before the first. Later it is expected that use of different superblanks within data elements will enable the sort/merge program to group related headings together.

8) Acceptance/Rejection Indicator. At the beginning of the processing cycle, a decision can be made as to whether a MARC record should be processed if it does not include a particular parameter. If the rejection indicator is set, the record will be written to the exception file if that parameter does not occur.
Translation Table

Before data characters are moved from a designated field in a MARC record to the sort key, they must be edited to insure that the key will include only characters that are relevant for sorting. This involves translation of the characters into the SKED character set: 1) to equate upper- and lower-case versions of the same alpha characters; 2) to translate the period, comma, and hyphen to the bit configuration of an ordinary blank; 3) to reduce other punctuation, diacritics, and special characters to a single bit configuration that cannot be moved to the sort key; and 4) to insure the proper machine collating sequence (blank, 0-9, A-Z). The SKED character set also provides bit configurations for superblanks (see above). The translation routine is written so that the character set can be changed without programming complications.

SKED also includes a feature that safeguards the sort key against the possibility of two consecutive blanks, as would be the case when a period and a blank occur in sequence, or when the data erroneously include two blanks when only one should occur. Before a character with a bit configuration equal to a blank is moved to the sort key, the program determines whether the last character moved has the same configuration. If it does, the second character is not moved.
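As a rough sketch of the editing rules just described — equating cases, treating the period, comma, and hyphen as blanks, dropping other punctuation and diacritics, and refusing to move two blanks in a row — the following illustration applies them to a heading before it would enter the sort key. The character handling here (Unicode normalization, an ASCII result) is an assumption for the example, not the actual SKED character set or collating sequence.

```python
# Illustrative sketch of SKED-style character editing (assumed rules only).
import unicodedata

def translate_char(ch):
    ch = unicodedata.normalize("NFD", ch)[0]   # strip diacritics (crudely)
    ch = ch.upper()                            # equate upper and lower case
    if ch in ".,-":
        return " "                             # period, comma, hyphen become a blank
    if ch.isalnum() or ch == " ":
        return ch
    return ""                                  # other characters cannot enter the key

def edit_for_sort_key(text):
    out = []
    for raw in text:
        ch = translate_char(raw)
        if not ch:
            continue
        if ch == " " and out and out[-1] == " ":
            continue                           # never move two blanks in a row
        out.append(ch)
    return "".join(out).strip()

print(edit_for_sort_key("Müller, Jean-Paul."))   # -> "MULLER JEAN PAUL"
```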
Other Options

SKED has the optional capability of adding two variable data fields and their corresponding directory entries to each record. These entries follow the format for data in other MARC II variable fields.

1) 998 Entry. When the SKED capability to duplicate records is used, it may be desirable to label one record of the set as the "master record." This technique or a modification of it might be used to generate a reference from a partial record to the full (master) record in a book catalog. When this option is selected there will be one, and only one, 998 field generated with each record. Information about the master record will be given by listing the tags used in the sort key to achieve a unique position of that record on file. For example, if a master record is sorted by main entry, then title (if different from main entry), and finally by the date of publication, the 998 field describing this master record should list the 1XX tag, followed by the 2XX tag, and finally by the address and length of the fixed field containing the publication date. The order of the tags in the 998 field is the same sequence used in the sort key of the master record.

2) 999 Entry. When a book catalog is produced, it is desirable to show on the first line of an entry the element (e.g., title, subject) that determined the position of the entry in the arrangement. SKED supplies this information by creating a variable field (tagged 999) containing the initial element of the sort key. If this option is chosen, an indicator can be set in the 999 field to show that the data in it should be printed as the first line of the bibliographic printout.

Sort/Merge

The TSRT program used by the MARC sort program is the standard IBM-supplied IBM System/360 basic operating system tape sort/merge program. Design specifications for this program satisfy the sorting and merging requirements of tape-oriented systems with at least 16K bytes of main storage. This program enables the user to sort files of random records, or merge multiple files of sequential records, into one sequential file. If any inherent sequencing exists within the input file, the program will take advantage of it. The intermediate MARC sort file produced by SKED is acceptable to TSRT. As stated earlier, TSRT can accommodate up to twelve control fields for sorting. The MARC sort program requires only one control field at present. It is important to note that the TSRT comparison routines end as soon as a character in one control field is different from the corresponding character in another control field.

TSRT operates in four phases: assignment (Phase 0); internal-sort (Phase 1); external-sort (Phase 2); merge-only (Phase 3). If sorting is to be done, the assignment, internal-sort, and external-sort phases are executed. If only merging is to be done, the assignment and merge-only phases are executed. TSRT provides various exits from the main line to enable a user to insert his own routines. Exit 22 in the external sort has been provided to delete records. In the MARC sort program this exit is used as an option for stripping the key and returning the records to standard MARC II format (undefined, 2040 bytes maximum). The user exit intercepts each sorted record prior to output and converts it to an undefined state. The option is provided by addition of a "MODS" control card. A flow chart of TSRT appears as Figure 3.

Fig. 3. Flow Chart of the Sort/Merge Phases of the MARC Sort Program. [Figure: assignment phase (Phase 0), internal sort phase (Phase 1), external sort phase (Phase 2) with user exit 22, merge-only phase (Phase 3), end of program.]

Four work tapes are used by this application of TSRT. A fifth drive is used for input and output. The IBM sort package is not capable of writing undefined-length records. Since the MARC record is in an undefined format, the output routines of TSRT cannot be used. Therefore, the following method is used to develop the MARC output file: 1) a separate file receives the sort output instead of the standard sort out-file; 2) the separate file is written by special coding in Exit 22 of TSRT; and 3) each record is written in such a way as to prevent the sort from also writing the record. The merge routine written especially for the MARC sort program will output either intermediate (with key) or MARC II (without key) format tapes, or both.
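A minimal sketch of the kind of work done at Exit 22 — intercepting each sorted intermediate record, stripping the prepended sort key, and writing only the MARC record to a separate output file — might look like the following. The 8-byte prefix and 256-byte key length follow the figures given earlier; the file handling and names are assumptions for illustration, not the actual user-exit coding.

```python
# Illustrative sketch (assumed layout): strip the prepended sort key from each
# intermediate sort record, keeping only the MARC record that follows.
PREFIX_LEN = 8      # variable record information, per the description above
KEY_LEN = 256       # current sort key length

def strip_sort_key(intermediate_record: bytes) -> bytes:
    """Return the MARC record with the 8-byte prefix and 256-byte key removed."""
    return intermediate_record[PREFIX_LEN + KEY_LEN:]

def write_marc_output(sorted_records, out_path):
    # Stand-in for the Exit 22 coding that writes a separate output file
    # rather than letting the sort write the record itself.
    with open(out_path, "wb") as out:
        for rec in sorted_records:
            out.write(strip_sort_key(rec))
```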
THE PROGRAM IN ACTION

Processing Times

SKED was written in assembler language, using the physical input/output control system and dynamic buffering assignments to achieve speed. The amount of time required to process a particular record is affected by the applicability of run-time parameters on the record. For example, if the user specifies twelve data elements from twelve different variable fields for inclusion in the sort key, the processing time will be greater than that required for a run with only one specification. Likewise, a run requesting duplication of each record for every added entry will require more computer time than another run that does not duplicate records at all. Since the total processing time required for a file of n records will be the same as the time to process n files of one record each (disregarding I/O considerations), it is possible to project times using a run with various data conditions and numerous SKED parameters as a guide. One such run on the IBM S/360 at the Library of Congress processed records at the rate of 2,400 per minute. Twelve parameters were specified and records were duplicated for certain subject entries. Except for time spent in changing tape reels, SKED can be expected to process records at the same rate regardless of the size of the file.

The processing time for TSRT is affected by the same characteristics that affect most sort programs. Some of these are as follows: 1) amount of memory available to the sort; 2) number of storage units (in LC's case, tape units are used); 3) type of storage unit (for magnetic tape: inter-record gap, density, and tape length); 4) block size for data; and 5) amount of bias in the input. The only characteristic of SKED that seems to relate to the speed with which TSRT operates has to do with SKED's extended use of a single control field. For example, in many sorting systems, if records are to be arranged by main entry and within main entry by title and then by date, three control fields would be specified: one would be chosen for main entry; one would be chosen for title; and one would be selected for the date. SKED places all of these within the same control field, separating them by a superblank. Since TSRT is required to discriminate only on the single control field, a smaller amount of processing time is needed than would be the case if several control fields were used.

Results

Although SKED does not have the ability to make the refined distinctions among headings required for sophisticated filing arrangements, it performs in a workmanlike way in producing alphabetical sequences that are unaffected by the presence of diacritical marks and vagaries in punctuation and spacing. Moreover, the collating sequence (blank, 0-9, A-Z) insures that short names will file before longer ones beginning with the same character string. The ability to truncate headings to remove relators (e.g., ed.) also insures the creation of a single sequence for authors whose names are sometimes qualified in this way.
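The effect of that collating sequence (blank, then digits, then letters) is easy to demonstrate. The toy illustration below is only an assumed stand-in — ASCII happens to order space, digits, and upper-case letters the same way — but it reproduces the kind of Smith arrangement shown in the consolidated example that follows; it is not SKED's filing logic.

```python
# Illustrative demo of the collating sequence blank < 0-9 < A-Z.
# Sorting edited, upper-cased keys in ASCII order shows why a short name
# files before a longer name beginning with the same character string.
headings = [
    "SMITH JOHN CLOCKMAKER",
    "SMITH JOHN ALLAN 1900",
    "SMITH JOHN 1901 1965",
    "SMITH JOHN 1900",
]
for key in sorted(headings):
    print(key)
# SMITH JOHN 1900
# SMITH JOHN 1901 1965
# SMITH JOHN ALLAN 1900
# SMITH JOHN CLOCKMAKER
```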
The following consolidated example shows some of the arrangements produced by SKED. To simplify the presentation, generally only the first filing element is given. Other elements have been added if they are needed to show distinctions that were made by the program.

Abbott, Charles
ABC Company.
A'Beckett, Gilbert
Acadia University.
Alexander III, King of Albania
Alexander II, King of Bulgaria
Alexander I, King of Russia
Bradley, Bradley and Bradley, firm.
Bradley, Milton, 1836-1911.
Bradley (Milton) Company
Katz, Eric, ed. Sound about our ears.
Katz, Eric. Sound in space.
Lincoln, Abraham, Pres. U. S., 1809-1865.
Lincoln Co., Or.-Directories
Lincoln County coast directory
Lincoln, David.
Lincoln Highway
Lincoln, Marshall.
Lincoln, Mass.-History
Lincoln, Me.-Genealogy
London, Albert, joint author. [Mockridge, Norman.] (author not used in sort key) Inside the law.
London, Albert. London at night.
London at night.
London. Central Criminal Court.
London. County Council.
London, Declaration of, 1909.
London-Description
London (Diocese) Courts.
London, Jack. White fang. 1930.
London, Jack. White fang. 1950.
London, Jack White. Alaskan adventure.
London, Ont. Council
London, Ontario; a history
London. Ordinances.
London-Social conditions
Smith, John, 1900-
Smith, John, 1901-1965.
Smith, John Allan, 1900-
Smith, John, clockmaker.

ANTICIPATED DEVELOPMENTS

At the present stage of the development of the MARC sort program, SKED does not have the ability to treat data in a field according to their semantic content. It cannot, for example, treat a character string in a 100 field in a special way because it is a single surname as opposed to a forename or multiple surname. Nor does SKED include routines for treating abbreviations and digits as if spelled out, or for suppressing data in a given field in some cases but not in others. The achievement of these capabilities will require: 1) development of a generalized technique for taking account of indicators in processing data in variable fields; 2) devising algorithms to handle particular filing situations related to the content of the field; and 3) placement of the algorithms within the framework of the SKED program.

The refinement of SKED is being undertaken in relation to the problem of maintaining the LC subject heading list in machine readable form. Techniques developed for this purpose will be applicable also to filing for book catalogs and other listings. The result should be a firm foundation for a comprehensive program for arranging bibliographic entries by computer.

AVAILABILITY OF THE PROGRAM

Since the MARC sort program should be useful to libraries that subscribe to the MARC Distribution Service, the package (consisting of SKED and the modified version of TSRT) has been filed with the IBM Program Information Department. Requests should be made through a local branch office of IBM and should cite the following number: 360D-06.1.005.
4659 ----

KWIC INDEX TO GOVERNMENT PUBLICATIONS

Margaret NORDEN: Reference Librarian, Rush Rhees Library, University of Rochester, Rochester, New York

United States and United Nations publications were not efficiently processed nor readily available to the reader at Brandeis University Library. Data processing equipment was used to make a list of this material which could be referred to by a computer-produced KWIC index. Currency and availability to the user, and time and cost efficiencies for the library, were given precedence over detailed subject access. United States and United Nations classification schemes and existing bibliographies and indexes were used extensively.

Collections of publications of the United States government and the United Nations are unwieldy and, often, unused. Orne (1), Kane (2), and Morehead (3) have acknowledged that much of the output of proliferating governmental agencies and government-supported research centers is hardly accessible. Successful attempts to control the literature of a particular subject field, such as the indexes to the Human Relations Area Files and the American Political Science Review, have been compiled by Kenneth Janda (4). Others (5,6,7,8) have described projects which apply the KWIC index method of control to industrial research reports. No similar attempt to control government publications has been reported, although at Northeastern University data processing equipment has been used to list United States material. The index developed at Brandeis University Library was designed to accommodate the varied government publications held by a library which served student, faculty and researcher alike.

MATERIALS AND METHOD

Brandeis became a selective United States document depository late in 1965. Two years later a Government Documents Department was created to handle all United States publications, as well as those of the United Nations. About 15,000 United States publications and a smaller number of United Nations publications had previously been acquired and processed as a regular part of the library collection. This material formed the nucleus of the documents collection, to which some 3,000 pieces were added yearly. The new Department ordered and received all publications issued by federal government agencies and the United Nations, but processed and serviced only about 80% of them. Materials that had been acquired for the Science Library or special collections, such as Reserve, were directed to regular library processing departments. The materials retained were classified and arranged according to the Superintendent of Documents classification and the United Nations scheme wherever such numbers were available.
All previously cataloged items were removed from the regular collection and scheduled for reclassification. Only where Superintendent of Documents and United Nations numbers were not available was Library of Congress classification retained or assigned. The collection then consisted of material arranged in three sections according to the classifications of the Superintendent of Documents, the United Nations, and the Library of Congress.

The KWIC index included all United States and United Nations publications located in the Documents Department. The reader was reminded that additional material issued by those government publishers, housed elsewhere in the libraries, was included in the library catalog. Prefatory material included a list of symbols and abbreviations. A two-part index to issuing agencies, represented by six-letter mnemonic acronyms, was arranged alphabetically by acronym, and by bureau name. The reader was cautioned to consult the United States Government Organization Manual and a United Nations organization chart for identification of government agencies and for tracing frequent changes in their structure and nomenclature.

The Documents list consisted of two parts: one, an accession number listing; and two, a KWIC index to part one. Upon arrival at the Library, publications were numbered and IBM cards were punched according to format cards that described the allocation of columns:

Card 1:
  Columns 1-6    item number
  Column 7       card number
  Columns 8-13   author agency
  Columns 14-79  title field
  Column 80      blank

Card 3:
  Columns 1-6    item number
  Column 7       card number
  Columns 8-20   procedural data
  Columns 21-54  holdings
  Columns 55-79  classification number
  Column 80      blank

Cards one and three were punched for all documents; however, cards two and four were punched only where data exceeded the prescribed spaces on cards one or three. Columns one through six were reserved for the accession numbers. A special punch in column one was used to identify United Nations documents so that they were listed after the United States sequence. Column seven indicated the card number for a given document and was suppressed in the print-out. The title field included not only the title, but series and number, personal author and monographic date where this information was suitable. The flexible field was used for any information for which the librarian wished KWIC cards. A cross reference or explanatory note about the location of publications of a quasi-independent agency was incorporated in the title field. The procedural data included type of publication, binding and frequency information, accounting data and similar notations. A sample of part one has been reproduced in Figure 1.

Part two of the list, the KWIC index, was produced by an IBM 1620 computer, model one with 40K memory. An excerpt has been reproduced in Figure 2. Only cards one and two were put into the computer along with the program and dictionary of exceptions. Cards three and four were not used to produce the KWIC index. The program required production of cards for author acronyms and for all keywords found in the title field. Except in the cases of author acronyms and first words, a keyword was identified by the fact that it followed a blank space. Blanks were not necessary in these two cases because they were incorporated in the computer program. Single letters, integers, and exceptions were not considered keywords. The index was printed so that the accession number always appeared on the left, and the author agency was followed by an asterisk and a space. The wraparound format usually associated with KWIC indexes was abandoned to improve visual clarity.
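The production rules just described can be sketched in a few lines. The code below is a loose modern illustration of that kind of KWIC generation — an entry for the author acronym and for every title keyword, with single letters, integers, and listed exceptions excluded, the accession number at the left, and the agency acronym marked with an asterisk. The sample record and stop list are assumptions taken loosely from Figure 1; this is not the original IBM 1620 program.

```python
# Illustrative sketch of the KWIC generation rules described above
# (assumed data, stop list, and output layout).
EXCEPTIONS = {"ON", "THE", "OF", "AND", "TO", "IN", "FOR"}   # "dictionary of exceptions"

def is_keyword(word):
    # Single letters, integers, and listed exceptions are not keywords.
    return len(word) > 1 and not word.isdigit() and word not in EXCEPTIONS

def kwic_entries(accession, agency, title):
    """One index line per keyword (plus one for the agency acronym),
    accession number on the left, agency followed by an asterisk."""
    words = title.upper().split()
    lines = []
    for keyword in [agency] + [w for w in words if is_keyword(w)]:
        lines.append(f"{accession:<6} {keyword:<14} {agency}* {title.upper()}")
    return lines

for line in kwic_entries("136", "JNTPUB", "Translations on Mongolia"):
    print(line)
```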
RESULTS

About eight months after its inception, 2600 items had been entered on a separate Documents Collection list. The list had been printed off-line on three-part print-out paper interleaved with carbon. After the papers were reassembled in looseleaf binders, they were made available in the Documents and Reference Departments and in the Science Library.

Fig. 1. Accession Number List. [Figure: a sample page of the accession number listing, showing entries such as "136 JNTPUB Translations on Mongolia" with issuing-agency acronyms, price and holdings notations, and Superintendent of Documents class numbers.]
Fig. 2. KWIC Index. [Figure: a sample page of the KWIC index, in which each title keyword and agency acronym generates an index line; entries are aligned on the keyword, with the accession number at the left and the issuing-agency acronym followed by an asterisk.]

The copies were usable only on the temporary basis for which they were intended. Pages ripped easily in the binders. The printing on copies two and three, which were carbon copies, smudged readily. Production of more permanent copies of the list was deferred until the catalog should be more complete.

Because of the preliminary nature of this project, no specific time accounting was made. There was an attempt to increase student assistant duties in order to save regular staff time. The Librarian annotated Superintendent of Documents shipping lists to indicate which items required new punched cards. She omitted, for example, journals that were entered once with an open holdings statement.
After annotation, punched cards for United States depository documents were made by student assistants who had been previously introduced to the allocation of columns and to punching procedures. For non-depository United States and United Nations publications, the Librarian mapped cards on 80-column format sheets.

In the production of part two, the KWIC index, staff was involved only to make cross references. Since the KWIC program had been designed to make entries for all words in the title field other than the dictionary of exceptions, cross references had to be interfiled manually after the KWIC entries had been made and alphabetized.

The cost of materials for the addition of 100 entries to the list is tabulated below:

Materials                               Costs
IBM cards (800)                         $ .80 + freight
Print-out paper (8 sets)                  .32 + freight
IBM 1620 computer rental (4 minutes)     1.68
Total                                   $2.80 + freight

There was no charge for the use of the keypunch (IBM 026), sorter (IBM 082), and Accounting Machine (IBM 407), nor was the Library charged by the computer center personnel who wrote the program. For the first 100 items, all cards were duplicated in production as insurance against destruction (thus the card expense itemized above was doubled). The duplicate deck was later eliminated because the time spent in duplicating and interpreting these cards was greater than that required to repunch the deck from the list entries. Storage space was available without cost, and no new storage equipment was purchased.

The KWIC program was written so that keyword cards were made for all words in the title field except listed exceptions, single letters and integers. It seemed at the inception of the project that such a program, which allowed untrained assistants to punch cards with a minimum of difficulty, was preferable to one that involved tagged keywords. However, the necessary filing and removal of cross references subsequently proved an inconvenience when the list was updated and reprinted.

DISCUSSION

The productivity of government publishers has directed so much material into the library that ordinary procedures have been overtaxed. Card catalog entries, for example, have become tardy, cumbersome, and incomprehensible to the Library user and expensive for the Library. The KWIC list was designed as a substitute; however, it was useful only where the subject of a publication had been fairly reflected by its title. The possibility of incorporating descriptors in the title field of the list was considered, but rejected in the interests of speed and efficiency. The list depended upon standard reference sources for more complete subject and analytic cataloging. Most often used in the case of United States publications were the Superintendent of Documents Monthly Catalog (9) and its auxiliaries such as the Bureau of the Census Catalog (10). Other sources included: Wilson, Popular Names (11), the Readers Guide, the Social Science and Humanities Index, the Business Periodicals Index, the Index to Legal Periodicals, and the Commerce Clearing House Index. For the United Nations publications, greater use was made of the trade publications such as the Periodic Check List (formerly the Monthly Sales Bulletin), the International Reporter, the UNESCO Catalogue, and the Publishers Trade List Annual section for "UNESCO." The KWIC index also was limited in that it covered only documents in the Library's collection.
While the user was convenienced by the ready availability of all items listed, he was obliged to consult reference sources for other existing documents.

The new tool had advantages similar to book catalogs in terms of space saving and ease of duplicating. Although originally only three copies were made, the possibility of duplication and distribution of this list to interested academic departments had been considered. It was also intended that new punched cards would be used to produce lists of new accessions, which would be duplicated and circulated. The problem of updating involved reprinting of part two, the KWIC index, after previous inter-alphabetizing of entries. Part one, however, was not reprinted, as new entries were added successively. Corrections were made by duplicating parts of cards and punching where necessary. The availability and currentness of such a list would presumably have encouraged the faculty and students to make greater use of these materials, and eliminated duplication of purchase orders.

A major drawback to the list was that its arrangement, by accession numbers, bore no particular logic. A classification number arrangement would have been more meaningful to the reader; it would also have served as a shelf list and provided material for subject holdings lists. However, the IBM cards were not so arranged because neither the mechanical nor manual sorting of multi-digit and letter numbers was practical. Arrangement by Superintendent of Documents numbers was employed at Northeastern University, Boston, Massachusetts, and proved so inadequate that the Librarian added subject headings to documents punched cards. This extra time-consuming step, plus the need to manually file punched cards, influenced the author to abandon shelf list order.

A second difficulty involved in the KWIC project was the dependence of the Library upon use of equipment owned by another agency. It was conceivable that alterations in the equipment, policies or personnel of the University Computer Center could enforce changes on the Library's listing procedure.

This evaluation of the KWIC index excluded considerations of the separation of the Documents and Reference Departments. This matter has been thoroughly discussed elsewhere (12). Two other subjective considerations appeared during the first year of operation. Most serious was the estrangement between Documents and Reference personnel. Since both Departments served the public, and their material was distinguished only by publisher, each staff relied extensively upon the other. Cooperation and acquaintance with library material was difficult to maintain in two separate Departments. Because Documents staff were primarily public service personnel, their extensive involvement in technical processes was not an efficient use of staff expertise. On the other hand, complete responsibility for this portion of library holdings insured that the staff became thoroughly acquainted with the collection and were better able to serve the public.

CONCLUSION

The KWIC index to government publications at Brandeis is difficult to evaluate before the tests of time and use have been made. The system was suitable for the University Library in that it was frequently consulted by the same, relatively sophisticated, users who were eager to familiarize themselves with library material. The KWIC list itself emphasized currentness and flexibility at the expense of detailed subject access.
This system attempted to utilize a potential goldmine of material without major investment or upheaval in the Library. It has been sufficiently resilient to withstand a complete change of Department personnel and was successful enough so that the possibility of expansion is being considered.

NOTE

This report described the Documents Department as it functioned at its inception in September, 1967. The author left Brandeis University in June of 1968. The scope of the Documents list was changed in September 1968 to include all United States and United Nations publications acquired by the library. Any inquiries about the present system should be directed to the current Documents Librarian: Mr. Michael Abaray, Goldfarb Library, Brandeis University, Waltham, Massachusetts 02154.

REFERENCES

1. "Report on the Sixty-Ninth Meeting of the Association of Research Libraries, New Orleans, La. 1/8/67," LC Information Bulletin, 26 (January 26, 1967), 70.
2. Kane, Rita: "The Future Lies Ahead: the Documents Depository Library of Tomorrow," Library Journal, 92 (November 1, 1967), 3971-3973.
3. Morehead, Joe: "United States Government Documents—A Mazeway Miscellany," RQ, 8 (Fall 1968), 47-50.
4. Janda, Kenneth, ed.: "Advances in Information Retrieval in the Social Sciences," American Behavioral Scientist, 10 (January and February 1967).
5. Sternberg, V. A.: "Miles of Information by the Inch at the Library of the Bettis Atomic Power Laboratory, Westinghouse Electric Corporation," Pennsylvania Library Association Bulletin, 22 (May 1967), 189-194.
6. Lawson, Constance: "Report Documentation at Texas Instruments, Incorporated," Special Libraries Association, Texas Chapter Bulletin, 15 (February 1964), 14-17.
7. Minton, Ann: "Document Retrieval Based on Keyword Concept," Special Libraries Association, Texas Chapter Bulletin, 15 (February 1964), 8-10.
8. Bauer, C. B.: "Practical Application of Automation in a Scientific Information Center—A Case Study," Special Libraries, 55 (March 1964), 137-142.
9. United States. Superintendent of Documents: Monthly Catalog of United States Government Publications (Washington: Government Printing Office, 1895- ).
10. U. S. Bureau of the Census: Bureau of the Census Catalog (Washington: Government Printing Office, 1945- ).
11. Wilson, Donald F. and William P. Kilroy, comps.: Popular Names of United States Government Reports, a Catalog (Washington: Government Printing Office, 1966).
12. Shaw, Thomas Shuler, ed.: "Federal, State and Local Government Publications," Library Trends, 15 (July 1966), 3-194.

4660 ----

TELECOMMUNICATIONS PRIMER

Joseph BECKER: Vice-President, Interuniversity Communications Council (EDUCOM), Bethesda, Maryland

A description of modern telecommunications devices which can be useful in inter-library communications, including their capacities, types of signals and carriers. Described are telephone lines, radio broadcasting, coaxial cable, microwave and communications satellites. This article, and the one following, were presented as tutorials by the authors to participants at the American Library Association's Atlantic City Convention on June 25, 1969.

As greater emphasis is placed on the development of regional and national library network programs to facilitate interinstitutional services, a concomitant requirement emerges to understand and apply communications technology. A great variety of communications methods has been used for interlibrary communications in the past, ranging from the simplest use of the U.S.
mails up to the telephone, the teletype, the radio, and even experiments with microwave telefacsimile transmission.

Of all the different kinds of equipment used by libraries for interlibrary communications, the one which has received widest acceptance for its practical value and immediate usefulness is the teletype machine. The earliest use of the teletype machine can be traced back to the Free Library of Philadelphia, which in 1927 used the teletype as part of a closed circuit system for communicating book information from the loan desk in the main reading room to the stacks and vice versa. Following World War II, an installation connecting distant libraries was established in Wisconsin between the Milwaukee Public Library and the Racine Public Library.
For the past forty years, the United States, through its commercial carriers, the Bell Tele- 150 Journal of Library Automation Vol. 2/ 3 September, 1969 phone System and Western Union, has built an increasingly effective system of wires, trunk stations, and switching centers for the transmission of human speech from point to point. The telephone network is a techno- logical marvel despite the occasional busy signal one gets on the line. However, with the increasing use of computers and television in science, business, and industry, this network is being asked to carry digital and video signals in addition to voice, and its facilities are fast becoming over- loaded. In the library field one can observe the trend toward use of machine readable data and non-print materials. These are but a few ex- amples of library data forms that one will wish to communicate between and among libraries. Voice can be efficiently transmitted over telephone lines, but data, like the digital language of the computer or the video language of the television camera and facsimile scanner, need a broader band-width for their efficient transmission than the narrow-band-width telephone line can provide. Band-width is a measure of the signal-carrying capacity of a communications channel in cycles per second. It is the numerical differ- ence between the highest frequency and the lowest frequency handled by a communications channel. The broader the band, the greater the signal transmission rate. The tens of thousands of bits which make up a computer message or TV picture, if sent by telephone, have to be squeezed through the narrow line over a longer period of time to transmit a given message. This consumes telephone capacity that would normally be used to carry other conversations. A good example of the problem can be illustrated with the "picture-phone". This is the telephone company service now be- ing tested which permits a caller to see and hear the other person at the distant end. The two-way picture part of this dialogue requires more than 100 times more telephone transmission capacity than the voice portion. There are 100,000,000 telephones in the U.S. today. Thus, if only 1% of the subscribers had picture phones we would theoretically exhaust our national telephone capacity for any other use. · Fortunately, the problem of telecommunications capacity is not without solution. New channels of communication are being opened that do pro- vide capacity for broad band-width exchange. The new technology of laser communications, for example, stands in the wings with a long-range answer. The word LASER stands for Light Amplification by Stimulated Emission of Radiation. Its theoretical beginnings go back half a century, but fifteen years ago scientists working in high-energy physics learned how to amplify high-energy molecules so as to produce a powerful, narrow, co- herent beam of light. This strange kind of light remains sharp and co- herent over great distances and can therefore be used as a reliable chan- nel or pipe for telecommunications. All other long-distance transmission systems tend to spread or disperse their signals, but laser beams provide a tight, confined highway over which signals can travel back and forth. Telecommunications Primer 151 A few years ago seven New York television channels, in an experiment, transmitted their programs simultaneously over the same laser beam. In terms of telephone conversations, one laser communications system could theoretically carry 800,000,000 voice conversations! 
The intense pencil-thin laser beam is so powerful and reliable that it can and is being used as a communications channel for space exploration. The Apollo 11 astronauts left a laser beam reflector on the Moon's surface to facilitate future com- munications experiments. TYPES OF SIGNALS There are three principal types of signals that telecommunications sys- tems are designed to carry: 1) Audio-originating as human speech or recorded tones and transmitted over conventional telephone lines. 2) Dig- ital-originating with computers or other machines in which data is en- coded in the binary language. The data, instead of being represented as zeros and ones, take the form of an electrical pulse or no pulse. 3) Video -originating with TV recorders, facsimile scanners, or other devices which change light particles into electrical energy in the form of small, discrete bits of information. Each of the three types of telecommunication signals is associated with a telecommunication channel that can carry it most efficiently. Audio, of course, was designed to travel over the telephone line. How- ever, it can be carried just as well over the broader band-width channels. Digital and video signals are carried over the wider band-width channels because of the great number of bits that must be accommodated per unit of time. Sending computer data or pictures over telephone lines is possible if data phones are used; they convert digital and video data to their tone equivalents at the transmission end and reconvert them at the receiving end. This is, however, a very slow process and from a communi- cations viewpoint it is most inefficient. When reference is made to "slow scan television.. it means that the video signal is being carried over a telephone line. Library experimentation with telefacsimile has by and large been restricted to transmission of the facsimile signals over telephone lines. An 8"x10" page carried by telephone lines takes about six minutes, as compared to 30 seconds if it were sent over a broad band-width channel. A telecommunications system used for library purposes will eventually need to integrate audio, digital, and video signals into a single system. This integrated media concept is an important aspect of the design of an interlibrary communications system but it is poorly understood analyti- cally in today's practice. The idea of an "integrated telecommunications system" became practical only during the past few years and commercial and governmental efforts are underway to provide these unified facilities as rapidly as possible. 152 Journal of Library Automation Vol. 2/3 September, 1969 SIGNAL CARRIERS A number of methods exist by which audio, digital, and video informa- tion can flow back and forth for information exchange purposes. These telecommunications facilities are furnished for lease or private line use by the commercial carriers. A dedicated system may also be installed for the sole use of a particular customer. For example, the U.S. Government has more than one dedicated system: the Federal Telecommunications System (FTS ), which is available for official use only by civilian agencies; and it has similar dedicated facilities for use by the military. Large com- panies, such as General Electric, Weyerhauser, and IBM, have exclusive- use telecommunications systems also. 
In all cases, however, private or dedi- cated systems are planned in such a way that they interface smoothly with commercial dial-up facilities-thus increasing the overall distributive capacity of any one system. As might be expected, the tariff structure for these combined interconnections is very complex. The Federal Com- munications Commission is reviewing the overall question of cost for voice and data communication and is also investigating the policy issues raised by the growing interdependence of computers and communications. Technically speaking, there are five means by which audio, digital, and video signals may be carried to their destination and returned: by tele- phone line, by radio, by coaxial cable, by microwave relays, and by com- munications satellite. An explanation of each is given below and they are presented in ascending order of their band-width capacity. Telephone Lines The telephone as a means of communication is beyond compare. It is simple, quick, reliable, accurate, and provides great geographic flexibility. Quite ofien the telephone can supply all the communications capability required for an information system, especially when it is coupled with the teletypewriter. A good toll quality telephone circuit has a frequency response of about 300-3400 cycles, which is adequate to supply good quality and a natural sounding voice. Regular telephone lines are referred to as narrow band calTiers because of the low cycle range needed to carry human speech. Radio Broadcasting As the word "broadcasting" implies, signals are radiated in all directions and the omnidirectional antennas which are used in radio broadcasting are designed to have this effect. Frequencies used are 500 to 1500 kilocycles for AM (amplitude modulation), and 88 to 108 megacycles for FM (frequency modulation). The number of radio waves that travel past a point in one second is called the frequency. The number of waves s~nt out by a radio station each second is the frequency of that station. One complete wavelength is called a cycle. A kilocycle is one thousand cycles and a megacycle is one million cycles. Broadcasting, in general, is used Telecommunications Primet· 153 as a one-way system. Any radio or TV set equipped to receive certain frequencies can tune in to a particular station or channel. Low-frequency systems, in the kilocycle range, require less power to operate. The signals are propagated close to the ground and the effective radius of reception ts small. With ultra-high frequency, vast distances can be covered by striking upper layers of the atmosphere and having the signal deflected to earth; this can happen more than once before the signal is received. High-frequency systems, however, are subject to atmospheric interference, which causes fading. Coaxial Cable (and CATV) A remarkable extension of the carrier art was provided by the develop- ment of the coaxial cable. Within the sheath of most coaxial cables are a number of copper tubes. Within each tube is a copper wire, supported by insulating disks spaced one inch apart. The name coaxial reflects the fact that both the wire and the tube have the same axis. Coaxial cables can carry many times the voice capacity of telephone lines and are thus considered to be broad band-width carriers able to accommodate digital and video data with equal efficiency. 
The coaxial cable has the additional advantage that the electrical energy confined within the tube can be guided directly to its destination, instead of spread- ing in all directions as is the case in radio broadcasting. To provide necessary amplification along the route, repeater stations are placed at designated intervals. Repeater stations are unnecessary, how- ever, within a half-mile radius and many libraries, planning new buildings, are including special ducts to accommodate known or potential require- ments for communication between computer units, terminals, dial access stations, etc. The technology of Community Antenna Television (CATV) incorpo- rates extensive use o.f coaxial cables. CATV operates very similarly to the way a closed circuit television system works. A company in a locality sets up a powerful receiving antenna capable of importing television signals from many cities hundreds of miles away. On a subscription basis (about $6.00 per month), it will run a coaxial cable from the receiving station to the subscriber's home. Subscribers benefit in several ways: 1) the in- coming signals are sharper and clearer because there is no atmospheric interference; 2) a roof-top antenna is unnecessary; 3) more channels are available than a local TV station normally provides (some CATV stations already offer the potential of 20 channels); and, 4) CATV stations have close interrelationships with Educational Television Stations (ETV) and by law are required to make available to subscribers at least one channel for .. public service" and .. educational" purposes. The latter benefit has special implications for libraries. School libraries in a town or city where CATV is proposed might well inquire whether the operator is willing to provide a school library programming service. 154 Journal of Library Automation Vol. 2/3 September, 1969 It is hardly possible to predict what effect CATV and its coaxial cables will have on libraries. It is clear, however, that many homes will soon have coaxial cables as well as telephone lines, and this implies a new capability for bi-directional broad band-width information exchange. At- tachment of a coaxial cable from a CATV trunk station to the horne pro- vides an electronic pathway 300 megacycles wide. The telephone line is only 4000 cycles wide. Since a megacycle is one million cycles, the relative practical difference in an operational environment is in the order of 50,000:1. It is this significant difference that causes some people to suggest that advanced telecommunications will someday bring newspapers and books into the home by electronic facsimile, along with computer infor- mation from data banks, individualized instruction from schools, and a much greater variety of educational materials. Microwave The term microwave applies to those systems where the transmitting and receiving antennas are in view of each other. The word is not very definitive but generally describes systems with frequencies starting at 1000 megacycles and extending up to 15,000 megacycles, a range which includes the ultra- and super-high frequency bands of the radio spectrum. Microwave is, therefore, without question, one of the larger broad band- width carriers. Microwave systems are used to transmit data and multi- channel telephone or video signals. Antennas are in the form of parabolic dishes mounted on high towers and lined up in sight of each other. These antenna produce very sharp beams to minimize power requirements. 
Since microwaves do not bend, transcontinental microwave systems consist of relay towers spaced at approximately thirty-mile, line-of-sight intervals across the country. Because of the earth's curvature, transoceanic micro- wave systems are hardly possible without a repeater station. It is this limitation which helped give rise to the development of the communica- tions satellite. Many state governments have, or are planning, private microwave sys- tems for handling the mix of official, internal communications. Here again, state libraries might investigate the use of such systems for interlibrary communications. Communications Satellites The newest and most promising telecommunication development is the communications satellite. A communications satellite is an object which is placed in orbit above the earth to receive and retransmit signals re- ceived from different points on earth. A communications satellite is launched by a conventional rocket, which sends it into an eliptical orbit with a high point, or apogee, of about 23,000 miles and a low point, or perigee, of 195 miles. On command from earth, a small motor aboard the satellite is fired Telecommunications Primer 155 just as the satellite reaches the high point of its orbit. This action thrusts the satellite into a circular path over the equator at an altitude of approxi- mately 22,300 miles. Subsequently, the satellite's orbital velocity is then synchronized with the speed of the earth's rotation. Thus, a satellite in synchronous equatorial orbit with the earth appears to remain in a fixed position in space. Three satellites can cover the globe with communica- tions except for the north and south poles. Or the antennas can be squinted to focus exclusively on one country or on part of a country. Early Bird's antenna was positioned to cover Europe and the northeastern part of the United States, thus making it possible to link North America with Europe. A satellite is not very large; Early Bird, which is still operating, is about seven feet in diameter. It contains a receiver to catch the signal, an am- plifier to increase the signal's intensity, and a transmitter. Signals received from one earth station on one frequency are amplified and transmitted on another frequency to a second earth station. The satellite receives light energy from the sun, and its solar batteries convert it into electrical energy for transmitting power. Communications satellites are, in essence, broad band-width signal re- peaters whose height enables them to provide coverage over a very large area. They can be "dedicated"; that is, designed for a single class of serv- ice, such as television relay; or they may be multipurpose and integrate a mix of different signals at the same time. Generally, we tend to think of satellites as an extension of satellite broadcasting, mainly because most of their use up to now has been for television broadcasting. However, the enormous band-width capacity which they possess also makes them very attractive channels for two-way voice and picture applications for education, business, and libraries. Within the next decade, domestic com- munications satellites will be available as "switchboards in the sky" for just such uses. CONCLUSION Libraries, like other institutions in our society, have learned the hard way that the new technology must be treated as an opportunity and not as a panacea. The same is true of telecommunications. 
Before telecommunications can be applied effectively to interlibrary functions and services, many non-technical problems have to be solved. Librarians must answer questions such as: How shall we organize our libraries to make optimum use of the advantage of telecommunications? What segment of our information resources and daily library business should flow over these lines? Will our users accept machines as intermediates in the information exchange process? How can the copyright principle be safeguarded if libraries expand their interinstitutional communications? And, of course, how do we measure cost/effectiveness before moving ahead with an operating program? To provide answers, professional librarians must become more familiar with telecommunications technology and principles.

BIBLIOGRAPHY

1. Becker, Joseph: "Communications Networks for Libraries," Wilson Library Bulletin, 41 (December 1966), 383-387.
2. Gentle, Edgar C.: Data Communications in Business: An Introduction (New York: American Telephone and Telegraph Company, 1965), 200 p.
3. Kenney, Brigitte L.: A Survey of Interlibrary Communications Systems (Jackson, Mississippi: Rowland Medical Library, April 1967), 74 p. Prepared for the National Library of Medicine under NIH Contract No. PH-43-67-1152.
4. Library Telecommunications Directory: Canada-United States. 2d edition, revised. (Toronto and Durham: 1968).
5. U.S. President. Task Force on Communications Policy: Final Report. (Washington, D.C.: U.S. Government Printing Office, December 1968).

4661 ----

LIBRARY NETWORK ANALYSIS AND PLANNING (LIB-NAT)

Maryann DUGGAN: Director, Industrial Information Services Program, Southern Methodist University, Dallas, Texas

A preliminary report on planning for network design undertaken by the Reference Round Table of the Texas Library Association and the State Advisory Council to Library Services and Construction Act Title III Texas Program. Necessary components of a network are discussed, and network transactions of eighteen Dallas area libraries analyzed using a methodology and quantitative measures developed for this project.

To be a librarian in 1969 is to stand at the crossroads of change, with a real opportunity to put libraries and professional experience to work on immediate problems of today's world. In mobilizing total library resources for effective service to a variety of patron groups in a variety of ways, the librarian has at hand an exciting new tool of great potential and equally great challenge: the library network.

LIBRARY NETWORKS AND REFERENCE SERVICES

Networks and all that they imply are simply an extension of good reference services as they have been practiced for years, but their existence and potential capability require redefinition of the reference function, which, being no longer limited to one collection, has been given new dimensions of time, depth and breadth. Networks, and the inter-library cooperation they require, offer an opportunity to combine materials, services and expertise in order to achieve more than any one library can do alone. In this case, the whole is greater than the sum of its parts, for each library can offer its particular patron group the total capability of the network, including outside resources not previously available.
With the new tool of library networks, it is possible to provide responsive, personalized, in-depth reference service, and to provide it so rapidly that a patron can receive a pertinent bibliography covering his desired topic within an hour of his original inquiry. The reference librarian becomes an expert in resources and resource availability at the national level. His reference desk becomes a switching center, at which he receives and analyzes inquiries, decides the level of service required, identifies available sources or resources that match an inquiry, transmits the latter (restructured to be compatible with the network language), conducts a dialog with the source, receives the response and interprets it to the patron. This procedure is not markedly different from what has been done for years in any reference library, but with greater potential the process must be more formalized and structured.

Networks do require new expertise and crystallizing the reference philosophy. Clarification is needed as to 1) types or levels of reference services, and unit operations in reference services; 2) the role of in-depth subject analysis of reference queries; 3) decisions on alternate choices of sources and of communications links; 4) structuring of large blocks of resources to permit fast access; and 5) the role of each library in the network and its responsibility to the network.

APPROACH TO NETWORK DESIGN

The Reference Round Table of the Texas Library Association and the State Advisory Council to Library Services and Construction Act Title III Texas Program have been struggling with the challenge of inter-library network design for the past two years. This paper is written to share with reference librarians some of their preliminary findings and to urge the involvement of reference librarians in planning and developing networks and network parameters. For identification the project herein described is referred to as Lib-NAT, for Library Network Analysis Theory.

Although only the author can be blamed for any faults of this "theory," many persons have contributed to the development of it. The Reference Round Table of the Texas Library Association has provided the forum for exploring and developing ideas on inter-library cooperation. Title III of the Library Services and Construction Act has provided the legal and financial impetus enabling the field testing of some of those ideas. Texas Chapter, Special Libraries Association, has sparked and catalyzed ideas and clarified needs. The State Technical Services Act provided the vehicle for experimental development of new approaches to reference services. Southern Methodist University provided the haven and ivory tower from which these new approaches could be tried under the cloak of academic respectability. But, of greatest importance of all, individual librarians, with vision and desire to be of service and willingness to try new things, have been the driving force in helping to develop new concepts of library use and purpose in the Texas area.

The basic philosophy back of Lib-NAT is simply that any person anywhere in the State of Texas should have access to any material in any library anywhere in the State through a planned, orderly, effective system that will preserve the autonomy of each library while serving the needs of all the citizens of the State.
Particular needs of special user groups (such as the blind or the accelerated student or the industrial researcher) should also be identified and provided for in a cooperative mode through local libraries throughout the State.

NETWORK COMPONENTS

In the process of developing Lib-NAT, twelve critical components were identified that are essential to orderly, planned development of the objectives stated above. As a minimum, such a network must have the following:

1) Organizational structure that provides for fiscal and legal responsibility, planning, and policy formulation. It must require commitment, operational agreement and common purpose.
2) Collaborative development of resources, including provision for cooperative acquisition of rare and research material and for strengthening local resources for recurrently used material. The development of multi-media resources is essential.
3) Identification of nodes that provide for designation of role specialization as well as for geographic configuration.
4) Identification of primary patron groups and provision for assignment of responsibility for library service to all citizens within the network.
5) Identification of levels of service that provide for basic needs of patron groups as well as special needs, and distribution of each service type among the nodes. There must be provision for "referral" as well as "relay" and for "document" as well as "information" transfer.
6) Establishment of a bi-directional communication system that provides "conversational mode" format and is designed to carry the desired message/document load at each level of operation.
7) Common standard message codes that provide for understanding among the nodes on the network.
8) A central bibliographic record that provides for location of needed items within the network.
9) Switching capability that provides for interfacing with other networks and determines the optimum communication path within the network.
10) Selective criteria of network function, i.e., guidelines of what is to be placed on the network.
11) Evaluation criteria and procedures to provide feedback from users and operators and means for network evaluation and modification to meet specified operational utility.
12) Training programs to provide instruction to users and operators of the system, including instruction in policy and procedures.

The foregoing components of the ideal inter-library network (one so designed that any citizen anywhere in the state can have access to the total library and information resources of the state through his local library) may be considered the conceptual model, or the floor plan from which the network of the program can be constructed. Although these twelve components might be labeled "ideal," they are achievable and they are within reach of the present capability of all libraries today. They have also weathered the unrelenting critique of 288 reference librarians in the March 27, 1969, TLA Reference Round Table ("The 1969 Reference Round Table Pre-Conference Institute: An Overview," Texas Library Journal, Vol. 45 (Summer 1969), No. 2). During that Reference Round Table the twelve components were tested in a simulated network, using 42 cases. In this behavioral model actual, current inter-library practices were observed during game-playing in the simulated network.
The experience verified that the components outlined above are essential to the development of planned, cooperative, inter-library systems.

ANALYSIS OF NETWORK TRANSACTIONS

As part of the LSCA Title III project, and to test the twelve components, exploration was instituted into the existing inter-library relations among eighteen libraries of all types in the Dallas area to see how current practices compared with the ideal conceptual model. The essential minimum requirement of a library is document transfer, i.e., the ability to supply a known item on request; and on-going inter-library loan transactions are a valid indicator of emerging network patterns in the current environment. This microscopic study of 1967 individual library loans among eighteen libraries of different types has provided a wealth of insight into network developments. As a pilot model it has offered a means of observing and studying existing practices, identifying problems, and experimentally evaluating the effect of changes in the system or environment.

More must be known about on-going inter-library transactions for the design of improved networks. In the attempt to find out who was attempting to borrow what from whom and how successfully requests were filled, the following variables were considered:

1) Type of library, both borrowing and lending, such as academic, public, special, or public school.
2) Type of message format, i.e., telephone, TWX, TELEX, letter, or interlibrary loan.
3) Type of item requested in the transaction, such as monograph, serial, map, document.
4) Geographic location of borrowing and lending library, i.e., local, area, state, regional, national or international.

The complexity of even a small pilot model required the formulation of some rigor in the analysis and the development of analytical tools and symbolic models. Figure 1, for example, is a symbolic model that permits comparison of two variables simultaneously, e.g., the type of library participating in the transactions and the geographic level of the participants. For modeling purposes, it was assumed all libraries fall into one of four classes represented by the quadrants in Figure 1. Also it was assumed that each library can be identified as to a specific geographic level, as indicated by the numbers 1 through 6.

Fig. 1. Symbolic Model of Inter-Library Networks. (Legend recovered from the figure: geographic levels 1 = Local, 2 = Area, 3 = State, 4 = Region; circled numbers indicate switching centers.)

In the analysis of the pilot model data it was observed that transactions occur among libraries of the same type and at the same geographic level, and between libraries of different types at different geographic levels. Figure 1 provides a symbolic model for conceptualizing these various types of transactions. Switching centers, represented on Figure 1 by the circles around the geographic numbers, participate in transactions at varying geographic levels, as well as between and among various types of library sectors. The role and the location of switching centers is an important aspect of Lib-NAT.

Within the framework of the symbolic model, the simple form of inter-library loan may be represented as a two-body transaction between the borrowing library and the lending library, as shown in Figure 2.

Fig. 2. Two-Body Transaction.

Applying these transactions on the symbolic model of Figure 1 and considering both
type of library and geographic level, four general classes of two-body transactions can be identified:

1) Homogeneous vertical, i.e., between two libraries of the same type but at different geographic levels (P1 -> P2; S1 -> S3);
2) Heterogeneous horizontal, i.e., between two different types of libraries at the same geographic level (P1 -> A1; S1 -> P1);
3) Heterogeneous vertical, i.e., between two different types of libraries at different levels (P1 -> A4; S1 -> P6);
4) Homogeneous horizontal, i.e., between two libraries of the same type and the same geographic level (P1 -> P1; S2 -> S2).

The formulas serve as shorthand symbolic representations of some typical transactions of these four classes. The final report on Lib-NAT will contain statistical data on distribution of pilot model transactions by type and by geographic level, showing type interdependency and geographic dependency or self-sufficiency.

Further analysis of the pilot model data revealed another type of transaction, the three-body transaction, in which a third agent becomes involved. The third agent may act as a referral center, as illustrated in Figure 3, or as a relay center, as illustrated in Figure 4 (SW indicates switching center). Part of the Lib-NAT theory specifies that there is a distinction between referral and relay, and that the latter is a valid function of a true switching center. Figure 5 illustrates the various types of possible three-body transactions with different geographic levels of switching among the different types of libraries. Which of these transactions is the most efficient or has the greatest utility is one of the basic design parameters needing further analysis. It should be noted that the variable of message format, that is, the channel of communication or type of communication link, has not yet been investigated in the symbolic modeling of these transactions.

Fig. 3. Three-Body Transaction: Referral.

Fig. 4. Three-Body Transaction: Relay.

Fig. 5. Three-Body Transactions at Various Geographic Levels.

NETWORK CONFIGURATION

Another very important design parameter is the network configuration or organizational hierarchy specifying the communication channels and message flow pattern. Figure 6 illustrates symbolically a non-directed configuration of communication. If each dot represents a node in the network (i.e., a participating library), and each line represents a communication link, it can be seen that each node can communicate directly with every other node, providing (or requiring) a total of fifteen links among the six nodes.

Fig. 6. Non-Directed Network (C = N(N - 1)/2 = 15).

By contrast, Figure 7 illustrates a directed configuration in which the six nodes are interconnected through a switching center, requiring only six channel links. In like manner, if a non-directed network desires to interface with a specialized center, such as the Library of Congress or a special bibliographic center or search center, a total of twenty-one channels is required (Figure 8), whereas a directed network can interface with a specialized center via only seven channels, as illustrated in Figure 9.

Fig. 7. Directed Network.

Fig. 8. Non-Directed Network Including Specialized Center.

Fig. 9. Directed Network Including Specialized Center.
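The channel counts quoted for figures 6 through 9 follow directly from the two configuration rules just described. The short sketch below is illustrative only (the function names are mine, not part of Lib-NAT) and reproduces those counts for a six-library network:

    # Channel counts for the configurations in figures 6 through 9.
    def non_directed_links(n):
        # Every node linked directly to every other node.
        return n * (n - 1) // 2

    def directed_links(n):
        # Every node linked only to a single switching center.
        return n

    n = 6
    print(non_directed_links(n))       # 15 links (figure 6)
    print(directed_links(n))           # 6 links (figure 7)
    # Adding one specialized center, e.g. a bibliographic center:
    print(non_directed_links(n + 1))   # 21 links (figure 8)
    print(directed_links(n) + 1)       # 7 links (figure 9: one extra channel from the switch)

The same quadratic-versus-linear growth explains the figures given next for interfacing two area networks.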
As local or area networks begin to develop, there will be a need for tying together two area networks to develop larger units of service. The interfacing of an original network of six libraries in one area with an adjoining area network of six libraries will result in the network configuration shown in Figure 10 in the case of a non-directed network, and sixty-six communication links among twelve nodes will be required. Whereas, if two directed networks of six libraries each desire to interface, a type of linkage requiring only thirteen channels may be envisioned (Figure 11).

Fig. 10. Interface of Two Non-Directed Networks.

Fig. 11. Interface of Two Directed Networks.

Which is the best type of network configuration? What are the decision parameters that should be considered in designing or planning network configuration? How can alternate configurations be evaluated? Alternate channel requirements? And alternate geographic levels of switching? In the pilot model study, a mathematical model has been devised which can be used for simulating various configurations and channel capacities, thereby permitting some desired criterion function of network performance to be maximized or optimized. The details of the mathematical model will be published as part of the final report on Lib-NAT; in the meantime it can be said that this is a fascinating area of network analysis which will be useful to any group of libraries planning network configurations. The mathematical model, a multi-commodity, multi-channel, capacitated network model developed by Dr. Richard Nance at Southern Methodist University as part of the Title III project, promises to have a high potential application in network design and performance evaluation. It does require that the librarian make some hard-nosed decisions on operational and performance parameters of the inter-library systems discussed in the preceding article, but this is part of the challenge of Lib-NAT.

MEASURES OF PARTICIPATION

It is obvious that types of libraries, geographic level, types of transactions, various network configurations, alternate communication links and switching levels are all important in planning inter-library systems. Next it is necessary to take an in-depth look at the relationship between the individual participating library and the total network. In the pilot model study of eighteen libraries a noticeable difference appeared in the magnitude and type of participation. In surveying only the two-body transactions, it was observed that some libraries were primarily borrowers and others primarily lenders, and some were heavy and some light. In pursuit of a quantitative method of representing these relationships some formulae were evolved which are helpful in understanding node/network dynamics.

Starting with the individual library or node, let Bn equal the number of borrowing transactions originating at that node and Ln equal the number of lending transactions; then Ln plus Bn will equal the total number of all transactions at that particular node.
In like manner, looking at the total network (in this case all eighteen participating libraries), let Bt equal the total number of borrowing transactions originating in the network and Lt the total number of lending transactions; then Bt plus Lt will equal the total number of both types of transactions in the network.

In the analysis of node/network dynamics, it was felt there should be some way of quantitatively expressing the individual node's dependency on the total network and also a way of expressing the relative degree of activity of each node. In other words, a participating library that was a net borrower (compared to its lending) was obviously more dependent on the network than would be a library that borrowed very little compared to its lending. The extent of dependency can be expressed as a node dependency coefficient calculated as follows:

Node dependency coefficient = Bn / (Bn + Ln), the relative amount of borrowing compared to total node transactions.

Among its other uses, the dependency coefficient of a node may give some insight into the extent to which it should share in network expenses, but the dependency coefficient alone should not be a final criterion, since magnitude of activity is of equal importance. For developing a method of quantitatively expressing activity of a node compared to total activity of the network a factor called the node activity coefficient may be calculated as follows:

Node activity coefficient = (Bn + Ln) / (Bt + Lt), the relative activity of both types at one node compared to total activity in the total network.

Then, to quantitatively express the dependency of a given node on the network, one can calculate the node/network dependency coefficient from these two measures.

Fig. 12. Node Dependency Coefficient: B/(B + L) plotted against B + L for each node (values above 0.5 indicate a net borrower; values below 0.5 indicate a net lender).
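As a minimal illustration of the two coefficients defined above (a sketch only; the library names and tallies are hypothetical, not figures from the Dallas pilot study), both measures can be computed directly from per-node borrowing and lending counts:

    # Node participation measures from the Lib-NAT formulas above.
    # For each node n: Bn = borrowing transactions, Ln = lending transactions.
    nodes = {                       # hypothetical tallies, not the Dallas pilot data
        "Library A": (120, 30),     # (Bn, Ln)
        "Library B": (15, 210),
        "Library C": (60, 55),
    }

    Bt = sum(b for b, l in nodes.values())   # total borrowing in the network
    Lt = sum(l for b, l in nodes.values())   # total lending in the network

    for name, (Bn, Ln) in nodes.items():
        dependency = Bn / (Bn + Ln)          # above 0.5: a net borrower
        activity = (Bn + Ln) / (Bt + Lt)     # this node's share of all network traffic
        print(f"{name}: dependency {dependency:.2f}, activity {activity:.2f}")

Read together, a high dependency coefficient with a high activity coefficient marks a node that leans heavily on the network, which is the kind of distinction the expense-sharing discussion above turns on.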
Since the calculated Q value is greater than the critical Q value of 0.2241, the null hypothesis is rejected and it must be accepted that 15,123 is an outlier. Once it is determined with statistical certainty that the suspected outlier is indeed an outlier, it needs to be replaced with the median calculated from all values found in Dataset 2. For the case of Polymer, the median was calculated to be 27 from all values in table 2. Replacing an outlier with the median to accommodate the data has been proven to be quite effective in dealing with outliers by introducing less distortion to that dataset.39 Extreme values are therefore replaced with values more consistent with the rest of the data.40

                Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
  Polymer 2009   27   14   35   22   15   28   24   19   11    8   13    7
  Polymer 2010   12   15   26   33   38   64   39    5   13   27  109   44
  Polymer 2011  113  159  638  345   52   57   94   70   39   36  221   65
  Polymer 2012  130    4   98   24   27   18   13   16   18   25    9    5

Table 3. The Identified Outlier Is Replaced with the Median (Highlighted in Bold).

Table 3 represents the number of full-text articles downloaded for Polymer after the outlier had been replaced with the median. The confirmed outlier of 15,123 articles downloaded recorded in October 2010 is replaced with the median of 27, highlighted in bold. This then becomes the accepted value for the number of articles downloaded from Polymer in October 2010. The outlier is discarded. The new value of 27 articles downloaded in October 2010 replaces the extreme value of 15,123 in the original 2010 JR1 Report (see table 4). This is the final step.

                                                 Jan  Feb  Mar  Apr  May    Jun  Jul  Aug  Sep    Oct    Nov  Dec
  Polymer                                         12   15   26   33   38     64   39    5   13     27    109   44
  Surface and Coatings Technology                  3    1    2    1   22     17   17    0   12  3,771  5,428  601
  International Journal of Radiation Oncology     11   18   35   22   17  6,436  176   13   25     29     24   19
  Journal of Catalysis                             0    1    5    1    2      2   16    4    0      2  6,693    1

Table 4. Sample from a 2010 JR1 COUNTER-Compliant Report Indicating the Number of Articles Downloaded per Journal over a Twelve-Month Period. Polymer's Identified Outlier Is Replaced with the Median Calculated from Table 2 (Highlighted in Bold).

Once the first outlier is corrected, the same procedures need to be followed for the other suspected outliers highlighted in table 1. If it is determined that they are outliers, they are replaced with their associated median values. Although the steps and calculations used to identify and correct for outliers are relatively simple to follow, it is admittedly a very lengthy and time-consuming process. But in the end, it is well worth the effort.

RESULTS AND DISCUSSION

Table 5 details the changes in the overall number of articles downloaded from J. N. Desmarais Library e-journals that resulted from the elimination of outliers. The column titled "Recorded Downloads" details the number of articles downloaded between 2000 and 2012, inclusively, prior to outlier testing. The column titled "Corrected Downloads" represents the number of articles downloaded during the same period of time but after the outliers had been positively identified and the data cleaned. The affected values are highlighted in bold.
  Year    Recorded Downloads    Corrected Downloads
  2000               806                    806
  2001             1,034                  1,034
  2002             1,015                  1,015
  2003             4,890                  4,890
  2004            72,841                 72,841
  2005           251,335                251,335
  2006           640,759                640,759
  2007           731,334                731,334
  2008           710,043                710,043
  2009           725,019                725,019
  2010           857,360                757,564
  2011           869,651                696,973
  2012           716,890                716,890

Table 5. Comparison of the Recorded Number of Articles Downloaded to the Corrected Number of Articles Downloaded, over a Thirteen-Year Period.

All data from all available years were tested for outliers. Only data recorded in 2010 and 2011 tested positive for outliers. Replacing outliers with the median values for those affected journal titles dramatically reduced the total number of downloaded articles (see table 5). Between 2007 and 2009, inclusively, the actual number of full-text articles downloaded recorded from the library's e-journal collection ranged between 710,043 and 731,334 annually (see table 5). The annual average for those three years is 722,132 articles downloaded. But in 2010 that number dramatically increased to 857,360 downloaded articles, which was followed by 869,651 downloaded articles in 2011 (see table 5). The elimination of outliers from the 2010 data resulted in the number of downloads dropping from 857,360 to 757,564, a difference of 99,796 downloads, or nearly 12 percent. Similarly, in 2011, the number of articles downloaded decreased from 869,651 to 696,973 once outliers were replaced with median values. This represents a reduction of 172,678 downloaded articles, or nearly 20 percent. A staggering 20 percent of articles downloaded in 2011 can therefore be considered as erroneous and, in all likelihood, the result of illicit downloading.

Figure 1 is a graphical representation of the change in the number of articles downloaded before and after the identification of outliers and their replacement by median values. The line "Recorded Downloads" clearly indicates a surge in usage between 2010 and 2011 with usage returning to levels recorded prior to the 2010 increase. The line "Corrected Downloads" depicts a very different picture. The plateau in usage that began in 2007 continues through 2012. Evidently, the observed spike in usage was artificial and the result of the presence of outliers in certain datasets. If the data had not been tested for outliers, it would have appeared that usage had substantially increased in 2010 and it would have been incorrectly assumed that usage was on the rise once more. Instead, the corrected data bring usage levels for 2010 and 2011 back in line with the plateau that had begun in 2007 and reflect a more realistic picture of usage rates at Laurentian University.

Figure 1. Comparing the Recorded Number of Articles Downloaded to the Corrected Number of Articles Downloaded over a Thirteen-Year Period.

Accuracy in any data gathering is always extremely important, but accuracy in e-resource usage levels is critical for academic libraries. Academic libraries having e-journal subscription rates based either entirely or partly on usage can be greatly affected if usage numbers have been artificially inflated. It can lead to unnecessary increases in cost. Since it was determined that outliers were present only during the period in which the library had found itself under "attack," it can be assumed that the vast majority, if not all, of the extreme usage values were a result of illegal downloading.
It would therefore be a shame to need to pay higher costs because of inappropriate or illegal downloading of licensed content. Accurate usage data is also important for academic libraries that integrate usage statistics into their collection development policy for the purpose of justifying the retention or cancellation of a particular subscription. The J. N. Desmarais Library is such a library. As indicated earlier, if the cost-per-download of a subscription is consistently greater than the cost of an interlibrary loan for three or more years, it is marked for cancellation. At the J. N. Desmarais Library, the average cost of an interlibrary loan had been previously calculated to be approximately Can$15.00.42 Therefore, subscriptions recording a "cost-per-download" greater than the Can$15.00 target for more than three years can be eliminated from the collection.

Any artificial increase in the number of downloads would artificially lower the cost-per-use ratio. This would reinforce the illusion that a particular subscription was used far more than it really was and lead to the false belief that it would be less expensive to retain rather than rely on interlibrary loan services. The true cost-per-use ratio may be far greater than initially calculated. The unnecessary retention of a subscription could prevent the acquisition of another, more relevant, one. For example, after adjusting the number of articles downloaded from ScienceDirect in 2011, the cost-per-download ratio increased from Can$0.74 to Can$1.59; the uncorrected figure thus understated the true cost-per-download by about 53 percent. For the J. N. Desmarais Library, this package was obviously not in jeopardy of being cancelled, but borderline subscriptions would definitely have been affected by a change of that size in the cost-per-use ratio. It must also be stated that none of the library's subscriptions having experienced extreme downloading found themselves in the position of being cancelled after the usage data had been corrected for outliers. Regardless, it is important to verify all usage data prior to any data analysis to identify and correct for outliers. Once the outlier detection investigation has been completed and any extreme values replaced by the median, there would be no further need to manipulate the data in such a fashion. The identification of outliers is a one-time procedure. The corrected or cleaned datasets would then become the official datasets to be used for any further usage analyses.

CONCLUSIONS

Outliers can have a dramatic effect on the analysis of any dataset. As demonstrated here, the presence of outliers can lead to the misrepresentation of usage patterns. They can artificially inflate average values and introduce severe distortion to any dataset. Fortunately, they are fairly easy to identify and remove. The following steps were used to identify outliers in JR1 COUNTER-Compliant reports (a brief sketch of the test and replacement calculation follows the list):

1. Identify possible outliers: Visually inspect the values recorded in a JR1 report dataset (Dataset 1) and mark any extreme values.
2. For each suspected outlier identified, take the usage values for the affected e-journal title and incorporate them into a separate blank spreadsheet (Dataset 2). Incorporate into Dataset 2 all other usage values for the affected journal from all available years. It is important that Dataset 2 contain only those values for the affected journal.
3. Test for the outlier: Perform the Dixon Q test on the suspected outlier to confirm or disprove the existence of the outlier.
4. If the suspected outlier tests as positive, calculate the median of Dataset 2.
5. Replace the outlier in Dataset 1 with the median calculated from Dataset 2.
6. Perform steps 1 through 5 for any other suspected outliers in Dataset 1.
7. The corrected values in Dataset 1 will become the official values and will be used for all subsequent usage data analysis.
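A compact sketch of steps 3 through 5 is given below. It assumes the simple gap-to-range form of the Dixon ratio applied to the highest or lowest value, and it takes the critical value from a published table such as Verma and Quiroz-Ruiz (0.2241 was the value used for the Polymer dataset above); the sample data are the Polymer values from table 2.

    import statistics

    def dixon_q(values, suspect):
        # Simple gap/range ratio: distance from the suspect to its nearest
        # neighbor, divided by the full range of the data.
        data = sorted(values)
        gap = data[-1] - data[-2] if suspect == data[-1] else data[1] - data[0]
        return gap / (data[-1] - data[0])

    def clean_outlier(values, suspect, q_critical):
        # Steps 3-5: test the suspect; if confirmed, return the median of Dataset 2.
        if dixon_q(values, suspect) > q_critical:
            return statistics.median(values)
        return suspect

    # Polymer monthly downloads, 2009-2012, including the suspect October 2010 value.
    polymer = [27, 14, 35, 22, 15, 28, 24, 19, 11, 8, 13, 7,
               12, 15, 26, 33, 38, 64, 39, 5, 13, 15123, 109, 44,
               113, 159, 638, 345, 52, 57, 94, 70, 39, 36, 221, 65,
               130, 4, 98, 24, 27, 18, 13, 16, 18, 25, 9, 5]
    print(clean_outlier(polymer, 15123, q_critical=0.2241))   # 27.0, the median used in table 3

The same routine can be rerun for each suspected value in a JR1 report, which is step 6 of the procedure.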
The identification and removal of outliers had a noticeable effect on the usage statistics for J. N. Desmarais Library's e-journal collection. Outliers represented over 100,000 erroneous downloaded articles in 2010 and nearly 200,000 in 2011. A total of 20 percent of recorded downloads in 2011 were anomalous, and in all likelihood a result of illicit downloading after Laurentian University's EZProxy server was breached. New technologies have made digital content easily available on the web, which has caused serious concern for both publishers43 and institutions of higher learning, which have been experiencing an increase in illicit attacks.44 The history of Napster supports the argument that users "will freely steal content when given the opportunity."45 Since web robot traffic will continue to grow in pace with the Internet, it is critical that this traffic be factored into the performance and protection of any web servers.46

REFERENCES

1. Victoria J. Hodge and Jim Austin, "A Survey of Outlier Detection Methodologies," Artificial Intelligence Review 85 (2004): 85–126, http://dx.doi.org/10.1023/B:AIRE.0000045502.10941.a9; Patrick H. Menold, Ronald K. Pearson, and Frank Allgöwer, "Online Outlier Detection and Removal," in Proceedings of the 7th Mediterranean Conference on Control and Automation (MED99) Haifa, Israel—June 28–30, 1999 (Haifa, Israel: IEEE, 1999): 1110–30.
2. Hodge and Austin, "A Survey of Outlier Detection Methodologies," 85–126.
3. Vic Barnett and Toby Lewis, Outliers in Statistical Data (New York: Wiley, 1994).
4. Hodge and Austin, "A Survey of Outlier Detection Methodologies," 85–126; R. S. Witte and J. S. Witte, Statistics (New York: Wiley, 2004); Menold et al., "Online Outlier Detection and Removal," 1110–30.
5. Menold et al., "Online Outlier Detection and Removal," 1110–30.
6. Hodge and Austin, "A Survey of Outlier Detection Methodologies," 85–126.
7. Laurentian University (Sudbury, Canada) is classified as a medium multi-campus university. Total 2012 full-time student population was 6,863, of which 403 were enrolled in graduate programs. In addition, 2012 part-time student population was 2,652, with 428 enrolled in graduate programs. Also in 2012, the university employed 399 full-time teaching and research faculty members. Academic programs cover a multitude of fields in the sciences, social sciences, and humanities, and the university offers 60 undergraduate, 17 master's, and 7 doctoral degrees.
8. Alain R. Lamothe, "Factors Influencing Usage of an Electronic Journal Collection at a Medium-Size University: An Eleven-Year Study," Partnership: The Canadian Journal of Library and Information Practice and Research 7, no. 1 (2012), https://journal.lib.uoguelph.ca/index.php/perj/article/view/1472#.U36phvmSy0J.
9. Ben Tremblay, "Web Bot—What is it?
Can It Predict Stuff?" Daily Common Sense: Scams, Science and More (blog), January 24, 2008, http://www.dailycommonsense.com/web-bot-what-is-it-can-it-predict-stuff/.
10. Derek Doran and Swapna S. Gokhale, "Web Robot Detection Techniques: Overview and Limitations," Data Mining and Knowledge Discovery 22 (2011): 183–210, http://dx.doi.org/10.1007/s10618-010-0180-z.
11. C. Lee Giles, Yang Sun, and Isaac G. Councill, "Measuring the Web Crawler Ethics," in WWW 2010 Proceedings of the 19th International Conference on World Wide Web (Raleigh, NC: International World Wide Web Conferences Steering Committee, 2010): 1101–2, http://dx.doi.org/10.1145/17772690.1772824.
12. Shinil Kwon, Kim Young-Gab, and Sungdeok Cha, "Web Robot Detection Based on Pattern-Matching Technique," Journal of Information Science 38 (2012): 118–26, http://dx.doi.org/10.1177/0165551511435969.
13. David Watson, "The Evolution of Web Application Attacks," Network Security (2007): 7–12, http://dx.doi.org/10.1016/S1353-4858(08)70039-4.
14. Eric Kin-wai Lau, "Factors Motivating People toward Pirated Software," Qualitative Market Research 9 (2006): 404–19, http://dx.doi.org/10.1108/13522750610689113.
15. Huan-Chueh Wu et al., "College Students' Misunderstanding about Copyright Laws for Digital Library Resources," Electronic Library 28 (2010): 197–209, http://dx.doi.org/10.1108/02640471011033576.
16. Ibid.
17. Ibid.
18. Emma McCulloch, "Taking Stock of Open Access: Progress and Issues," Library Review 55 (2006): 337–43; C. Patra, "Introducing E-journal Services: An Experience," Electronic Library 24 (2006): 820–31.
19. Wu et al., "College Students' Misunderstanding about Copyright Laws for Digital Library Resources," 197–209.
20. Ibid.
21. Vincent J. Calluzzo and Charles J. Cante, "Ethics in Information Technology and Software Use," Journal of Business Ethics 51 (2004): 301–12, http://dx.doi.org/10.1023/B:BUSI.0000032658.12032.4e.
22. S. L. Solomon and J. A. O'Brien, "The Effect of Demographic Factors on Attitudes toward Software Piracy," Journal of Computer Information Systems 30 (1990): 41–46.
23. J. N. Desmarais Library, "Collection Development Policy" (Sudbury, ON: Laurentian University, 2013), http://biblio.laurentian.ca/research/sites/default/files/pictures/Collection%20Development%20Policy.pdf.
24. Lamothe, "Factors Influencing Usage"; Alain R. Lamothe, "Electronic Serials Usage Patterns as Observed at a Medium-Size University: Searches and Full-Text Downloads," Partnership: The Canadian Journal of Library and Information Practice and Research 3, no. 1 (2008), https://journal.lib.uoguelph.ca/index.php/perj/article/view/416#.U364KvmSy0I.
25. Martin Zimerman, "E-books and Piracy: Implications/Issues for Academic Libraries," New Library World 112 (2011): 67–75, http://dx.doi.org/10.1108/03074801111100463.
26. Ibid.
27. Peggy Hageman, "Ebooks and the Long Arm of the Law," EContent (June 2012), http://www.econtentmag.com/Articles/Column/Ebookworm/Ebooks-and-the-Long-Arm-of-the-Law--82976.htm.
28. "dataset, n.," OED Online (Oxford, UK: Oxford University Press, 2013), http://www.oed.com/view/Entry/261122?redirectedFrom=dataset; "Dataset—Definition," OntoText, http://www.ontotext.com/factforge/dataset-definition; W. Paul Vogt, "Data Set," Dictionary of Statistics and Methodology: A Nontechnical Guide for the Social Sciences (London, UK: Sage, 2005); Allan G. Bluman, Elementary Statistics—A Step by Step Approach (Boston: McGraw-Hill, 2000).
29. David B. Rorabacher, "Statistical Treatment for Rejection of Deviant Values: Critical Values of Dixon's 'Q' Parameter and Related Subrange Ratios at the 95% Confidence Level," Analytical Chemistry 63 (1991): 139–45; R. B. Dean and W. J. Dixon, "Simplified Statistics for Small Numbers of Observations," Analytical Chemistry 23 (1951): 636–38, http://dx.doi.org/10.1021/ac00002a010.
30. Surenda P. Verma and Alfredo Quiroz-Ruiz, "Critical Values for Six Dixon Tests for Outliers in Normal Samples up to Sizes 100, and Applications in Science and Engineering," Revista Mexicana de Ciencias Geologicas 23 (2006): 133–61.
31. Robert R. Sokal and F. James Rohlf, Biometry (New York: Freeman, 2012); J. H. Zar, Biostatistical Analysis (Upper Saddle River, NJ: Prentice Hall, 2010).
32. "null hypothesis," AccessScience (New York: McGraw-Hill Education, 2002), http://www.accessscience.com.
33. Ibid.
34. "critical value," AccessScience (New York: McGraw-Hill Education, 2002), http://www.accessscience.com.
35. Ibid.
36. Verma and Quiroz-Ruiz, "Critical Values for Six Dixon Tests for Outliers," 133–61.
37. Rorabacher, "Statistical Treatment for Rejection of Deviant Values," 139–45.
38. Ibid.
39. Jaakko Astola and Pauli Kuosmanen, Fundamentals of Nonlinear Digital Filtering (New York: CRC, 1997); Jaakko Astola, Pekka Heinonen, and Yrjö Neuvo, "On Root Structures of Median and Median-Type Filters," IEEE Transactions on Acoustics, Speech, and Signal Processing 35 (1987): 1199–201; L. Ling, R. Yin, and X. Wang, "Nonlinear Filters for Reducing Spiky Noise: 2-Dimensions," IEEE International Conference on Acoustics, Speech, and Signal Processing 9 (1984): 646–49; N. J. Gallagher and G. Wise, "A Theoretical Analysis of the Properties of Median Filters," IEEE Transactions on Acoustics, Speech, and Signal Processing 29 (1981): 1136–41.
40. Menold et al., "Online Outlier Detection and Removal," 1110–30.
41. Ibid.
42. Lamothe, "Factors Influencing Usage"; Lamothe, "Electronic Serials Usage Patterns."
43. Paul Gleason, "Copyright and Electronic Publishing: Background and Recent Developments," Acquisitions Librarian 13 (2001): 5–26, http://dx.doi.org/10.1300/J101v13n26_02.
44. Tena McQueen and Robert Fleck Jr., "Changing Patterns of Internet Usage and Challenges at Colleges and Universities," First Monday 9 (2004), http://firstmonday.org/issues/issue9_12/mcqueen/index.html.
45. Robin Peek, "Controlling the Threat of E-Book Piracy," Information Today 18, no. 6 (2001): 42.
46. Gleason, "Copyright and Electronic Publishing," 5–26.

5365 ----

President's Message

Cindi Trainor

Cindi Trainor (cindiann@gmail.com) is LITA President 2013-14 and Community Specialist & Trainer for Springshare, LLC.

Hi, LITAns!

Forum 2013

I'm excited that 2014 is almost here. Last month saw a very successful Forum in Louisville, in my home state of Kentucky. There were 243 people in attendance, and about half of those were first-time attendees. It's also typical of our yearly conference that there are a large number of attendees from the surrounding area; this is one of the reasons that it travels around the country. Louisville's Forum was the last of a few in the "middle" of the country--these included St. Louis, Atlanta, and Columbus. Next year, Forum will move back out west, to Albuquerque, NM. The theme for next year's conference will be "Transformation: from Node to Network." See the LITA blog (http://litablog.org/2013/11/call-for-proposals-2014-lita-forum/) for the call for proposals for concurrent sessions, poster sessions, and pre-conference workshops.

Goals of the Organization

At the Board meeting in the fall, we took a stab at updating LITA's major goal areas. The Strategic Plan had not been updated since 2010, so we felt it was time to update the goal areas, at least for the short term. The goals that we agreed upon will carry us through Annual Conference 2015 and will give us time to mount a more complete planning process in the meantime. They are:

• Collaboration & Networking: Foster collaboration and encourage networking among our members and beyond so the full potential of technologies in libraries can be realized.
• Education & Sharing of Expertise: Offer education, publications, and events to inspire and enable members to improve technology integration within their libraries.
• Advocacy: Advocate for meaningful legislation, policies, and standards that positively impact the current and future capabilities of libraries and that promote equitable access to information and technology.
• Infrastructure: Improve LITA's organizational capacity to serve, educate, and create community for its members.

Midwinter Activities

In other governance news, the Board will have an online meeting in January 2014, prior to the Midwinter conference. Our one-hour meeting will be spent asking and answering questions of those who typically submit written reports for Board meetings: the Vice-President, the President, and the Executive Director. As always, look to ALA Connect for these documents, which are posted publicly. We welcome your comments, as well as your attendance at any of our open meetings. Our Midwinter meeting schedule is:

• the week of January 13 - Online meeting, time and date TBA
• Saturday, January 25, 1:30 - 4:30 p.m.
- PCC 107A
• Monday, January 27, 1:30 - 4:30 p.m. - PCC 115A

As always, Midwinter will also hold a LITA Happy Hour (Sunday, 6-8 pm, location TBA), the Top Tech Trends panel (Sunday, 10:30 a.m., PCC 204A), and our annual membership meeting, the LITA Town Meeting (Monday, 8:30 a.m., PCC 120C). We look forward to seeing you, in Philadelphia or virtually. Make sure to check the Midwinter Scheduler (http://alamw14.ala.org/scheduler) for all the details, including the forthcoming Happy Hour location. It's the best party^H^H^H^H^H networking event at Midwinter!

I would be remiss if I did not mention LITA's committees and IGs and their Midwinter meetings. Many will be meeting Saturday morning at 10:30 a.m. (PCC 113ABC)--so you can table-hop if you like. Expressing interest at Midwinter is a great way to get involved. Can't make it to Philadelphia? No problem! Fill out the online form to volunteer for a committee, or check out the Connect groups of our Interest groups. Some of the IGs meet virtually before Midwinter; some committees and IGs also invite virtual participation at Midwinter itself. Join us!

5367 ----

Editor's Comments

Bob Gerrity

This Month's Issue

We have an eclectic mix of content in this issue of Information Technology and Libraries.

LITA President Cindi Trainor provides highlights of the recent LITA Forum in Louisville and planned LITA events for the upcoming ALA Midwinter Meeting in Philadelphia, including the LITA Town Meeting, the always-popular Top Tech Trends panel, and the Association's popular "networking event" on Sunday evening.

ITAL Editorial Board member Jerome Yavarkosky describes the significant benefits that immersive technologies can offer higher education. The advent of Massive Open Online Courses (MOOCs) would seem to present an ideal framework for the development of immersive library services to support learners who may otherwise lack access to quality library resources and services.

Responsive web design is the topic of a timely article by Hannah Gascho Rempel and Laurie M. Bridges, who examine what tasks library users actually carry out on a library mobile website and how this has informed Oregon State University Libraries' adoption of a responsive design approach for their website.

Piotr Praczyk, Javier Nogueras-Iso, and Salvatore Mele present a method for automatically extracting and processing graphical content from scholarly articles in PDF format in the field of high-energy physics. The method offers potential for enhancing access and search services and bridging the semantic gap between textual and graphical content.

Elizabeth Thorne Wallington describes the use of mapping and graphical information systems (GIS) to study the relationship between public library locations in the St. Louis area and the socioeconomic attributes of the populations they serve. The paper raises interesting questions about how libraries are geographically distributed and whether they truly provide universal and equal access.
Vadim Gureyev and Nikolai Mazov present a method for using bibliometric analysis of the publication output of two research institutes as a collection-development tool, to identify journals most important for researchers at the institutes.

Bob Gerrity (r.gerrity@uq.edu.au) is University Librarian, University of Queensland, Australia.

5377 ----

A Candid Look at Collected Works: Challenges of Clustering Aggregates in GLIMIR and FRBR

Gail Thornburg

ABSTRACT

Creating descriptions of collected works in ways consistent with clear and precise retrieval has long challenged information professionals. This paper describes problems of creating record clusters for collected works and distinguishing them from single works: design pitfalls, successes, failures, and future research.

OVERVIEW AND DEFINITIONS

The Functional Requirements for Bibliographic Records (FRBR) was developed by the International Federation of Library Associations (IFLA) as a conceptual model of the bibliographic universe. FRBR is intended to provide a more holistic approach to retrieval and access of information than any specific cataloging code. FRBR defines a work as a distinct intellectual or artistic creation. Put very simply, an expression of that work might be published as a book. In FRBR terms, this book is a manifestation of that work.1

A collected work can be defined as "a group of individual works, selected by a common element such as author, subject or theme, brought together for the purposes of distribution as a new work."2 In FRBR, this type of work is termed an aggregate or "manifestation embodying multiple distinct expressions."3 Zumer describes an aggregate as "a bibliographic entity formed by combining distinct bibliographic units together."4 Here the terms are used interchangeably. In FRBR, the definition of aggregates applies only to group 1 entities, i.e., not to groups of persons or corporate bodies.

The IFLA Working Group on Aggregates has defined three distinct types of aggregates: (1) collections of expressions, (2) aggregates resulting from augmentation or supplementing of a work with additional material, and (3) aggregates of parallel expressions of one work in multiple languages.5 While noting the relationships between the categories, this paper will focus on the first type. Aggregates of the first type include selections, anthologies, series, books with independent sections by different authors, and so on. Aggregates may occur in any format, from a volume containing both of the J. D. Salinger works Catcher in the Rye and Franny and Zooey, to a sound recording containing popular adagios from several composers, to a video containing three John Wayne movies.

Gail Thornburg (thornbug@oclc.org) is Consulting Software Engineer and Researcher at OCLC, Dublin, Ohio.

THE ENVIRONMENT

The OCLC WorldCat database is replete with bibliographic records describing aggregates. It has been estimated that the database may contain more than 20 percent aggregates.6 This proportion may increase as WorldCat coverage of recordings and videos tends to increase.

In the Global Library Manifestation Identifier (GLIMIR) project, automatic clustering of the records into groups of instances of the same manifestation of a work was devised. GLIMIR finds and groups similar records for a given manifestation and assigns two types of identifiers for the clusters. The first type is a Manifestation ID, which identifies parallel records differing only in language of cataloging or metadata detail, some of which are probably true duplicates whose differences cannot be safely deduplicated by a machine process. The second type is a Content ID, which describes a broader clustering, for instance, physical and digital reproductions and reprints of the same title from differing publishers. This process started with the searching and matching algorithms developed for WorldCat. The GLIMIR clustering software is a specialization of the matching software developed for the batch loading of records to WorldCat, deduplicating the database, and other search and comparison purposes.7 This form of GLIMIRization compares an incoming record to database search results to determine what should match for GLIMIR purposes. This is a looser match in some respects than what would be done for merging duplicates. The initial challenges of tailoring matching algorithms to suit the needs of GLIMIR have been described in Thornburg and Oskins8 and in Gatenby et al.9

The goals of GLIMIR are (1) to cluster together different descriptions of the same resource and to get a clearer picture of the number of actual manifestations in WorldCat so as to allow the selection of the most appropriate description, and (2) to cluster together different resources with the same content to improve discovery and delivery for end users. According to Richard Greene, "The ultimate goal of GLIMIR is to link resources in different sites with a single identifier, to cluster hits and thereby maximize the rank of library resources in the web sphere."10

GLIMIR is related conceptually to the FRBR model. If the goal of FRBR is to improve the grouping of similar items for one work, then GLIMIR similarly groups items within a given work. Manifestation clusters specify the closest matches. Content clusters contain reproductions and may be considered to represent elements of the expression level of the FRBR model. The FRBR and GLIMIR algorithms this paper discusses have evolved significantly over the past three years. In addition, it should be recognized that the FRBR algorithms use a map/reduce keyed approach to cluster FRBR works and some GLIMIR content while the full GLIMIR algorithms use a more detailed and computationally expensive record comparison approach.

The FRBR batch process starts with WorldCat enhanced with additional authority links, including the production GLIMIR clusters. It makes several passes through WorldCat, each pass constructing keys that pull similar records together for comparison and evaluation. As described by Toves, "Successive passes progressively build up knowledge about the groups allowing us to refine and expand clusters, ending up with the work, content and manifestation clusters to feed into production."11
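As a rough illustration of such a keyed pass (my own sketch, not OCLC's production code; the normalization and the choice of key are assumptions), records can be pulled into candidate groups by a derived key before any detailed comparison:

    from collections import defaultdict

    def work_key(record):
        # Crude normalization: lowercase author plus uniform title (or title proper).
        author = record.get("author", "").strip().lower()
        title = (record.get("uniform_title") or record.get("title", "")).strip().lower()
        return (author, title)

    def keyed_pass(records):
        # One pass: group records sharing a key; each group is then examined in detail.
        groups = defaultdict(list)
        for record in records:
            groups[work_key(record)].append(record)
        return groups

    records = [   # hypothetical records
        {"author": "Homer", "uniform_title": "Iliad", "title": "The Iliad of Homer"},
        {"author": "Homer", "uniform_title": "Iliad", "title": "Homer's Iliad"},
        {"author": "Homer", "uniform_title": "Odyssey", "title": "The Odyssey"},
    ]
    for key, group in keyed_pass(records).items():
        print(key, len(group))

A degenerate key, such as the single-letter uniform title described later in this paper, collapses unrelated works into one candidate group, which is why the detailed comparison that follows the keyed pass matters.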
GLIMIR finds and groups similar records for a given manifestation and assigns two types of identifiers for the clusters. The first type is Manifestation ID, which identifies parallel records differing only in language of cataloging or metadata detail, some of which are probably true duplicates whose differences cannot be safely deduplicated by a machine process. The second type is a Content ID, which describes a broader clustering, for instance, physical and digital reproductions and reprints of the same title from differing publishers. This process started with the searching and matching algorithms developed for WorldCat. The GLIMIR clustering software is a specialization of the matching software developed for the batch loading of records to WorldCat, deduplicating the database, and other search and comparison purposes.7 This form of GLIMIRization compares an incoming record to database search results to determine what should match for GLIMIR purposes. This is a looser match in some respects than what would be done for merging duplicates. The initial challenges of tailoring matching algorithms to suit the needs of GLIMIR have been described in Thornburg and Oskins8 and in Gatenby et al.9 The goals of GLIMIR are (1) to cluster together different descriptions of the same resource and to get a clearer picture of the number of actual manifestations in WorldCat so as to allow the selection of the most appropriate description, and (2) to cluster together different resources with the same content to improve discovery and delivery for end users. According to Richard Greene, “The ultimate goal of GLIMIR is to link resources in different sites with a single identifier, to cluster hits and thereby maximize the rank of library resources in the web sphere.”10 GLIMIR is related conceptually to the FRBR model. If the goal of FRBR is to improve the grouping of similar items for one work, then GLIMIR similarly groups items within a given work. Manifestation clusters specify the closest matches. Content clusters contain reproductions and may be considered to represent elements of the expression level of the FRBR model. The FRBR and GLIMIR algorithms this paper discusses have evolved significantly over the past three years. In addition, it should be recognized that the FRBR algorithms use a map/reduce keyed approach to cluster FRBR works and some GLIMIR content while the full GLIMIR algorithms use a more detailed and computationally expensive record comparison approach. The FRBR batch process starts with WorldCat enhanced with additional authority links, including the production GLIMIR clusters. It makes several passes through WorldCat, each pass constructing keys that pull similar records together for comparison and evaluation. As described by Toves, “Successive passes progressively build up knowledge about the groups allowing us to refine and INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2014 55 expand clusters, ending up with the work, content and manifestation clusters to feed into production.”11 Each approach to clustering has its limits of feasibility, but the FRBR and GLIMIR combined teams have endeavored to synchronize changes to the algorithms and to share insights. Some materials are easier to cluster using one approach, and some in the other. Clustering meets Aggregates In the initial implementation of GLIMIR, the issue of handling collected works was considered out of scope for the project. 
With experience, the team realized there can be no effective automatic GLIMIR clustering if collected works are not identified and handled in some way. Why is this? Suppose a record exists for a text volume containing work A. This matches to a record containing work A, but actually also containing work B. This matches to a work containing B and also containing works C, D, and E. The effect is a snowballing of cluster members that serves no one.

How could this happen? In a bibliographic database such as WorldCat, items representing collected works can be catalogued in several ways. Efforts to relax matching criteria in just the right degree to cluster records for the same work are difficult to devise and apply. The GLIMIR and FRBR teams consulted several times to discuss clustering strategies for works, content, and manifestation clusters. Practical experience with GLIMIR led to rounds of enhancements and distinctions to improve the software's decisions. While GLIMIR clusters can and have been undone and redone on more than one occasion, it took experience from the team to realize that the clues to a collected work must be recognized.

Bible and Beowulf
As with many initial production startups, the output of GLIMIR processing was monitored. Reports for changes in any clusters of more than fifty were reviewed by quality control catalogers for suspicious combinations. And occasionally a library using a GLIMIR- or FRBR-organized display would report a strange cluster. This was the case with a huge malformed cluster of records for the Bible. Such a work set tends to be large and unmanageable by nature; there are a huge number of records for the Bible in WorldCat. However, it was noticed that the set had grown suddenly over the previous two months. User interface applications stalled when attempting to present a view organized by such a set. One day, a local institution reported that a record for Beowulf had turned up in this same work set. This started the team on an investigation.

After much searching and analysis of the members of this cluster, the index case was uncovered. In many cases bibliographic records are allowed to cluster based on a uniform title. What the team found connecting these disparate records was a totally unexpected use of the uniform title: a 240 subfield a whose contents were "B.". That's right, "B.". Once the first case was located, it was not hard to figure out that there were numerous uniform "titles" with other single letters of the alphabet. So in this odd usage, Bible and Beowulf could come together, if insufficient data were present in two records to discriminate by other comparisons. Or potentially, other titles which started with "B." Seeing this unanticipated use of the uniform title field, the FRBR and GLIMIR algorithms were promptly modified to beware. The FRBR and GLIMIR clusters were then unclustered and redone. This was a data issue, and unanticipated uses of fields in a record will crop up, if usually with less drama.

Further experience showed more. In the examination of another ill-formed cluster, a reviewer realized that one record had the uniform title stated as "Illiad" but the item title was Homer's "Odyssey." Of course these have the same author, and may easily have the same publisher. Even the same translator (e.g., Richmond Lattimore) is not improbable for a work like this. This was a case of bad data, but it imploded two very large clusters.
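A minimal sketch of the kind of guard this experience suggests, treating a very short uniform title such as "B." as too weak to serve as a clustering key on its own, might look like the following. The function, its inputs, and the length threshold are illustrative assumptions, not the production GLIMIR or FRBR code.

// Illustrative sketch: decide whether a uniform title (MARC 240 $a)
// is distinctive enough to be used as a clustering key by itself.
// The record shape and the threshold are assumptions, not OCLC code.
function uniformTitleIsUsableKey(subfieldA) {
  if (!subfieldA) {
    return false;                       // no uniform title at all
  }
  // normalize: lowercase, strip punctuation and surrounding whitespace
  var normalized = subfieldA.toLowerCase().replace(/[^a-z0-9 ]/g, "").trim();
  // a single letter such as "B." matches Bible, Beowulf, and anything
  // else filed under that letter, so it cannot discriminate works
  if (normalized.length < 2) {
    return false;
  }
  return true;
}

// The "B." uniform title that pulled Bible and Beowulf together would be
// rejected, forcing the match to rest on other comparisons.
uniformTitleIsUsableKey("B.");        // false
uniformTitleIsUsableKey("Beowulf");   // true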
Music and Identification of Collected Works
As music catalogers know, musical works are very frequently presented in items that are collections of works. The rules for creating bibliographic records for music, whether scores or recordings or other, are intricate. The challenges to software to distinguish minor differences in wording from critical differences seem to be endless. Moreover, musical sound recordings are largely collected works due to the nature of publication.

As noted by Papakhian, personal author headings are repeated more often in sound recording collections than in the general body of materials.12 There are several factors that may contribute to such an observation. There are likely to be numerous recordings by the same performer of different works and numerous records of the same work by different performers. Composers are also likely to be performers. The point is, for sound recordings an author statement and title may be less effective discriminators than for printed materials.

Vellucci13,14 and Riley15 have written extensively on the problems of music in FRBR models. The problem of distinguishing and relating whole/part relationships is particularly tricky. Musical compositions often consist of units or segments that can be performed separately. So they are generally susceptible to extraction. These extractive relationships are seen in cases where parts are removed from the whole to exist separately, or perhaps parts for a violin or other instrument are extracted from the full score. Software must be informed with rules as to significant differences in description of varying parts and varying descriptions of instruments, and in this team's experience that is particularly difficult.

Krummel has noted that the bibliographic control of sound recordings has a dimension beyond item and work, that is, performance.16 Different performances of the same Beethoven symphony need to be distinguished. Cast and performer list evaluation and date checking are done by the software. However, the comparisons the software can make are affected by the fullness or scarcity of data provided in the bibliographic record. There is great variation observed in the numbers of cast members stated in a record. Translator and adapter information can prove useful in the same sense of role discrimination for other types of materials. This is close scrutiny of a record. At the same time consider that an opera can include the creative contributions of an author (plot), a librettist, and a musical composer. Yet these all come together to provide one work, not a collected work.

Tillett has categorized seven types of bibliographic relationships among bibliographic entities, including the following:
1. Equivalence, as exact copies or reproduction of a work. Photocopies and microforms are examples.
2. Derivative relationships, or a modification such as variations, editions, translations.
3. Descriptive, as in criticism, evaluation, review of a work.
4. Whole/part, such as the relation of a selection from an anthology.
5. Accompanying, as in a supplement or concordance or augmentation to a work.
6. Sequential, or chronological relationships.
7. Shared characteristic relationships, as in items not actually related that share a common author, director, performer, or other role.17
While it is highly desirable for a software system to notice category 1 to cluster different records for the same work, that same software could be confused by "clues," such as in category 7. And the software needs to understand the significance of the other categories in deciding what to group and what to split. To handle these relations in bibliographic records, Tillett discusses linking devices including, for instance, uniform titles. Yet uniform titles are used for the categories of equivalence relationships, whole/part relationships, and derivative relationships. This becomes more and more complex for a machine to figure out. Of course, uniform titles within bibliographic records are supposed to link to authority records via text string only. Consideration should ideally be given to linking via identifiers, as has been suggested elsewhere.18

Thematic Indexes
Review of GLIMIR clusters for scores and recordings showed a case where Haydn's symphonies A and B were brought together. These were outside the traditional canon of the 104 Haydn symphonies and were referred to as "A" and "B" by the Haydn scholar H. C. Robbins Landon. This mis-clustering highlighted the need for additional checks in the software.

The original GLIMIR software was not aware of thematic indexes as a tool for discrimination. Thematic indexes are numbering systems for the works of a composer. The Köchel Mozart catalog, as in K. 626, is a familiar example. These designations are not unique across composers; that is, they are intended to be unique for a given composer, but identical designators may coincidentally have been assigned to multiple composers. While "B" series numbers may be applied to works of Chambonnières, Couperin, Dvořák, Pleyel, and others, the presence of more than one B number is suggestive of collected work status. For more on the various numbering systems, see the interesting discussion by the Music Library Association.19 However, the software cannot merely count likely identifiers in the usual place. This could lead to falsely flagging aggregates; one work by Dvořák could have B.193, which is incidentally equivalent to opus 105. Clearly, any detection of multiple identifiers of this sort must be restricted to identifiers of the same series.

String Quartet Number 5, or Maybe 6
Cases of renumbering can cause problems in identifying collected works. An early suppressed or lost work, later discovered and added to the canon of the composer's work, can cause renumbering of the later works. Clustering software must be very attentive to discrete numbers in music, but can it be clever enough? The works of Paul Hindemith (1895–1963) offer an example. His first string quartet was written in 1915, but long suppressed. His publisher was generally Schott. Long after Hindemith's death, this first quartet was unearthed, and then was published by Schott. The publisher then renumbered all the quartets. So quartets previously 1 through 6 became 2 through 7. The rediscovered work was then called "No. 1," though sometimes called "No. 0" to keep the older numbering intact. Further, the last two quartets did not even have opus numbers assigned and were both in the same key.20 This presents a challenge.

Anything Musical
Another problem case emerged when reviewers noticed a cluster contained both the unrelated songs "Old Black Joe" and "When You and I Were Young, Maggie." On investigation, the cluster held a number of unrelated pieces.
Here the use of alternate titles in a 246 field had led to overclustering, and the rules for use of 246 fields were tightened in FRBR and GLIMIR. As in the other problem cases, cycles of testing were necessary to estimate sufficient yet not excessive restrictions. Rules too strict split good clusters and defeat the purpose of FRBR and GLIMIR.

At this point the GLIMIR/FRBR team recognized that rules changes were necessary but not sufficient. That is, a concerted effort to handle collected works was essential.

Strategies for Identifying Collected Works
The greatest problem, and most immediate need, was to stop the snowballing of clusters. Clusters containing some member records that are collected works can suddenly mushroom out of control. Rule 1 was that a record for a collected work must never be grouped with a record for a single work. If all in a group are collected works, that is closer to tolerable (more on that later).

With time and experimentation, a set of checks was devised to allow collected works to be flagged. These clues were categorized as types: (1) considered conclusive evidence, or (2) partial evidence. Type 2 needed another piece of evidence in the record. Finding the best clues was a team effort. It was acknowledged that to prevent overclustering, overidentification of aggregates was preferable to failure to identify them. Several cycles of tests were conducted and reviewed, assessing whether the software guessed right. Table 1 illustrates the types of checks done for a given bibliographic record. Here "$" is used as an abbreviation for subfield, and "ind2" for the second indicator.

Uniform Title (240): If $a contains a term from a list of collective terms and there is no $m, $n, $p, or $r, the record IS a collected work. (This is a long list of terms such as "symphonies," "plays," "concertos," and so on.)

Title (245): If the title contains "selections," IS collected. A 245 with multiple semicolons on a record with document type "rec" (recording) also indicates a collected work.

Title (246): If there are four or more 246 fields with ind2 = 2, 3, or 4, IS collected. If there is more than one 246, consider it partial evidence.

Extent (300): If 300 $a has "pagination multiple" or "multiple pagings," IS collected.

Contents Notes (505 $a and $t): (1) Check $a for first and last occurrences of "movement"; if there are not multiple "movement" occurrences but there is a multiple " / " pattern, count it as a pattern instance. (2) If the above does not find multiple patterns, also look for " ; " patterns. (3) If those checks do not produce more than one pattern, look for multiple " – " patterns. (4) Count the 505 $t occurrences. (5) Count the $r occurrences. If all or any of the above produce more than one pattern instance, or more than one $t, or more than one $r, IS collected.

Various fields for thematic index clues (505 $a): If there is any 505 $a, check for differing opus numbers (this also checks for thematic index cases). If found, IS collected. Applies to document types Score and Recording.

Related work (740): If there is one or more 740 and one has ind2 = 2, IS collected. If there are only multiple 740s, partial evidence.

Author (700/710/711/730): Check for $t and $n, and check for a 730 with ind2 = 2. If a 730 with ind2 = 2 or multiple $t is found, IS collected. If only one $t, partial evidence.

Author (100/110/111, 700/710, 730): If the format is recording and both records are collected works, require a cast list match to cluster anything but manifestation matches. That is, do not cluster at the content level without verifying by cast.

Table 1. Checks on Bibliographic Records.
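To make the logic of these checks concrete, a compressed sketch of a few of them follows. It is illustrative only; the record accessors (record.field, record.fields, record.subfield, record.subfields) and the handling of partial evidence are assumptions drawn from Table 1, not the production GLIMIR or FRBR implementation.

// Illustrative sketch of selected Table 1 checks; "record" is assumed to
// expose simple accessors over MARC fields. None of this is OCLC code.
function classifyAggregateEvidence(record) {
  var conclusive = false;   // type 1: conclusive evidence
  var partial = 0;          // type 2: needs another piece of evidence

  // 245: "selections" in the title is conclusive
  var title = (record.field("245") || "").toLowerCase();
  if (title.indexOf("selections") !== -1) {
    conclusive = true;
  }

  // 246: four or more with ind2 of 2, 3, or 4 is conclusive;
  // more than one 246 of any kind is partial evidence
  var alt246 = record.fields("246");
  var flagged246 = alt246.filter(function (f) {
    return f.indicator2 === "2" || f.indicator2 === "3" || f.indicator2 === "4";
  });
  if (flagged246.length >= 4) {
    conclusive = true;
  } else if (alt246.length > 1) {
    partial += 1;
  }

  // 300 $a: "pagination multiple" or "multiple pagings" is conclusive
  var extent = (record.subfield("300", "a") || "").toLowerCase();
  if (extent.indexOf("pagination multiple") !== -1 ||
      extent.indexOf("multiple pagings") !== -1) {
    conclusive = true;
  }

  // 505: more than one $t (analytic title) in contents notes is conclusive
  if (record.subfields("505", "t").length > 1) {
    conclusive = true;
  }

  if (conclusive) return "collected";
  if (partial >= 2) return "collected";   // two partial clues corroborate each other
  if (partial === 1) return "partial";
  return "single";
}

Following Rule 1, a record classified as "collected" by such checks would simply be barred from clustering with any record classified as "single."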
Frailties of Collected Works Identification in Well-Cataloged Records
The above table illustrates many areas in a bibliographic record that can be mined for evidence of aggregates. The problem is that cataloging practice offers no single mandatory rule for cataloging a collected work correctly. Moreover, as WorldCat membership grows, the use of multiple schemes of cataloging rules for different eras and geographic areas adds to the complexity, even assuming that all the bibliographic records are cataloged "correctly." Correct cataloging is not assumed by the team.

Software Confounded
With all the checks outlined in the table, the team still found cases of collected works that seemed to defy machine detection. One record had the two separate works, Tom Sawyer and Huckleberry Finn, in the same title field, with no other clues to the aggregate nature of the item. The work Brustbild was another case. For this electronic resource set, Brustbild appeared to be the collection set title, but the specific title for each picture was given in the publisher field. A cluster for the work Gedichte von Eduard Mörike (score) showed problems with the uniform title, which was for the larger work, but the cluster records each actually represented parts of the work. The bad cluster for Si ku quan shu zhen ben bie ji, an electronic resource, contained records which each appeared to represent the entire collection of 400 volumes, but the link in each 856 field pointed only to one volume in the set.

Limitations of the Present Approach
The current processing rules for collected works adopt a strategy of containment. The problem may be handled in the near term by avoiding the mixing of collected works with noncollected works, but the clusters containing collected works need further analysis to produce optimal results. For example, it is one thing to notice "arrangements" in scores as a clue to the presence of an aggregate. The requirement also exists that an arrangement should not cluster with the original score. The rules for clustering and distinguishing different sets of arrangements present another level of complexity. Checks to compare and equate the instruments involved in an arrangement are quite difficult; in this team's experience, they fail more often than they succeed. Without initial explication of the rules for separating arrangements, reviewers quickly found clusters such as Haydn's Schöpfung, which included records for the full score, vocal score, and an arrangement for two flutes.

An implementation that expects one manifestation to have the identifier of only one work is a conceptual problem for aggregates. A simple case: if the description of a recording of Bernstein's Mass has an obscurely placed note indicating the second side contains the work Candide, Mass is likely to be dominant in the clustering effect, with the second work effectively "hidden." This manifestation would seem to need three work IDs, one for the combination, one for Mass, and one for Candide. This does not easily translate to an implementation of the FRBR model but could perhaps be achieved via links. Several layers of links would seem necessary. A manifestation needs to link to its collected work. A collected work needs links to records for the individual works that it contains, and vice versa, individual works need to link to collective works.
This can be important for translations, for example, into Russian, where collective works are common even where they do not exist in the original language.

Lessons Learned
First and foremost, plan to deal with collected works. For clustering efforts this must be addressed in some way for any large body of records.

Second, formats deserve focused attention. The initial implementation of the GLIMIR algorithms used test sets mainly composed of a specific work. After all, GLIMIR clusters should all be formed within one work. These sets were carefully selected to represent as many different types of work sets as possible, whether clear or difficult examples of work set members. Plenty of attention was given to the compatibility of differing formats, given the looser content clustering. These were good tests of the software's ability to cluster effectively and correctly within a set that contained numerous types of materials. Random sets of records were also tested to cross-check for unexpected side effects. What in retrospect the team would have expanded was sets that were focused on specific formats. Recordings, scrutinized as a group, can show different problems than scores or books. The distinctions to be made are probably not complete.

Another lesson learned in GLIMIR concerned the risks of clustering. The deliberate effort to relax the very conservative nature of the matching algorithms used in GLIMIR was critical to success in clustering anything. Singleton clusters don't improve anyone's view. In the efforts to decide what should and should not be clustered, it was initially hard to discern the larger scale risks of overclustering. Risks from sparse records were probably handled fairly well in this initial effort, but risks from complex records needed more work. Collected works are only one illustration of the risks of overclustering.

FUTURE RESEARCH
The current research suggests a number of areas for possible further exploration:
• The option for human intervention to rearrange clusters not easily clustered automatically would seem to be a valuable enhancement.
• There is next the general question of what sort of processing is needed, and feasible, to distinguish the members of clusters flagged as collected works.
• Part versus whole relationships can be difficult to distinguish from the information in bibliographic records. Further investigation of these descriptions is needed.
• Arrangements of works in music are so complex as to suggest an entire study by themselves. Work on this area is in progress, but it needs rules investigation.
• Other derivative relationships among works: Do these need consideration in a clustering effort? Can and should they be brought together while avoiding overclustering of aggregates?
• How much clustering of collected works may actually be helpful to persons or processes searching the database? How can clusters express relationships to other clusters?

CONCLUSION
Clustering bibliographic records in a database as large as WorldCat takes careful design and undaunted execution. The navigational balance between underclustering and overclustering is never easy to maintain, and course corrections will continue to challenge the navigators.

ACKNOWLEDGMENTS
This paper would have been a lesser thing without the patient readings by Rich Greene, Janifer Gatenby, and Jay Weitz, as well as their professional insights and help in clarifying cataloging points.
Special thanks to Jay Weitz for explicating many complex cases in music cataloging and music history.

REFERENCES
1. Barbara Tillett, "What is FRBR? A Conceptual Model for the Bibliographic Universe," last modified 2004, accessed November 22, 2013, http://www.loc.gov/cds/FRBR.html.
2. Janifer Gatenby, email message to the author, November 10, 2013.
3. International Federation of Library Associations (IFLA) Working Group on Aggregates, Final Report of the Working Group on Aggregates, September 12, 2011, http://www.ifla.org/files/assets/cataloguing/frbrrg/AggregatesFinalReport.pdf.
4. Maja Zumer and Edward T. O'Neill, "Modeling Aggregates in FRBR," Cataloging & Classification Quarterly 50, no. 5–7 (2012): 456–72.
5. IFLA Working Group on Aggregates, Final Report.
6. Zumer and O'Neill, "Modeling Aggregates in FRBR."
7. Gail Thornburg and W. Michael Oskins, "Misinformation and Bias in Metadata Processing: Matching in Large Databases," Information Technology & Libraries 26, no. 2 (2007): 15–22.
8. Gail Thornburg and W. Michael Oskins, "Matching Music: Clustering versus Distinguishing Records in a Large Database," OCLC Systems & Services 28, no. 1 (2012): 32–42.
9. Janifer Gatenby et al., "GLIMIR: Manifestation and Content Clustering within WorldCat," Code4Lib Journal 17 (June 2012), http://journal.code4lib.org/articles/6812.
10. Richard O. Greene, "Cataloging Alchemy: Making Your Data Work Harder" (slideshow presented at the American Library Association Annual Meeting, Washington, DC, June 26–29, 2010), http://vidego.multicastmedia.com/player.php?p=ntst323q.
11. Jenny Toves, email message to the author, December 17, 2013.
12. Arsen R. Papakhian, "The Frequency of Personal Name Headings in the Indiana University Music Library Card Catalogs," Library Resources & Technical Services 29 (1985): 273–85.
13. Sherry L. Vellucci, Bibliographic Relationships in Music Catalogs (Lanham, MD: Scarecrow, 1997).
14. Sherry L. Vellucci, "FRBR and Music," in Understanding FRBR: What It Is and How It Will Affect Our Retrieval Tools, ed. Arlene G. Taylor (Westport, CT: Libraries Unlimited, 2007), 131–51.
15. Jenn Riley, "Application of the Functional Requirements for Bibliographic Records (FRBR) to Music," www.dlib.indiana.edu/~jenlrile/presentations/ismir2008/riley.pdf.
16. Donald W. Krummel, "Musical Functions and Bibliographic Forms," The Library, 5th ser., 31 (1976): 327–50.
17. Barbara Tillett, "Bibliographic Relationships: Toward a Conceptual Structure of Bibliographic Information Used in Cataloging" (PhD diss., Graduate School of Library & Information Science, University of California, Los Angeles, 1987), 22–83.
18. Program for Cooperative Cataloging (PCC) Task Group on the Creation and Function of Name Authorities in a Non-MARC Environment, "Report on the PCC Task Group on the Creation and Function of Name Authorities in a Non-MARC Environment," last modified 2013, http://www.loc.gov/aba/pcc/rda/RDA%20Task%20groups%20and%20charges/ReportPCCTGonNameAuthInA_NonMARC_Environ_FinalReport.pdf.
19. Music Library Association, Authorities Subcommittee of the Bibliographic Control Committee, "Thematic Indexes Used in the Library of Congress/NACO Authority File," http://bcc.musiclibraryassoc.org/BCC-Historical/BCC2011/Thematic_Indexes.htm.
20. Jay Weitz, email message to the author, May 6, 2013.
5378 ----
assignFAST: An Autosuggest-Based Tool for FAST Subject Assignment
Rick Bennett, Edward T. O'Neill, and Kerre Kammerer
INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2014

ABSTRACT
Subject assignment is really a three-phase task. The first phase is intellectual—reviewing the material and determining its topic. The second phase is more mechanical—identifying the correct subject heading(s). The final phase is retyping or cutting and pasting the heading(s) into the cataloging interface along with any diacritics, and potentially correcting formatting and subfield coding. If authority control is available in the interface, some of these tasks may be automated or partially automated. A cataloger with a reasonable knowledge of Faceted Application of Subject Terminology (FAST)1,2 or even Library of Congress Subject Headings (LCSH)3 can quickly get to the proper heading but usually needs to confirm the final details—was it plural? am I thinking of an alternate form? is it inverted? etc. This often requires consulting the full authority file interface. assignFAST is a web service that consolidates the entire second phase of the manual process of subject assignment for FAST subjects into a single step based on autosuggest technology.
BACKGROUND
Faceted Application of Subject Terminology (FAST) Subject Headings were derived from the Library of Congress Subject Headings (LCSH) with the goal of making the schema easier to understand, control, apply, and use while maintaining the rich vocabulary of the source. The intent was to develop a simplified subject heading schema that could be assigned and used by nonprofessional catalogers or indexers. Faceting makes the task of subject assignment easier. Without the complex rules for combining the separate subdivisions to form an LCSH heading, only the selection of the proper heading is necessary.

The now-familiar autosuggest4,5 technology is used in web search and other text entry applications to help the user enter data by displaying and allowing the selection of the desired text before typing is complete. This helps with error correction, spelling, and identification of commonly used terminology. Prior discussions of autosuggest functionality in library systems have focused primarily on discovery rather than on cataloging.6–11

Rick Bennett (Rick_Bennett@oclc.org) is a Consulting Software Engineer in OCLC Research, Edward T. O'Neill (oneill@oclc.org) is a Senior Research Scientist at OCLC Research and project manager for FAST, and Kerre Kammerer (kammerer@oclc.org) is a Consulting Software Engineer in OCLC Research, Dublin, Ohio.

The literature often uses synonyms for autosuggest, such as autocomplete or type-ahead. Since assignFAST can lead to terms that are not being typed, autosuggest seems most appropriate and will be used here.

The assignFAST web service combines the simplified subject choice capabilities of FAST with the text selection features of autosuggest technology to create an in-interface subject assignment tool. Much of a full-featured search interface for the FAST authorities, such as searchFAST,12 can be integrated into the subject entry field of a cataloging interface. This eliminates the need to switch screens, cut and paste, and make control character changes that may differ between the authority search interface and the cataloging interface. As a web service, assignFAST can be added to existing cataloging interfaces.

In this paper, the actual operation of assignFAST is described, followed by how the assignFAST web service is connected to an interface, and finally by a description of the web service construction.

assignFAST Operation
An authority record contains the Established Heading, See headings, and control numbers that may be used for linking or other future reference. The relevant fields of the FAST record for Motion pictures are shown here:

Control Number: fst01027285
Established Heading: Motion pictures
See: Cinema
See: Feature films -- History and criticism
See: Films
See: Movies
See: Moving-pictures

In FAST, the facet of each heading is known. Motion pictures is a topical heading. The See references are unauthorized forms of the established heading. If someone intended to enter Cinema as a subject heading, they would be directed to use the established heading Motion pictures.
For a typical workflow, the subject cataloger would need to leave the cataloging interface, search for "cinema" in an authority file interface, find that the established heading was Motion pictures, and return to the cataloging interface to enter the established heading. The figure below shows the same process when assignFAST is integrated into the cataloging interface. Without leaving the cataloging interface, typing only "cine" shows both the See term that was initially intended and the Established Heading in a selection list.

Figure 1. assignFAST typical selection choices.

Selecting "Cinema USE Motion pictures" enters the Established term, and the entry process is complete for that subject.

Figure 2. assignFAST selection result.

The text above the entry box provides the FAST ID number and facet type. As a web service, assignFAST headings can be manipulated by the cataloging interface software after selection and before they are entered into the box. For example, one option available in the assignFAST Demo is MARCBreaker format.13 MARCBreaker combines MARC field tagging and allows diacritics to be entered using only ASCII characters. Using MARCBreaker output, assignFAST returns the following for São Paulo, Brazil:

=651 7$aBrazil$zS{tilde}ao Paulo$0(OCoLC)fst01205761$2fast

In this case, the output includes MARC tagging of 651 (geographic), as well as subfield coding ($z) that identifies the city within Brazil, that it's a FAST heading, and the FAST control number. The information is available in the assignFAST result to fill one or multiple input boxes and to reformat as needed for the particular cataloging interface.

Addition to Web Browser Interfaces
As a web service, assignFAST could be added to any web-connected interface. A simple example is given here to add assignFAST functionality to a web browser interface using JavaScript and jQuery (http://jquery.com). These technologies are commonly used, and other implementation technologies would be similar. Example files for this demo can be found on the OCLC Developers Network under assignFAST.14

The example uses the jQuery.autocomplete function.15 First, the script packages jquery.js, jquery-ui.js, and the style sheet jquery-ui.css are required. Version 1.5.2 of jQuery and version 1.8.7 of jQuery UI were used for this example, but other compatible versions should be fine. These are added to the HTML in the script and link tags. The second modification to the cataloging interface is to surround the existing subject search input box with a set of div tags.
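A minimal sketch of such markup, assuming only the element ids (existingBox, extraInformation) used by the script in the next section and the standard jQuery UI ui-widget wrapper class, might look like this:

<!-- Illustrative markup only: the ids existingBox and extraInformation
     match the script shown below; ui-widget is the usual jQuery UI
     wrapper class for autocomplete inputs. -->
<div class="ui-widget">
  <label for="existingBox">Subject: </label>
  <input id="existingBox" type="text" />
</div>
<div id="extraInformation"></div>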
The final modification is to add JavaScript to connect the assignFAST web service to the search input box. This function should be called when the page loads:

function setUpPage() {
  // connect the autoSubject to the input areas
  jQuery('#existingBox').autocomplete({
    source: autoSubjectExample,
    minLength: 1,
    select: function(event, ui) {
      jQuery('#extraInformation').html("FAST ID " + ui.item.idroot +
        " Facet " + getTypeFromTag(ui.item.tag));
    } //end select
  }).data("autocomplete")._renderItem = function(ul, item) {
    formatSuggest(ul, item);
  };
} //end setUpPage()

The source: autoSubjectExample setting tells the autocomplete function to get the data from the autoSubjectExample function, which in turn calls the assignFAST web service. This is in the assignFASTComplete.js file. In the select function, the extraInformation text is rewritten with additional information returned with the selected heading. In this case, the FAST number and facet are displayed. The generic _renderItem of the jQuery.autocomplete function is overwritten by the formatSuggest function (found in assignFASTComplete.js) to create a display that differentiates the See from the Authorized headings that are returned in the search. The version used for this example shows

See Heading USE Authorized Heading

when a See heading is returned, or simply the Authorized Heading otherwise.

WEB SERVICE CONSTRUCTION
The autosuggest service for a FAST heading was constructed a little differently than the typical autosuggest. For a typical autosuggest for the term Motion pictures from the example given above, you would index just that term. As the term was typed, Motion pictures and other terms starting with the text entered so far would be shown until you resolved the desired heading. For example, typing in "mot" might give

Motion pictures
Motion picture music
Employee motivation
Diesel motor
Mothers and daughters

For the typical autosuggest, the term indexed is the term displayed and is the term returned when selected. For assignFAST, both the Established and See references are indexed. However, when typing resolves a See heading, both the See heading and its Established Heading are displayed. Only the Established Heading is selected, even if you are typing the See heading. For assignFAST, the "mot" result now becomes

Features (Motion pictures) USE Feature films
Motion pictures
Motorcars (Automobiles) USE Automobiles
Motion picture music
Background music for motion pictures USE Motion picture music
Motion pictures for the hearing impaired USE Films for the hearing impaired
Documentaries, Motion picture USE Documentary films
Mother of God USE Mary, Blessed Virgin, Saint

The headings in assignFAST are ranked by how often they are used in WorldCat, so headings that are more common appear at the top. To place the Established Heading above the See heading when they are similar, the Established Heading is also ranked higher than the See for the same usage. assignFAST can also be searched by facet, so if only topical or geographic headings are desired, only headings from these facets will be displayed.

The web service uses a SOLR16 search engine running under Tomcat.17 This provides full text search and many options for cleaning and manipulating the terms within the index.
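Before turning to the specific SOLR options, a simplified sketch of the shape of what gets indexed may help: each See reference is made searchable but carries its Established Heading and a usage-based rank. The field names below are illustrative assumptions rather than the service's actual schema.

// Illustrative sketch only: build index entries for one authority record.
// Both the Established Heading and each See reference become searchable,
// but every entry carries the Established Heading to insert, plus a
// WorldCat-usage-based rank so common headings sort first.
function buildEntries(authorityRecord, worldcatUsage) {
  var entries = [{
    match: authorityRecord.established,        // e.g. "Motion pictures"
    display: authorityRecord.established,
    insert: authorityRecord.established,
    type: "auth",
    rank: worldcatUsage + 1                    // established outranks its See forms
  }];
  authorityRecord.seeReferences.forEach(function (see) {
    entries.push({
      match: see,                              // e.g. "Cinema"
      display: see + " USE " + authorityRecord.established,
      insert: authorityRecord.established,     // only the established form is selected
      type: "alt",
      rank: worldcatUsage
    });
  });
  return entries;
}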
The particular option used for assignFAST is the EdgeNGramFilter.18 This option is used for autosuggest and has each word indexed one letter at a time, building to its entire length. The index for "cinema" would then contain "c," "ci," "cin," "cine," "cinem," and "cinema." SOLR handles UTF-8 encoded Unicode for both input and output. The assignFAST indexes and queries are normalized using FAST normalization19 to remove punctuation, diacritics, and capitalization. FAST normalization is very similar to NACO normalization, although in FAST normalization the subfield indicator is replaced by a space and no commas are retained.

assignFAST is accessed using a REST request.20 REST requests consist of URLs that can be invoked via either HTTP POST or GET methods, either programmatically or via a web browser.

http://fast.oclc.org/searchfast/fastsuggest?&query=[query]&queryIndex=[queryIndex]&queryReturn=[queryReturn]&suggest=autosuggest&rows=[numRows]&callback=[callbackFunction]

where:

query — the query to search.
queryIndex — the index corresponding to the FAST facet. These include: suggestall (all facets), suggest00 (Personal names), suggest10 (Corporate names), suggest11 (Events), suggest30 (Uniform titles), suggest50 (Topicals), suggest51 (Geographic names), and suggest55 (Form/Genre).
queryReturn — the information requested, as a comma-separated list. These include: idroot (FAST number); auth (Authorized Heading, formatted for display with "--" as subfield separator); type ("alt" or "auth," indicating whether the match on the queryIndex was to an Authorized or See heading); tag (MARC Authority tag number for the heading: 100 = Personal name, 150 = Topical, etc.); raw (Authorized Heading with subfield indicators, blank if identical to auth, i.e., no subfields); breaker (Authorized Heading in MARCBreaker format, blank if identical to raw, i.e., no diacritics); and indicator (indicator 1 from the Authorized Heading).
numRows — headings to return; maximum restricted to 20.
callback — the callback function name for JSONP.

Table 1. assignFAST web service results description.

Example Response:

http://fast.oclc.org/searchfast/fastsuggest?&query=hog&queryIndex=suggestall&queryReturn=suggestall%2Cidroot%2Cauth%2Ctag%2Ctype%2Craw%2Cbreaker%2Cindicator&suggest=autoSubject&rows=3&callback=testcall

yields the following response:

testcall({
  "responseHeader":{
    "status":0,
    "QTime":148,
    "params":{
      "json.wrf":"testcall",
      "fl":"suggestall,idroot,auth,tag,type,raw,breaker,indicator",
      "q":"suggestall:hog",
      "rows":"3"}},
  "response":{"numFound":1031,"start":0,"docs":[
    {
      "idroot":"fst01140419",
      "tag":150,
      "indicator":" ",
      "type":"alt",
      "auth":"Swine",
      "raw":"",
      "breaker":"",
      "suggestall":["Hogs"]},
    {
      "idroot":"fst01140470",
      "tag":150,
      "indicator":" ",
      "type":"alt",
      "auth":"Swine--Housing",
      "raw":"Swine$xHousing",
      "breaker":"",
      "suggestall":["Hog houses"]},
    {
      "idroot":"fst00061534",
      "tag":100,
      "indicator":"1",
      "type":"auth",
      "auth":"Hogarth, William, 1697-1764",
      "raw":"Hogarth, William,$d1697-1764",
      "breaker":"",
      "suggestall":["Hogarth, William, 1697-1764"]}]
}})

Table 3. Typical assignFAST JSON data return.

The first response heading is the Use For heading Hogs, which has the Authorized heading Swine. The second is the Use For heading for Hog houses, which has the Authorized heading Swine--Housing.
This Authorized heading is also given in its raw form, including the $x subfield separator, which is unnecessary for the first heading. The third response matches the Authorized heading for Hogarth, William, 1697–1764, which is also given in its raw form. The breaker (MARCBreaker) format is only added if it differs from the raw form, which is only when diacritics are present.

CONCLUSIONS
Subject assignment is a combination of intellectual and manual tasks. The assignFAST web service can be easily integrated into existing cataloging interfaces, greatly reducing the manual effort required for good subject data entry and increasing the cataloger's productivity.

REFERENCES
1. Lois Mai Chan and Edward T. O'Neill, FAST: Faceted Application of Subject Terminology, Principles and Applications (Santa Barbara, CA: Libraries Unlimited, 2010), http://lu.com/showbook.cfm?isbn=9781591587224.
2. OCLC Research activities associated with FAST are summarized at http://www.oclc.org/research/activities/fast.
3. Lois M. Chan, Library of Congress Subject Headings: Principles and Application (Westport, CT: Libraries Unlimited, 2005).
4. "Autocomplete," Wikipedia, last modified on October 1, 2013, http://en.wikipedia.org/wiki/Autocomplete.
5. Tony Russell-Rose, "Designing Search: As-You-Type Suggestions," UX Magazine, article no. 828, May 16, 2012, http://uxmag.com/articles/designing-search-as-you-type-suggestions.
6. David Ward, Jim Hahn, and Kirsten Feist, "Autocomplete as Research Tool: A Study on Providing Search Suggestions," Information Technology & Libraries 31, no. 4 (December 2012): 6–19.
7. Jon Jermey, "Automated Indexing: Feeding the AutoComplete Monster," Indexer 28, no. 2 (June 2010): 74–75.
8. Holger Bast, Christian W. Mortensen, and Ingmar Weber, "Output-Sensitive Autocompletion Search," Information Retrieval 11 (August 2008): 269–86.
9. Elías Tzoc, "Re-Using Today's Metadata for Tomorrow's Research: Five Practical Examples for Enhancing Access to Digital Collections," Journal of Electronic Resources Librarianship 23, no. 1 (January–March 2011).
10. Holger Bast and Ingmar Weber, "Type Less, Find More: Fast Autocompletion Search with a Succinct Index," SIGIR '06 Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New York: ACM, 2006), 364–71.
11. Demian Katz, Ralph LeVan, and Ya'aqov Ziso, "Using Authority Data in VuFind," Code4Lib Journal 11 (June 2011).
12. Edward T. O'Neill, Rick Bennett, and Kerre Kammerer, "Using Authorities to Improve Subject Searches," in "Beyond Libraries—Subject Metadata in the Digital Environment and Semantic Web," special issue, Cataloging & Classification Quarterly 52, no. 1/2 (in press).
13. "MARCMaker and MARCBreaker User's Manual," Library of Congress, Network Development and MARC Standards Office, revised November 2007, http://www.loc.gov/marc/makrbrkr.html.
14. "OCLC Developers Network—assignFAST," submitted September 28, 2012, http://oclc.org/developer/services/assignfast [page not found].
15. "jQuery Autocomplete," accessed October 1, 2013, http://jqueryui.com/autocomplete.
16. "Apache Lucene—Apache Solr," accessed October 1, 2013, http://lucene.apache.org/solr.
17. "Apache Tomcat," accessed October 30, 2013, http://tomcat.apache.org.
18. "Solr Wiki—Analyzers, Tokenizers, TokenFilters," last edited October 29, 2013, http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters.
19. Thomas B. Hickey, Jenny Toves, and Edward T. O'Neill, "NACO Normalization: A Detailed Examination of the Authority File Comparison Rules," Library Resources & Technical Services 50, no. 3 (2006): 166–72.
20. "Representational State Transfer," Wikipedia, last modified on October 21, 2013, http://en.wikipedia.org/wiki/Representational_State_Transfer.

5403 ----
Editorial Board Thoughts: A Considerable Technology Asset that Has Little to Do with Technology
Mark Dehmlow
INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2014

For this issue's editorial, I thought I would set aside the trendy topics like discovery, the cloud, and open . . . well, everything—source, data, science—and instead focus on an area that I think has more long-term implications for technologists and libraries. For technologists in libraries, probably any industry really, I believe our most important challenges aren't technical at all. For the average "techie," even if an issue is complex, it is often finite and ultimately traceable to a root cause—the programmer left off a semi-colon in a line of code, the support person forgot to plug in the network cable, or the systems administrator had a server choke after a critical kernel error. Debugging people issues, on the other hand, is much less reductive. People are nothing but variables who respond to conflict with emotion and can become entrenched in their perspectives (right or wrong). At a minimum, people are unpredictable. The skill set to navigate people and personalities requires patience, flexibility, seeing the importance of the relationship through the 1s and 0s, and often developing mutual trust. Working with technology benefits from one's intelligence (IQ), but working with people requires a deeper connection to perception, self-awareness, body language, and emotions, all parts of emotional intelligence (EQ).

EQ is relevant to all areas of life and work, but I think it is particularly relevant to technology workers. Of particular importance are EQ traits related to emotional regulation, self-awareness, and the ability to pick up social cues. My primary reasoning for this is that technology is (1) fairly opaque to people outside of technology areas and (2) technology is driving so much of the rapid change we are experiencing in libraries. IT units in traditional organizations have a significant challenge because many root issues in technology are not well understood, and change is uncomfortable for most, so it is easy to resent technology for being such a strong catalyst for change. As a result, it is becoming more incumbent upon us in technology to not only instantiate change in our organizations but also to help manage that change through clear communication, clear expectation setting, defining reasonable timeframes that accommodate individuals' needs to adapt to change, a commitment to shift behavior through influence, and just plain old really good listening.

I would like to issue a bit of a challenge to technology managers as you are making hiring decisions.
If you want the best possible working relationships with other functional areas in the library, especially traditional areas, spend time evaluating candidates for soft skills like a relaxed demeanor; patience; clear, but not condescending, communication; and a personal commitment to serving others. These skills are very hard to teach. They can be developed if one is committed to developing them, but more often than not, they are innate. If a candidate has those traits as a base but also has an aptitude for understanding technology, that individual will likely be the kind of employee people will want to keep, certainly much more so than someone who has incredible technical skill but little social intelligence.

Mark Dehmlow (mdehmlow@nd.edu), a member of LITA and the ITAL editorial board, is Director, Information Technology Program, Hesburgh Libraries, University of Notre Dame, South Bend, Indiana.

For those who are interested in developing their EQ, there are many tools available—a million management books on team building, servant leadership, influencing coworkers, providing excellent service, etc. Personally, I have found that developing a better sense of self-awareness is one of the best ways to increase one's EQ. Tests such as the Myers-Briggs Type Indicator,1 the Strategic Leadership Type Indicator,2 and the DISC,3 which categorize your personality and work-style traits, can be very effective tools for understanding how you approach your work and how your work style may affect your peers. Combined with a willingness to flex your style based on the personalities of your coworkers, these can be very powerful tools for influencing outcomes. Most importantly, I have found putting the importance of the relationship above the task or goal can make a remarkable difference in cultivating trust and collaboration. Self-awareness and flexible approaches not only have the opportunity to improve internal relationships between technology and traditional functional areas of the library, but between techies and end users. We are using technology in many new creative ways to support end users, meaning techies are more and more likely to have direct contact with users. In many ways, our reputation as a committed service profession will be affected by our tech staff's ability to interact well with end users, and ultimately, I believe the proportion of our tech staff that have a high EQ could be one of the strongest predictors of the long-term success for technology teams in libraries.

REFERENCES
1. "My MBTI Personality Type," The Myers Briggs Foundation, http://www.myersbriggs.org/my-mbti-personality-type/mbti-basics.
2. "Strategic Leadership Type Indicator—Leader's Self Assessment," HRD Press, http://www.hrdpress.com/SLTI.
3. "Remember that boss who you just couldn't get through to? We know why…and we can help," Everything DISC, http://www.everythingdisc.com/Disc-Personality-Assessment-About.aspx.

5404 ----
Book Reviews
INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2014

EPUB 3: Best Practices, by Matt Garrish and Markus Gylling. Sebastopol, CA: O'Reilly, 2013. 345 pp. ISBN: 978-1-449-32914-3. $29.99.
There is much of value in this book—there aren't really that many books out right now about the electronic book markup framework, EPUB 3—yet I have a hard time recommending it, especially if you're an EPUB novice like me. So much of the book assumes a familiarity with EPUB 2. If you aren't familiar with this version of the specification, then you will be playing a constant game of catch-up. Also, it's clear that the book was written by multiple authors; the chapters are sometimes jarringly disparate with respect to pacing and style. The book as a whole needs a good edit. This is surprising since O'Reilly is almost uniformly excellent in this regard.

The first three chapters form the core of the book. The first chapter, "Package Document and Metadata," illustrates how the top-level container of any EPUB 3 book is the "package document." This document contains metadata about the book as well as a manifest (a list of files included in the package as a whole), a spine (a list of the reading order of the files included in the book), and an optional list of bindings (a lookup list similar to the list of helper applications contained in the configurations of most modern Web browsers). The second chapter, "Navigation," addresses and illustrates the creation of a proper Table of Contents, a list of Landmarks (sort of an abbreviated Table of Contents), and a Page List (useful for quickly navigating to a specific print-equivalent page in the book). The third chapter, "Content Documents," is the heart of the core of the book. This chapter addresses markup of actual chapters in a book, pointing out that EPUB 3 markup here is mostly a subset of HTML5, but also pointing out such things as the use of MathML for mathematical markup, SVG (Scalable Vector Graphics), page layout issues, use of CSS, and the use of document headers and footers. After reading these first three chapters, my sense is that one is ready to dive into a markup project, which is exactly what I did with my own project. That said, I think a reread of these core chapters is due, which I intend to do presently.

The rest of the book is devoted to specialty subjects such as how to embed fonts, use of audio and video clips, "media overlays" (EPUB 3 supports a subset of SMIL, the Synchronized Multimedia Integration Language, for creating synchronized text/audio/video presentations), interactivity and scripting (with Javascript), global language support, accessibility issues, provision for automated text-to-speech, and a nice utility chapter on validation of EPUB 3 XML files. Of these, the chapter on global language support I found to be fascinating. For us native English speakers, it's not immediately obvious some of the problems one will inevitably encounter when trying to create an electronic publication that can work in non-Western languages. Just consider languages that read vertically and from right to left, for one!

As an EPUB novice, my greatest desire would be for the book to provide, maybe in an appendix, a fairly comprehensive example of an EPUB 3 marked-up book. Maybe this is a tall order? Nevertheless, I would love to see an example of marked up text including bidirectional footnotes, pagination, a table of contents, etc.; simple, foundational things, really. Examples of each of these are included in the book, but not in one place. Having such an example in one place would be something that could be used as a quick-start template for us EPUB beginners.
To be fair, code examples of all of this are up on the accompanying website, and I am using these examples as I learn to code EPUB 3 for my own project. But having a single, relatively comprehensive example as an appendix to the book would be very useful.

As I read this book, something kept bothering me. EPUB 2 and EPUB 3 are so very different, with reading systems designed to render EPUB 3 documents being fairly rare at this point. So if different versions of the same spec are so different, with no guarantee that a future reading system will be able to read documents adhering to a previous version, then the prospect of reading EPUB documents into the future is pretty sketchy. Are e-books, then, just convenient and cool mechanisms for currently reading longish narrative prose—convenient and cool, but transitory?

Mark Cyzyk is the Scholarly Communication Architect in The Sheridan Libraries, Johns Hopkins University, Baltimore, Maryland, USA.

5480 ----
Adventure Code Camp: Library Mobile Design in the Backcountry
David Ward, James Hahn, and Lori Mestre
INFORMATION TECHNOLOGY AND LIBRARIES | SEPTEMBER 2014

ABSTRACT
This article presents a case study exploring the use of a student coding camp as a bottom-up mobile design process to generate library mobile apps. A code camp sources student programmer talent and ideas for designing software services and features. This case study reviews process, outcomes, and next steps in mobile web app coding camps. It concludes by offering implications for services design beyond the local camp presented in this study. By understanding how patrons expect to integrate library services and resources into their use of mobile devices, librarians can better design the user experience for this environment.

INTRODUCTION
Mobile applications offer an exciting opportunity for libraries to expand the reach of their services, to build new connections, and to offer unique, previously unavailable services for their users. Mobile apps not only provide the ability to present library services through mobile views (e.g., the library catalog and library website), but they can tap into an ever-increasing list of mobile-specific features. By understanding how patrons expect to integrate library services and resources into their use of mobile devices, librarians can better design the user experience for this environment. By adjusting the normal app production workflow to directly involve students during the formative stages of mobile app conception and design, libraries have the potential to generate products that more accurately anticipate real-life student needs. This article details one such approach, which sources student talent to code apps in a fast-paced, collaborative setting.

As part of a two-year Institute of Museum and Library Services (IMLS) grant, an academic library–based research team investigated three different methods for involving users in the app development process—a student competition, integrated computer science class projects, and the coding camp described in this article. The coding camp method focuses on a trend in mobile software development of having intensive two-to-three-day coding events that result in working prototypes of applications (e.g., iPhoneDevCamp, http://www.iphonedevcamp.org/). Coders typically work in groups to simultaneously learn how new software works, and also develop a functioning app that matches an area of personal interest. Camps promote collaboration, which provides additional networking and social outcomes to attendees.
David Ward (dh-ward@illinois.edu) is Reference Services Librarian, James Hahn (imhahn@illinois.edu) is Orientation Services & Environments Librarian, and Lori Mestre (lmestre@illinois.edu) is Head, Undergraduate Library and Professor of Library Administration, University of Illinois at Urbana-Champaign.

Additionally, camps provide an opportunity for software makers to promote their services and products, and they can result in new code and ideas on which to base future products. For academic libraries, a camp environment provides an educational opportunity for students, particularly those in a field with a computing or engineering focus, to learn new coding languages and techniques and to gain experience with a professional software production process that runs the full timeline from conception to finished product. Coding camps offer a chance for librarians to get direct student feedback on their own software development goals. The resulting applications provide potential benefits to both groups—students have a functional prototype to enhance their classroom experiences and a codebase to build on for future projects, and the librarians gain insight into students' desires for the content of mobile apps, code to integrate into existing apps, and direct student input into the iterative design process.

This article presents the results of a mobile application coding camp held in fall 2013. The camp method was tested as a way to explore a less time- and staff-intensive process for involving students in the creation of library mobile apps. Three specific research questions framed this investigation:
1. What library and course-related needs do students believe would benefit from the development of a mobile application?
2. Is the library providing access to data that is relevant to these needs, and is it available in a format (e.g., RESTful APIs) that end users can easily adopt into their own application design process?
3. How viable is the coding camp method for generating usable mobile app prototypes?

LITERATURE REVIEW
In line with efforts in academic libraries to operationalize participatory design for novel service implementation,1 the library approach to code camps included sourcing student technical expertise, much as tech companies do when quickly iterating prototypes that may advance or enhance company services. Some coding camps happen in corporate settings, others aim to publicize technologies such as a programming language, and still others are directed toward a specific cohort.2 The departure point for the library was understanding how it might organize and pair its API resources with other available campus services. A few highly visible and notable corporate "hackfests" or "hackdays" include the Facebook hackdays, in which Facebook Timeline was developed (http://www.businessinsider.com/facebook-timeline-began-as-a-hackathon-project-2012-1). The mobile app company Twitter also has monthly hackfests where employees from across the company work for a sustained period (a weekend or Friday) on new ideas, putting together prototypes that may transition into new services for the company.
An example of code camps from academia is the MHacks series at the University of Michigan (http://mhacks.challengepost.com/), among the largest code camps for university students in the Midwest. These camps are notable for their funding from corporations and for their support of student travel from colleges around the country to participate at the University of Michigan. At each event, coders are encouraged to use the corporate APIs that they may work with once they graduate or form companies after graduation. On the professional front, digital library code meet-ups (such as the code4lib pre-conference: http://code4lib.org/) are an opportunity for library technologists to share strategies and new directions in software using hands-on group coding sessions that last a half or full day. A recent digital event for the Digital Public Library of America (DPLA) hosted hackfests to demonstrate interface and functional possibilities with the source content in the DPLA. Similarly, the Hathi Trust Research Center organized a developer track for API coaching at their conference so that participants would have hands-on opportunities to use the Hathi Trust API (http://www.hathitrust.org/htrc_uncamp2013).

Goals of coding camps include development of new services or creation of value-added services on top of already existing operations. Code is not required to be complete, but functional prototypes help showcase new ways of approaching problems or novel solutions. Recently, MHacks issued the call to form new businesses at their winter hackathon (http://www.mhacks.org). Libraries are typically less interested in new businesses; rather, they seek new service ideas and new principles for organizing content for mobile in a way that sources student preferences for location-specific services, a key focus of the research team's Student/Library Collaborative IMLS grant.

METHOD
While the camp itself took only two days, there was a significant amount of lead time needed to prepare. In addition to obtaining standard campus institutional-review-board permissions for the study, it was also necessary to consult the Office of Technology Management to devise an assignment agreement covering the software generated by the camp. The research team chose a model that gave participating students the option to assign co-ownership of the code they developed to the library. This meant that both students and the library could independently develop applications using the code generated during the camp.

Marketing for the camp specifically targeted departments and courses where students with interest and skills for mobile application development were likely to be found, particularly in computer science and engineering. Individual instructors were contacted, as well as registered student organizations, to help promote the camp. Attendees were directed to an online application form, where they were asked to provide information on their coding skills and details on their interest in mobile application development. Ten students were ultimately selected from the pool and, of those, six attended the camp.
A pre-camp package was sent to these students to help them prepare for the short, intense timeframe the event entailed. This package included details on library data that were available to base applications on through web APIs, as well as brief tutorials on the coding languages and data formats participants needed to be familiar with (e.g., JavaScript, JSON, and XML). Participants were also provided with information on parking and other logistics for the event.

The research team consisted of librarians and academic professionals involved in public services and mobile design, and student coders employed by the library to serve as peer mentors. The team designed the camp as a two-day experience occurring over a weekend (Friday evening to Saturday late afternoon). The first day was scheduled as an introduction to the camp, with details on library and related APIs that could be used for apps and an opportunity for participants to brainstorm app ideas and form design teams. The day ended with some preliminary coding and consultation with camp organizers about planned directions and needs for the second day. The second day of the camp was devoted mostly to coding, with breaks scheduled for food, presentations of work in progress, and an opportunity to ask questions of the research team. The day ended with each team presenting their app, describing their motivation in designing it and the functionality they had been able to code into it.

Given the brief turnaround time, the research team put a heavy focus during the orientation session on clearly articulating the need to develop apps germane to student library needs. Examples from the student mobile app design competition conducted in February 2013 were provided as starting points for discussion, as these reflected known student desires for mobile library applications.3 After the camp ended, students who elected to share their code with the library were given details on how and where to deposit the code. Post-camp debriefing interviews (lasting thirty to forty-five minutes each) were scheduled individually with all participants to get their feedback on the setup of the event as well as what they felt they learned from the experience.

DISCUSSION
Researcher observations and feedback from students, both during the camp and in individual interviews afterwards, led to several insights about what sorts of outcomes libraries might anticipate from running camps, how best to structure library coding camps, what outcomes students anticipate from participating in a sponsored camp environment, and what features and preferences students have for mobile apps designed to support their academic endeavors.

A key student assumption, which emerged from comments at the event and through subsequent student interviews, was that students anticipated completing a fully functioning mobile app by the end of camp. Instead, the two student teams each finished with an app that, while it included some of the features they desired, still required additional coding to be fully realized. Several suggestions were made for how this need might be met at future events. The most consistent feedback from the students was that they would have liked an additional day of coding (three total camp days), so that they could have gotten further on the implementation of their app ideas. During the exit interviews, one student noted that the two-day timeframe really only allowed for sketching out an idea for an app, not coding it from scratch.
Students offered a pair of related suggestions: having templates for mobile apps available to review to get up to speed on new frameworks (particularly jQuery), and a longer meet-and-greet for teams prior to beginning work, during which they could compare available coding skills and do some extended brainstorming of app ideas. Students were somewhat mixed in their desire for assistance in developing app ideas—some appreciated the open-endedness of the camp, but others wanted a more organizer-driven approach. Some students suggested having time to work with library staff after the camp to finish or polish their apps. This observation reflects the enthusiasm students had for the camp itself, and specifically for having a social, structured, and mentored opportunity to develop their coding skills. Based on these requests, the research team created "office hours" on the fly after the camp ended. Research team members and coding staff communicated times when student team members could come into the library and get additional help with developing their apps.

The students had very similar themes for app features to those that the research team observed in an earlier student mobile app competition study. Notable categories included the following:
• Identify and build connections with peers from courses.
• Discover campus resources and spaces.
• Facilitate common activities such as studying and meeting for group work.

Students remarked that the camp was an opportunity both to meet people with similar coding interests and to learn more about specific functional areas of app development (specific coding languages, user interface design, etc.) in which they had little experience. jQuery and JavaScript for user-facing designs were particular areas of interest. Many students had some in-depth background working on pieces of a finished software product but had not previously done start-to-finish software design; this was a big selling point for the camp. The collaborative nature of the camp also matched students' preferences to work in teams and to learn from peers. While the research team had coders on hand to assist with both the library APIs and jQuery basics, most teams did the majority of their work themselves and preferred self-discovery. Each team did eventually ask for coding advice, but this occurred toward the end of the camp, once their apps were largely coded and they needed assistance overcoming particular sticking points.

The other area where students asked organizers for advice concerned identifying APIs for campus maps and locations, and other related resources to serve as data sources powering their apps. In the course of assisting with these requests, researchers discovered another key issue facing library mobile app development—the lack of campus standards for presenting information across different colleges and departments. In particular, maps of rooms inside campus buildings were not provided in a consistent or comprehensive way. This was particularly frustrating to the team that was attempting to develop an app featuring turn-by-turn navigation and directions to meeting rooms and computer labs. In addition to sharing information on known APIs and data sources, camp organizers also learned about previously unknown data sources from the student teams.
One example was a JSON feed for the current availability of computers in labs provided by the College of Engineering. While this feed was beneficial to starting work on an app for one team, it also led to frustration because feeds for other campus computer labs did not exist, and the team was limited to designing around the specific labs that did have this information available. Observed student discussions about the randomness of data availability also highlighted one of the key themes of student-centered design—the conceptualization of a university as a single entity, the various parts of which combine and come in and out of focus depending on the current student task. Related student feedback from one of the post-event interviews described a strong desire to create integrated, multifunction apps to meet student needs as opposed to a variety of apps that each did one thing. The siloed nature of campus infrastructures frustrates this desire to some extent but also creates opportunities for students to build a tool that meets a real need among their peers to comprehend and organize their academic environment. This observation also matches those found during the aforementioned student competition.

CONCLUSION AND FUTURE DIRECTIONS
Student feedback on the camp, as a whole, was very positive, and in the individual interviews, students noted they would like to participate in another camp if one were offered. On the library side, the research team felt that the camp was useful to their ongoing mobile app development process, partially for the code generated but primarily for the direct feedback on what types of apps students wanted to see. The start-up time and costs for the project were low, as expected, and the insights into student mobile preferences seemed proportionate to this outlay.

The camp method should be reproducible in a variety of library environments. The key assets other libraries will need to have in place to run a camp include staff with knowledge of client-side API use (in particular jQuery, CORS, or related skills), and knowledge of campus data sources that students may wish to pull from. Third-party APIs with bibliographic data (e.g., Goodreads) could also be used as placeholders for libraries that do not have access to APIs for their own catalogs or discovery systems.

Student suggestions for extending the camp by a day, and their ideas for how to structure it for student success, were very specific and actionable and provided excellent guidance. One of their ideas was to develop tutorials and templates that could be introduced at a pre-camp meeting. This would not add too much prep time. Another idea for a future camp would be to develop a specific theme for teams, which would allow for more documentation of and practice with specific APIs. The low attendance was a concern, so for the next camp twice the number of desired participants will be invited to ensure both a variety of coding skills and interests as well as opportunities for more teams to be formed. Additionally, partnerships with student coding groups or related classes should help to drive up attendance.

The biggest difficulty moving forward will be developing campus standards for data that can be made available to students about resources, spaces, and services. As noted above, students typically do not design a "library app"; rather, they look to build a "student app" that pulls in a variety of data from across campus.
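As an illustration of consuming the kind of cross-campus data feed described earlier in this section (such as the College of Engineering lab-availability feed), here is a minimal Python sketch; the URL and field names are hypothetical stand-ins, not the actual campus feed, and the camp teams themselves worked in JavaScript and jQuery.

    import requests

    # Hypothetical endpoint modeled on a campus lab-availability JSON feed.
    FEED_URL = "https://example.university.edu/labs/availability.json"

    def open_seats(feed_url=FEED_URL):
        """Return a mapping of lab name -> free workstations (assumed field names)."""
        data = requests.get(feed_url, timeout=10).json()
        return {lab["name"]: lab["total"] - lab["in_use"] for lab in data["labs"]}

    if __name__ == "__main__":
        for lab, free in sorted(open_seats().items()):
            print(f"{lab}: {free} workstations free")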
Functions of apps are therefore more oriented toward common student activities like studying, socializing, and learning. A related challenge will be to provide adequate format and delivery mechanisms for access to supporting data feeds. As libraries present their own data for student consumption, they need to remain cognizant of the silo issue noted above and take these tendencies toward a unified view into account. Completion of an assignment is more than identifying three scholarly sources; it might involve identifying a space to do the research, locating peers or mentors for either the research or writing process, locating suitable technology to complete the assignment, and a variety of other needs. The features and information presented on a library's website should be designed as modular building blocks that can fit into other campus services, in a similar way to how course reserves are sometimes presented in campus learning management systems alongside syllabi and assignments. Separating library content (e.g., full-text articles, room information, research services) from library process can help with freeing information about what libraries have to offer and can facilitate broader discovery of services and resources at point of need. Key to this process is recognizing the student desire to shape the resources they need into a comprehensible format that matches their workflow rather than forcing students to learn a specific, isolated, and inflexible path for each part of the projects they work on.

This study has shown that a collaborative process in technology design can yield insights into students' conceptual models about how spaces, resources, and services can be implemented. While the traditional model of service development often leaves these considerations until the very end in a summative assessment of service, the coding camp and collaborative methods presented here provide librarians a new tool for adding depth to service design and implementation, ultimately resulting in services and platforms that are informed by a more well-rounded and deeper understanding of the student mobile-use experience. In that regard, the initial research questions that framed this study could also be used by other libraries as they explore the library and course-related needs that students could benefit from through the development of mobile applications, and as they determine whether their library provides access to data relevant to those needs. The results of this study have affirmed that, at least for the library in this study, the coding camp method is viable for generating usable mobile app prototypes. They have also affirmed that directly involving students during the formative stages of mobile app conception and design produces apps that more accurately reflect real-life student needs.

REFERENCES
1. Council on Library and Information Resources (CLIR), Participatory Design in Academic Libraries: Methods, Findings, and Implementations (Washington, DC: CLIR, 2012), http://www.clir.org/pubs/reports/pub155/pub155.pdf.
2. "Hackathon," Wikipedia, 2014, http://en.wikipedia.org/wiki/Hackathon.
3. David Ward, James Hahn, and Lori Mestre, "Designing Mobile Technology to Enhance Library Space Use: Findings from an Undergraduate Student Competition," Journal of Learning Spaces (forthcoming).
5485 ---- Negotiating a Text Mining License for Faculty Researchers
Leslie A. Williams, Lynne M. Fox, Christophe Roeder, and Lawrence Hunter

ABSTRACT
This case study examines strategies used to leverage the library's existing journal licenses to obtain a large collection of full-text journal articles in XML format, the right to text mine the collection, and the right to use the collection and the data mined from it for grant-funded research to develop biomedical natural language processing (BNLP) tools. Researchers attempted to obtain content directly from PubMed Central (PMC). This attempt failed because of limits on use of content in PMC. Next, researchers and their library liaison attempted to obtain content from contacts in the technical divisions of the publishing industry. This resulted in an incomplete research data set. Researchers, the library liaison, and the acquisitions librarian then collaborated with the sales and technical staff of a major science, technology, engineering, and medical (STEM) publisher to successfully create a method for obtaining XML content as an extension of the library's typical acquisition process for electronic resources. Our experience led us to realize that text-mining rights of full-text articles in XML format should routinely be included in the negotiation of the library's licenses.

INTRODUCTION
The University of Colorado Anschutz Medical Campus (CU Anschutz) is the only academic health sciences center in Colorado and the largest in the region. Annually, CU Anschutz educates 3,480 full-time students, provides care during 1.5 million patient visits, and receives more than $400 million in research awards.1 CU Anschutz is home to a major research group in biomedical natural language processing (BNLP), directed by Professor Lawrence Hunter. Natural language processing (also known as NLP or, more colloquially, "text mining") is the development and application of computer programs that accept human language, usually in the form of documents, as input. BNLP takes as input scientific documents, such as journal articles or abstracts, and provides useful functionality, such as information retrieval or information extraction.

Leslie A. Williams (leslie.williams@ucdenver.edu) is Head of Acquisitions, Auraria Library, University of Colorado, Denver. Lynne M. Fox (lynne.fox@ucdenver.edu) is Education Librarian, Health Sciences Library, University of Colorado Anschutz Medical Campus, Aurora. Christophe Roeder is a researcher at the School of Medicine, University of Colorado, Aurora. Lawrence Hunter (larry.hunter@ucdenver.edu) is Professor, School of Medicine, University of Colorado, Aurora.
CU Anschutz's Health Sciences Library (HSL) supports Hunter's research group by providing a reference and instruction librarian, Lynne Fox, to participate on the research team. Hunter's group is working on computational methods for knowledge-based analysis of genome-scale data.2 As part of that work, his group is devising and implementing text-mining methods that extract relevant information from biomedical journal articles, which is then integrated with information from gene-centric databases and used to produce a visual representation of all of the published knowledge relevant to a particular data set, with the goal of identifying new explanatory hypotheses.

Hunter's research group demonstrated the potential of integrating data and research information in a visualization to further new discoveries with the "Hanalyzer" (http://hanalyzer.sourceforge.net). Their test case used expression data from mice related to craniofacial development and connected that data to PubMed abstracts using gene or protein names. "Copying of content that is subject to copyright requires the clearing of rights and permissions to do this. For these reasons the body of text that is most often used by researchers for text mining is PubMed."3 The resulting visualization allowed researchers to identify four genes involved in mouse craniofacial development that had not previously been connected to tongue development, with the resulting hypotheses validated by subsequent laboratory experiment.4 The knowledge-based analysis tool is open access.

To continue the development of the BNLP tools for the knowledge-based analysis system, three things were required: a large collection of full-text journal articles in XML format, the right to text mine the collection, and the right to store and use the collection and the data mined from it for grant-funded research. The larger the dataset, the more robust the visual representations of the knowledge-based analysis system, so Hunter's research group sought to compile a large corpus of relevant literature, beginning with journal articles. The text that is mined can start in many formats; however, XML provides a computer-ready format for text mining because it is structured to indicate parts of the document. XML is "called a 'markup language' because it uses tags to mark and delineate pieces of data. The 'extensible' part means that the tags are not pre-defined; users can define them based on the type of content they are working with."5,6

XML has been adopted as a standard for content creation by journal publishers because it provides a flexible format for electronic media.7 XML allows the parts of a journal article to be encoded with tags that identify the title, author, abstract, and other sections, allowing the article to be transmitted electronically between editor and publisher and to be easily formatted and reproduced into different versions (e.g., print, online). XML can also indicate significant content in the text, such as biological terms or concepts.
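To make the role of these tags concrete, here is a minimal Python sketch that pulls the title, abstract, and body text out of an article marked up with JATS-style element names (article-title, abstract, body); the sample fragment is invented for illustration, and a given publisher's DTD may use different tags.

    import xml.etree.ElementTree as ET

    # A tiny, invented article fragment using JATS-like element names.
    SAMPLE = """
    <article article-type="research-article">
      <front><article-meta>
        <title-group><article-title>Gene X in tongue development</article-title></title-group>
        <abstract><p>We report a new role for gene X.</p></abstract>
      </article-meta></front>
      <body><sec><p>Expression data suggest that gene X is required for ...</p></sec></body>
    </article>
    """

    def extract_sections(xml_text):
        """Return title, abstract, and body text keyed by section name."""
        root = ET.fromstring(xml_text)
        title = root.findtext(".//article-title", default="")

        def section_text(tag):
            el = root.find(".//" + tag)
            if el is None:
                return ""
            return " ".join("".join(p.itertext()).strip() for p in el.iter("p"))

        return {"title": title, "abstract": section_text("abstract"),
                "body": section_text("body")}

    print(extract_sections(SAMPLE))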
XML allowed Hunter's research group to write computer programs that can make sense of each article by using the XML tags as indicators of content and placement within the article. Products have been developed, such as LA-PDFText, to extract text from PDF documents.8 However, direct access to XML provides more useful corpora because the document markup saves time and improves the accuracy of results extracted from XML.

Once the sections and content of an article are identified, text-mining techniques are applied to the article. "Text mining extracts meaning from text in the form of concepts, the relationships between the concepts or the actions performed on them and presents them as facts or assertions."9 Text-mining techniques can be applied to any type of information available in machine-readable format (e.g., journal articles, e-books). A dataset is created when the text-mined data is aggregated. Using BNLP tools, Hunter's research group's knowledge-based analysis system analyzed the dataset and produced visual representations of the knowledge that have the potential to lead to new hypotheses. Text mining and BNLP techniques have the potential to build relationships between the knowledge contained in the scholarly literature that lead to new hypotheses, resulting in more rapid advances in science.

LITERATURE REVIEW
Hunter and Cohen explored "literature overload" and its profoundly negative impact on discovery and innovation.10 With an estimated growth rate of 3.1 percent annually for PubMed Central, the US National Library of Medicine's repository, researchers struggle to master the new literature of their field using traditional methods. Yet much of the advancement of biological knowledge relies on the interplay of data created by protein, sequence, and expression studies and the communication of information and discoveries through nontextual and textual databases and published reports.11 How do biomedical researchers capitalize on and integrate the wealth of information available in the scholarly literature?
"The common ground in the area of content mining is in the shared conviction that the ever increasing overload of information poses an absolute need for better and faster analysis of large volumes of content corpora, preferably by machines."12

BNLP "encompasses the many computational tools and methods that take human-generated texts as input, generally applied to tasks such as information retrieval, document classification, information extraction, plagiarism detection, or literature-based discovery."13 BNLP techniques accomplish many tasks usually performed manually by researchers, including enhancing access through expanded indexing of content or linkage to additional information, automating reviews of the literature, discovering new insights, and extracting meaning from text.14 Text mining is just one tool in a larger BNLP toolbox of resources used to read, reason, and report findings in a way that connects data to information sources to speed discovery of new knowledge.15 According to pioneering text-mining researcher Marti Hearst, "Text Mining is the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources. A key element is the linking together of the extracted information together to form new facts or new hypotheses to be explored further by more conventional means of experimentation."16 Biomedical text mining uses "automated methods for exploiting the enormous amount of knowledge available in the biomedical literature."17

Recent reports, commissioned by private and governmental interest groups, discuss the economic and societal value of text mining.18,19 The McKinsey Global Institute estimates the worth of harnessing big data insights in US health care at $300 billion. The report concludes that greater sharing of data for text mining enables "experimentation to discover needs, expose variability, and improve performance" and enhances "replacing/supporting human decision making with automated algorithms," among other benefits.
Furthermore, the McKinsey report points out that North America and Europe have the greatest potential to take advantage of innovation because of a well-developed infrastructure and large stores of text and data to be mined.20 However, these new and evolving technologies are challenging the current intellectual-property framework, as noted in an independent report by Ian Hargreaves, "Digital Opportunity: A Review of Intellectual Property and Growth," resulting in lost opportunity for innovation and economic growth.21 In "The Value and Benefits of Text Mining," JISC finds that copyright restrictions limit access to content for text mining in the biomedical sciences and chemistry and that costs for access and infrastructure prevent entry into text-mining research for many noncommercial organizations.22 Despite copyright barriers, organizations surveyed pointed out that the risks associated with failing to use text-mining techniques to further research include financial loss, loss of prestige, opportunity lost, and the brain drain of having talented staff seek more fulfilling work. JISC explores a research project's workflow and finds that a lack of access to text mining delayed the publication of an important medical research study by many months, or the time the research team spent analyzing and summarizing relevant research.23 Both reports advocate an exception to intellectual property rights for noncommercial text-mining research to balance the protection of intellectual property with the access needs of researchers. A centrally maintained repository for text mining has been proposed, although its creation would face significant challenges.24

Scholarly journal content is the raw "ore" for text mining and BNLP. The lack of access to this ore creates a bottleneck for researchers. "New business models for supporting text mining within the scholarly publishing community are being explored; however, evidence suggests that in some cases lack of understanding of the potential is hampering innovation."25 BNLP and machine-learning research products are more accurate and complete when more content is available for text mining. "Knowledge discovery is the search for hidden information. . . . Hence the need is to start looking as widely as possible in the largest set of content sources possible."26 However, as noted in a Nature article, "The question is how to make progress today when much research lies behind subscription firewalls and even 'open' content does not always come with a text-mining license."27 Large scientific publishers are facing economic challenges, and potentially diminished economic returns, as the tension over the right to use licensed content heats up.
Nature, the flagship of a major scientific publisher, predicted "trouble at the text mine" if researchers lack access to the contents of research publications.28 And a 2012 investment report predicted slower earnings growth for Elsevier, the largest STEM publisher, if it blocked access to licensed content by text-mining researchers. The review predicted, "If the academic community were to conclude that the commercial terms imposed by Elsevier are also hindering the progress of science or their ability to efficiently perform research, the risk of a further escalation of the acrimony [between Elsevier and the academic community] rises substantially."29 With open access alternatives proliferating, including making federally funded research freely accessible, STEM publishers are under increased pressure to respond to market forces. "The greatest challenge for publishers is to create an infrastructure that makes their content more machine-accessible and that also supports all that text-miners or computational linguists might want to do with the content."30 On the other end of the spectrum, researchers are struggling to gain legal access to as much content as possible.

Academic libraries have long excelled at serving as the bridge between researchers and publishers and can expand their roles to include navigating the uncharted territory of obtaining text-mining rights for content. Increasing the library's role in text mining and other associated BNLP and machine-learning methods offers tremendous potential for greater institutional relevance and service to researchers.31 At CU Anschutz's HSL, Fox and Williams, an acquisitions librarian, found natural opportunities for collaboration, including negotiating rights to content more efficiently through expanded licensing arrangements and facilitating the secure transfer and storage of data to protect researchers and publishers.

METHOD
Hunter and Fox began working in 2011 to obtain a large corpus of biomedical journal articles in XML format to create a body of text as comprehensive as possible for BNLP experimentation that would further advance Hunter's research group's knowledge-based analysis system. The desired result was an aggregated collection obtained from multiple publishers, stored locally, and available on demand for the knowledge-based analysis system to process. Hunter and Fox soon realized that "the process of obtaining or granting permissions for text mining is daunting for researchers and publishers alike. Researchers must identify the publishers and discover the method of obtaining permission for each publisher. Most publishers currently consider mining requests on a case by case basis."32 They pursued a multifaceted strategy to build a robust collection and to determine which strategy proved most fruitful because, during a grant review, National Library of Medicine staff wanted evidence of access to an XML collection before awarding a grant.
Fox first approached two open-access publishers, BioMed Central (BMC) and Public Library of Science (PLoS), to request access to XML text from journals in the subjects of life and biomedical science. Fox had existing contacts within both organizations, and an agreement was reached to obtain XML journal articles. Letters of understanding were quickly obtained, as both publishers were excited about exploring new ways for their research publications to be accessed and the potential to increase the use of their journals. Possible journal titles were identified, and arrangements were made to transfer and store files locally from BMC and PLoS to Hunter's research group.

Hunter approached staff at PubMed Central (PMC) to request access to articles and discovered they could only be made available with permission from publishers. A Wiley research and product development executive granted Hunter permission to access Wiley articles in PMC. The Wiley executive was interested in learning what impact text mining might have on Wiley products. Hunter's research group planned to transfer Document Type Definition (DTD) format files from PMC. Unfortunately, when Hunter's research group staff requested file-transfer assistance from PMC, no PMC staff were available to provide the technical help needed because of budget reductions. PMC staff could accurately evaluate their time commitment because they had a clear understanding of the XML access and transfer process, and knew they could not allocate resources to the effort.

Hunter then began to leverage his professional network connections to obtain content from a major STEM vendor. Research and development division directors within the company were familiar with the work of Hunter's research group and were willing to provide assistance in acquiring content. However, when the research group began to perform research using this data, further investigation determined that the contents were not adequate for the research. Follow-up between Fox, the research group, and the vendor revealed that the group's needs were not communicated in the vendor's vernacular, resulting in the group not clearly understanding what content the vendor was providing. This disconnect occurred in the communication flow from the research group to the vendor's research and development staff to the vendor's sales staff (who identified the content to be shared). It was like a game of telephone tag.

After the initial strategies produced mixed results, Hunter's research group hypothesized that they could harvest materials through HSL's journal subscriptions. Hunter's research group attempted to crawl and download journal content being provided by HSL's subscription to a major chemistry publisher.
Since publishers monitor for web crawling of their content, the chemistry publisher became aware of the unusual download activity, turned off campus access, and notified the library that there may have been an unauthorized attempt to access the publisher's content. Researchers are often unaware of complex copyright and license compliance requirements. In fact, librarians sometimes become aware of text-mining projects only after automated downloads of licensed content prompt vendors to shut off campus access.33 Libraries can prevent interruption of campus-wide access to important resources by suggesting more effective content-access methods.

Williams, an HSL acquisitions librarian, investigated the interruption in access and discovered Hunter's research group's efforts to obtain journal articles to text mine for their research. She offered to use her expertise in acquiring content to help Hunter's research group obtain the dataset needed for their research. Initially, Hunter and Fox had not included an acquisitions librarian because that position was vacant. After Williams became involved, the effort focused on licensing content through negotiation with individual publishers.

RESULTS
"There are a large number of resources to help the researcher who is interested in doing text mining" but "no similar guide to obtaining the necessary rights and permissions for the content that is needed."34 At CU Anschutz, this vacuum was filled by Williams, who is knowledgeable about the acquisition of content, and Fox, who is knowledgeable about Hunter's research, serving as the bridge between the research group and the STEM publisher. By working together and capitalizing on each other's expertise, Williams and Fox were able to facilitate the collaboration that developed a framework for purchasing a large collection of full-text journal articles in XML format. As the collaboration progressed, three major elements of the framework surfaced: a pricing model, a license agreement, and the dataset and delivery mechanism.

Researchers interested in legally text mining journal content often find themselves having to execute a license agreement and pay a fee.35 What should the fee be based on to create a fair and equitable pricing model? Publishers establish pricing for library clients on the basis of not only the content but many value-added services, such as the breadth of titles aggregated and made available for purchase in a single product, the creation of a platform to access the journal titles, the indexing and searching functionality within the platform, and the production of easily readable PDF versions of articles. These value-added services are not required for text-mining endeavors. Rather, the product is the raw journal content that has been peer-reviewed, edited, and formatted in XML, before the addition of value-added services.
Therefore the pricing should not be equivalent to the cost of a library's subscription to a journal or package of journals. In the end, after lengthy negotiations, the pricing model for Hunter's research group's collection of full-text journal articles in XML format consisted of
• a cost per article;
• a minimum purchase of 400,000 articles for one sum on the basis of the cost per article;
• an annual subscription for the minimum purchase of 400,000;
• the ability to subscribe to additional articles in excess of 400,000 in quantities determined by Hunter's research group;
• a volume discount off the per-article price for every article purchased in excess of 400,000;
• inclusion of the core journal titles purchased via the library's subscription at no charge;
• inclusion of the core journal titles purchased by the University of Colorado Boulder at no charge because of Hunter's joint appointment at both the CU Boulder and CU Anschutz campuses; and
• a requirement for HSL to maintain its subscription to the vendor's product at its current level.

"Where institutions already have existing contracts to access particular academic publications, it is often unclear whether text mining is a permissible use."36 From the beginning, common ground was easily found on the subject of core titles purchased by the two campuses' libraries. Core titles are typically those journals that libraries pay a premium for to obtain perpetual rights to the content. Most of the negotiation focused on access titles, which are journals that libraries pay a nominal fee to have access to without any perpetual rights included.
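The arithmetic implied by this pricing model is simple; as a check, the following Python sketch computes an annual total under the minimum-purchase and volume-discount rules, using placeholder figures since the negotiated per-article price and discount rate are not disclosed in this study.

    MINIMUM_ARTICLES = 400_000  # minimum purchase from the negotiated model

    def annual_cost(n_articles, price_per_article, volume_discount):
        """Annual cost under the minimum-purchase and volume-discount rules.

        price_per_article and volume_discount are hypothetical placeholders.
        """
        base = MINIMUM_ARTICLES * price_per_article
        extra = max(0, n_articles - MINIMUM_ARTICLES)
        return base + extra * price_per_article * (1 - volume_discount)

    # Invented figures: $0.50 per article with a 20% discount past 400,000 articles.
    print(annual_cost(500_000, 0.50, 0.20))  # 200000 + 100000 * 0.40 = 240000.0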
The final challenge related to cost was determining how to process and pay for the product. Hunter's research group operates on major grant funding from federal government agencies. The University of Colorado requires additional levels of internal controls and approvals to expend grant funds as well as to track expenditures to meet reporting requirements of the funding agencies. Also, grant funding of this type often spans multiple fiscal years, whereas the library's budget operates on a single fiscal year at a time. Therefore it was decided that Hunter would handle payment directly rather than transferring funds to HSL to make payment on their behalf.

"Libraries as the licensee of publishers' content are from that perspective interested in the legal framework around content mining."37 During price negotiations, Williams recommended negotiating a license agreement similar to those libraries and publishers execute for the purchases of journal packages. A license agreement would offer a level of protection for all parties involved while clearly outlining the parameters of the transaction. Hunter and the STEM publisher readily agreed.

The final license agreement contained ten sections, including definitions; subscription; obligations; use of names; financial arrangement; term; proprietary rights; warranty, indemnity, disclaimer, and limitation of liability; and miscellaneous. While the license agreement was similar to traditional license agreements between libraries and publishers for journal subscriptions, there were some notable differences. First, in the definitions section, users were defined and limited to Hunter and his research team. This limited the users to a specific group of individuals, unlike typical library–publisher license agreements that license content for the entire campus.

Second, the subscription section covered in detail how the data could be used and allowed the dataset to be installed locally. This was important to make the dataset available on demand to researchers; to allow researchers to manipulate, segment, and store the data in multiple ways instead of as one large dataset; and to allow the researchers the ability to access and use the large dataset efficiently and quickly. Because the dataset would be manipulated so extensively, the license gave permission to create a backup copy and store it separately. The subscription section also required the dissemination of the research results to occur in such a way that the dataset could not be extracted and used by others. This was significant because Prof. Hunter releases the BNLP software applications his group develops as open source software so that the applications can be open to peer review and attempts at reproduction. Ideally, someone could download the open source software, obtain the same corpus as input, and see the same output mentioned in the paper.

Third, the obligations section was radically different from traditional library–publisher license agreements because, even though "publishers are still working out how to take advantage of text mining . . . none wants to miss out on the potential commercial value."38 This interest prompted the crafting of an atypical obligations section in the license agreement that included an option for Hunter to collaborate with the STEM publisher to develop and showcase an application on the vendor's website and a commitment for Hunter to meet quarterly with the vendor's representatives to discuss advances in research. Furthermore, the obligations section specified a request for Hunter and the University of Colorado to recognize the vendor where appropriate and a right for the STEM publisher to use any research software application released as open source. Up to this point, Williams had been collaborating with the University of Colorado in-house counsel to review and revise the license agreement. When the STEM publisher requested the right to use the software application, Williams was required to submit the license agreement to the University of Colorado's Technology Transfer Office for review and approval.
Approval came promptly, primarily because Prof. Hunter releases his software applications as open source.

Fourth, the license agreement included a "use of names" section, which is not found in typical library–publisher agreements. This section authorized the vendor to use factual information drawn from a case study in market-facing materials and required the vendor to obtain written consent, as required by the University of Colorado System, before information from the case study could be released in those materials. The vendor also agreed not to use the University of Colorado's trademark, service mark, trade name, copyright, or symbol without prior written consent and to use these items in accordance with the University of Colorado System's usage guidelines.

Fifth, the vendor agreed not to represent in any way that the University of Colorado or its employees endorse the vendor's products or services. This is extremely important because the University of Colorado's controller does not allow product endorsements because of the federal unrelated business income tax. Exempt organizations are required to pay this tax if engaged in regularly occurring business activities that do not further the purpose of the exempt organization.39

Finally, the license agreement stated all items would be provided in XML format with a unique Digital Object Identifier (DOI) number, essential for linking XML content to real-world documents that researchers using Hunter's research group's knowledge-based analysis system would want to access.

After a pricing model and license agreement were finalized, the focus turned to the last major element of the framework: the dataset and delivery mechanism. Elements such as quality of the corpora contents, file transfer time, and storage capacity are all important. In other words, "the need is to start looking as widely as possible in the largest set of content sources possible. This need is balanced by the practicalities of dealing with large amounts of information, so a choice needs to be made of which body of content will most likely prove fruitful for discovery. Text mines are dug where there is the best chance of finding something valuable."40

When building an XML corpus for research, Hunter's research group wanted to maximize their return on investment, so a pilot download was conducted to ensure that the most beneficial content could be transferred smoothly to a local server. "Permissions and licensing is only a part of what is needed to support text mining. The content that is to be mined must be made available in a way that is convenient for the researcher and the publisher alike."41 This pilot phase allowed Hunter's researchers and the vendor's technical personnel to clarify the requirements of the dataset and to efficiently deliver and accurately invoice for content.
One of the initial obstacles was that a filter for the delivery mechanism didn't exist. Letters to the editor, errata, and more were all counted as an article. Hunter's researchers quickly determined that research articles were most important at this point in the development of the knowledge-based analysis system. How should a useful or minable article be defined—by its length, by XML tags indicating content type, or by some other criteria? Roeder, a software engineer, used article attributes and characteristics embedded in XML tags to define an article as including all of the following (a minimal sketch of such a filter appears at the end of this section):
• an abstract
• a body
• at least 40 lines of text
• none of the following tags: corrigendum, erratum, book review, editorial, introduction, preface, correspondence, or letter to the editor

In the end, Hunter's research group and the vendor agreed to transmit everything and allow the group fifteen business days to evaluate the content. The research group would then notify the vendor of how many "articles" were received. This process would continue until 400,000 "articles" were received.

After spending more than a year developing a structure to purchase a large corpus of journal articles to text mine, just as Hunter's research group was ready to execute the license, remit payment, and receive the articles, their federal grant expired, stalling the purchase. In retrospect, this unfortunate development was the catalyst for a shift in philosophy and strategy for the researchers and librarians at CU Anschutz.
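The following Python sketch illustrates the kind of article filter described above; the element and attribute names (abstract, body, and an article-type attribute) are JATS-style assumptions rather than the exact tags used in the project, and the 40-line test is approximated by counting non-empty lines of body text.

    import xml.etree.ElementTree as ET

    # Content types excluded from the working definition of an "article".
    EXCLUDED_TYPES = {
        "corrigendum", "erratum", "book review", "editorial",
        "introduction", "preface", "correspondence", "letter to the editor",
    }

    def is_minable_article(xml_text, min_lines=40):
        """Apply the working definition of a research article sketched above."""
        root = ET.fromstring(xml_text)

        # The article-type attribute is an assumed, JATS-like convention.
        article_type = (root.get("article-type") or "").lower()
        if article_type in EXCLUDED_TYPES:
            return False

        abstract = root.find(".//abstract")
        body = root.find(".//body")
        if abstract is None or body is None:
            return False

        # Approximate "at least 40 lines of text" by counting non-empty lines.
        body_text = "\n".join("".join(p.itertext()) for p in body.iter("p"))
        lines = [ln for ln in body_text.splitlines() if ln.strip()]
        return len(lines) >= min_lines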
The various types of "costs are currently borne by researchers and institutions, and are a strong hindrance to text mining uptake. These could be reduced if uncertainty is reduced, more common and straightforward procedures are adopted across the board by license holders, and appropriate solutions for orphaned works are adopted. However, the transaction costs will still be significant if individual rights holders each adopt different licensing solutions and barriers inhibiting uptake will remain."45

Findings from a survey of libraries indicated that librarians anticipate a new role as facilitators between researchers and publishers to enable text mining.46 Librarians are a natural fit for this role because they already have expertise in navigating copyright, requesting copyright permissions, and negotiating license agreements for journal content. "Advice and guidance should be developed to help researchers get started with text mining. This should include: when permission is needed; what to request; how best to explain intended work and how to describe the benefits of research and copyright owners."47

After their experience with developing a framework to license and purchase a large corpus of journal articles in XML format to be text mined, Fox and Williams came to believe that, in addition to providing copyright expertise, librarians should assist in reducing transaction costs by developing model license clauses for text mining and routinely negotiating for these rights when the library purchases journals and other types of content. Adopting this philosophy and strategy led Williams and Fox to successfully advocate for the inclusion of a text-mining clause in the license agreement for the STEM publisher in this case study at the time of the library's subscription renewal. This occurred at a regional academic consortium level, making text mining easier at fourteen academic institutions. Furthermore, the University of Colorado Libraries, which includes five libraries on four campuses, is now working on drafting a model clause to use when purchasing journal content as the University of Colorado System and to put forth for consideration by the consortiums that facilitate the purchase of our major journal packages. Given that incorporating text-mining clauses into library–publisher license agreements for scholarly journals is in its infancy, there are few resources available to assist librarians adopting this new role. Model clauses include the following:

• British Columbia Electronic Library Network's Model License Agreement48
  o Clause 3.11. "Data and Text Mining. Members and Authorized Users may conduct research employing data or text mining of the Licensed Materials and disseminate results publicly for non-commercial purposes."
• California Digital Library's Standard License Agreement49
  o Section IV. Authorized Use of Licensed Materials. "Text Mining. Authorized Users may use the licensed material to perform and engage in text mining/data mining activities for legitimate academic research and other educational purposes."
• JISC's Model License for Journals50
  o Clause 3.1.6.8. "Use the Licensed Material to perform and engage in text mining/data mining activities for academic research and other Educational Purposes and allow Authorised Users to mount, load and integrate the results on a Secure Network and use the results in accordance with this License."
  o Clause 9.3. "For the avoidance of doubt, the Publisher hereby acknowledges that any database rights created by Authorised Users as a result of textmining/datamining of the Licensed Material as referred to in Clause 3.1.6.8 shall be the property of the Institution."

Publishers are also beginning to break down barriers, perhaps in part because of the sentiment that "privately erected barriers by copyright holders that restrict text mining of the research base could be increasingly regarded as inequitable or unreasonable since the copyright holders have borne only a small proportion of the costs involved in the overall process; furthermore, they do not have rights or ownership of the inherent facts or ideas within the research base."51 BioMed Central and PLoS both offer services that allow researchers to access XML text collections. BioMed Central makes content readily accessible by providing a website for bulk download of XML text.52 PLoS requires contact with a staff member for download of XML text.53 In December 2013, Elsevier also announced that it would create a "big data" center at University College London to allow researchers to work in partnership with Mendeley, a knowledge management and citation application now owned by Elsevier. While this is a positive step, the partnership does not appear to make the data available to research groups beyond University College London.54

However, there is still a long way to go before publishers and librarians are routinely collaborating on opening up the scholarly literature to be mined. For example, a 2012 Nature editorial states "Nature Publishing Group, which also includes this journal, says that it does not charge subscribers to mine content, subject to contract."55 Repeated attempts by Williams to obtain more information from Nature Publishing Group and a copy of the contract have proved fruitless.

In January 2014, Elsevier announced that "researchers at academic institutions can use Elsevier's online interface (API) to batch-download documents in computer-readable XML format" after signing a legal agreement. Elsevier will limit researchers to accessing 10,000 articles per week.56,57 For small-scale projects with a narrow scope, this limit will suffice. For example, mining the literature for a specific gene that plays a known role in a disease could require a text set under 30,000 articles.
At  Elsevier’s  current  rate  of  article  transfer,  a  30,000  article  text  set  could  be   created  in  roughly  three  weeks.  However,  for  large-­‐scale  projects  such  as  Hunter’s  research   group’s  knowledge-­‐based  analysis  system  that  require  a  text  set  of  400,000  articles  (or  much   more,  if  not  limited  by  budget  constraints),  nearly  a  year  of  time  would  be  required  to  build  the   corpora.  Time  is  one  of  the  most  valuable  commodities  in  computational  biology.  The  elapsed  time   required  to  transfer  articles  at  the  rate  of  10,000  articles  per  week  represents  a  bottleneck  that   most  grant-­‐funded  research  cannot  afford.  Speed  of  transfer  will  also  be  a  factor.  Researchers   require  flexibility  to  maximize  available  central  processing  unit  (CPU)  hours  because  documents   can  take  from  a  few  seconds  to  a  full  minute  each  to  transfer  to  the  storage  destination.   Monopolizing  peak  hours  in  high  performance  computing  (HPC)  settings  may  mean  that   computing  power  is  not  available  for  other  tasks,  although  many  HPC  centers  have  learned  to   allocate  CPU  use  more  efficiently  to  high  volumes.  Furthermore,  the  terms  and  conditions  set  by   Elsevier  for  output  limits  excerpting  from  the  original  text  to  200  characters.58  This  is  roughly   equivalent  to  two  lines  of  text  or  approximately  forty  words.  This  may  be  insufficient  to  capture   important  biological  relationships  necessary  to  evaluate  the  relevance  of  the  article  to  the   research  being  represented  by  the  Hanalyzer  knowledge-­‐based  analysis  system.     CONCLUSION   Forging  a  partnership  between  a  library,  a  research  lab,  and  a  major  STEM  vendor  requires   flexibility,  patience,  and  persistence.  Our  experience  strengthened  the  existing  relationship   between  the  library  and  the  research  lab  and  demonstrated  the  library’s  willingness  and  ability  to   support  faculty  research  in  a  nontraditional  method.  Librarians  are  encouraged  to  advocate  for   the  inclusion  of  text-­‐mining  rights  in  their  library’s  license  agreements  for  electronic  resources.   What  the  future  holds  for  publishers,  researchers,  and  libraries  involved  in  text  mining  remains  to   be  seen.  However,  what  is  certain  is  that  without  cooperation  between  publishers,  researchers,   and  libraries,  breaking  down  the  existing  barriers  and  achieving  standards  for  content  formats   and  access  terms  will  remain  elusive.   REFERENCES     1.     University  of  Colorado  Anschutz  Medical  Campus,  University  of  Colorado  Anschutz  Medical   Campus  Quick  Facts,  2013,   http://www.ucdenver.edu/about/WhoWeAre/Documents/CUAnschutz_facts_041613.pdf.     NEGOTIATING  A  TEXT  MINING  LICENSE  FOR  FACULTY  RESEARCHERS  |  WILLIAMS  ET  AL   18     2.     Sonia  M.  Leach  et  al.,  “Biomedical  Discovery  Acceleration,  with  Applications  to  Craniofacial   Development,”  PLoS  Computational  Biology  5,  no.  3  (2009):  1–19,   http://dx.doi.org/10.1371/journal.pcbi.1000215.   3.     Jonathan  Clark,  Text  Mining  and  Scholarly  Publishing  (Publishing  Research  Consortium,  2013).   4.     Corie  Lok,  “Literature  Mining:  Speed  Reading,”  Nature  463  (2010):  416–18,   http://dx.doi.org/10.1038/463416a.   5.     
5. Hong-Jie Dai, Yen-Ching Chang, Richard Tzong-Han Tsai, and Wen-Lian Hsu, "New Challenges for Biological Text-Mining in the Next Decade," Journal of Computer Science and Technology 25, no. 1 (2010): 169–79, http://dx.doi.org/10.1007/s11390-010-9313-5.
6. Anne Hoekman, "Journal Publishing Technologies: XML," http://www.msu.edu/~hoekmana/WRA%20420/ISMTE%20article.pdf.
7. Alex Brown, "XML in Serial Publishing: Past, Present and Future," OCLC Systems & Services 19, no. 4 (2003): 149–54, http://dx.doi.org/10.1108/10650750310698775.
8. Cartic Ramakrishnan et al., "Layout-Aware Text Extraction from Full-Text PDF of Scientific Articles," Source Code for Biology and Medicine 7, no. 7 (2012), http://dx.doi.org/10.1186/1751-0473-7-7.
9. Ibid.
10. Lawrence Hunter and K. Bretonnel Cohen, "Biomedical Language Processing: Perspective What's Beyond PubMed?" Molecular Cell 21, no. 5 (2006): 589–94.
11. Martin Krallinger, Alfonso Valencia, and Lynette Hirschman, "Linking Genes to Literature: Text Mining, Information Extraction, and Retrieval Applications for Biology," Genome Biology 9, supplement 2 (2008): S8.1–S8.14, http://dx.doi.org/10.1186/gb-2008-9-S2-S8.
12. Eefke Smit and Maurits van der Graaf, "Journal Article Mining: The Scholarly Publishers' Perspective," Learned Publishing 25, no. 1 (2012): 35–46, http://dx.doi.org/10.1087/20120106.
13. Hunter and Cohen, "Biomedical Language Processing," 589.
14. Clark, Text Mining and Scholarly Publishing.
15. Leach et al., "Biomedical Discovery Acceleration."
16. Marti Hearst, "What is Text Mining?" October 17, 2003, http://people.ischool.berkeley.edu/~hearst/text-mining.html.
17. K. Bretonnel Cohen and Lawrence Hunter, "Getting Started in Text Mining," PLoS Computational Biology 4, no. 1 (2008): 1–3, http://dx.doi.org/10.1371/journal.pcbi/0040020.
18. JISC, "The Model NESLi2 Licence for Journals," 2013, http://www.jisc-collections.ac.uk/Help-and-information/How-Model-Licences-work/NESLi2-Model-Licence-/.
19. Ian Hargreaves, "Digital Opportunity: A Review of Intellectual Property and Growth," May 2011, http://www.ipo.gov.uk/ipreview-finalreport.pdf.
20. James Manyika et al., "Big Data: The Next Frontier for Innovation, Competition, and Productivity," McKinsey & Company, May 2011, http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation.
21. Hargreaves, "Digital Opportunity."
22. Diane McDonald and Ursula Kelly, "The Value and Benefits of Text Mining to UK Further and Higher Education," JISC, 2012, http://www.jisc.ac.uk/reports/value-and-benefits-of-text-mining.
23. JISC, "The Model NESLi2 Licence for Journals."
24. Smit and van der Graaf, "Journal Article Mining."
25. McDonald and Kelly, "The Value and Benefits of Text Mining."
26. Clark, Text Mining and Scholarly Publishing.
27. "Gold in the Text?" Nature 483 (March 8, 2012): 124, http://dx.doi.org/10.1038/483124a.
28. Richard Van Noorden, "Trouble at the Text Mine," Nature 483 (March 8, 2012): 134–35.
29. Claudio Aspesi, A. Rosso, and R. Wielechowski, Reed Elsevier: Is Elsevier Heading for a Political Train-Wreck? 2012.
30. Clark, Text Mining and Scholarly Publishing.
31. Jill Emery, "Working In A Text Mine: Is Access about to Go Down?" Journal of Electronic Resources Librarianship 20, no. 3 (2009): 135–38, http://dx.doi.org/10.1080/19411260802412745.
32. Clark, Text Mining and Scholarly Publishing: 14.
33. Van Noorden, "Trouble at the Text Mine."
34. Ibid.
35. Ibid.
36. JISC, "The Model NESLi2 Licence for Journals."
37. Smit and van der Graaf, "Journal Article Mining."
38. Van Noorden, "Trouble at the Text Mine."
39. Internal Revenue Service, "Unrelated Business Income Defined," http://www.irs.gov/Charities-&-Non-Profits/Unrelated-Business-Income-Defined.
40. Clark, Text Mining and Scholarly Publishing: 10.
41. Ibid: 14.
42. McDonald and Kelly, "The Value and Benefits of Text Mining."
43. Smit and van der Graaf, "Journal Article Mining."
44. McDonald and Kelly, "The Value and Benefits of Text Mining."
45. Ibid.
46. Smit and van der Graaf, "Journal Article Mining."
47. McDonald and Kelly, "The Value and Benefits of Text Mining."
48. British Columbia Electronic Library Network, BC ELN Database Licensing Framework, http://www.cdlib.org/services/collections/toolkit/.
49. "Licensing Toolkit," California Digital Library, http://www.cdlib.org/services/collections/toolkit/.
50. JISC, "The Model NESLi2 Licence for Journals."
51. McDonald and Kelly, "The Value and Benefits of Text Mining."
52. "Using BioMed Central's Open Access Full-Text Corpus for Text Mining Research," http://www.biomedcentral.com/about/datamining.
53. "Help Using This Site," PLOS, http://www.plosone.org/static/help.
54. Iris Kisjes, "University College London and Elsevier Launch UCL Big Data Institute," Elsevier Connect, press release, December 18, 2013, http://www.elsevier.com/connect/university-college-london-and-elsevier-launch-ucl-big-data-institute.
55. "Gold in the Text?"
56. Richard Van Noorden, "Elsevier Opens Its Papers to Text-Mining," Nature 506 (February 2, 2014): 17.
57. Sciverse, Content APIs, http://www.developers.elsevier.com/cms/content-apis.
58. "Text and Data Mining," Elsevier, http://www.elsevier.com/about/universal-access/content-mining-policies.

Hidden Online Surveillance: What Librarians Should Know to Protect Their Own Privacy and That of Their Patrons

Alexandre Fortier and Jacquelyn Burkell

ABSTRACT

Librarians have a professional responsibility to protect the right to access information free from surveillance. This right is at risk from a new and increasing threat: the collection and use of non-personally identifying information such as IP addresses through online behavioral tracking.
This paper provides an overview of behavioral tracking, identifying the risks and benefits, describes the mechanisms used to track this information, and offers strategies that can be used to identify and limit behavioral tracking. We argue that this knowledge is critical for librarians in two interconnected ways. First, librarians should be evaluating recommended websites with respect to behavioral tracking practices to help protect patron privacy; second, they should be providing digital literacy education about behavioral tracking to empower patrons to protect their own privacy online.

Alexandre Fortier (afortie@uwo.ca) is a PhD candidate and Lecturer, Faculty of Information and Media Studies, The University of Western Ontario, London, Ontario. Jacquelyn Burkell (jburkell@uwo.ca) is Associate Professor, Faculty of Information and Media Studies, The University of Western Ontario, London, Ontario.

INTRODUCTION

Privacy is important to librarians. The American Library Association Code of Ethics (2008) states that "we protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted," while the Canadian Library Association Code of Ethics (1976) states that members have a responsibility to "protect the privacy and dignity of library users and staff." This translates to a core professional commitment: according to the American Library Association (2014, under "Why Libraries?"), "librarians feel a professional responsibility to protect the right to search for information free from surveillance."

Increasingly, information searches are conducted online, and as a result librarians should be paying specific attention to online surveillance in their efforts to satisfy their privacy-related professional responsibility. This is particularly important given the current environment of significant and increasing threat to privacy in the online context. Although many concerns about online privacy relate to the collection, use, and sharing of personally identifiable information, there is increasing awareness of the risks associated with the collection and use of what has been termed 'non-personally identifiable information' (e.g., Internet Protocol addresses, pages visited, geographic location information, search strings, etc.; Office of the Privacy Commissioner of Canada 2011, 12). This practice has been termed 'behavioral tracking', and recent revelations of government security agency collection of user metadata (Ball 2013; Weston, Greenwald and Gallager 2014) have heightened awareness of this issue. The problem, however, is not new, nor is the practice restricted to the actions of governmental agencies.
Indeed, as early as 1996 commercial and non-commercial entities were practicing online behavioral tracking for purposes of website and interaction personalization and to present targeted advertising ("Affinicast unveils personalization tool" 1996; "AdOne Classified Network and ClickOver announce strategic alliance" 1997). Since these initial forays into behavioral tracking and personalization of online content, the practice has proliferated, and many sites now use a variety of behavioral tracking tools to enhance user experience and deliver targeted advertisements (see, e.g., the "What they know" series from the Wall Street Journal 2010; Gomez, Pinnick and Soltani 2009; Soltani et al. 2009).

There can be no question that behavioral tracking is a form of surveillance (Castelluccia and Narayanan 2012), and the ubiquity of this practice means that users are regularly subject to this type of surveillance when they access online resources. In order to satisfy a professional commitment to support information access free from surveillance, librarians must therefore address two related issues: first, they must ensure that the resources they recommend are privacy-respecting in that those resources engage in little if any online surveillance; second, they must raise the digital literacy of their patrons with respect to online privacy, increasing understanding of online tracking mechanisms and the strategies that patrons can use to protect their privacy in light of these activities.

Addressing the first issue requires that librarians attend to surveillance practices when recommending online information resources. Privacy and surveillance issues, however, are notably absent from common guidelines for evaluating web resources (see, e.g., Kapoun 1998; University of California, Berkeley 2012; Johns Hopkins University 2013), and thus librarians do not have the guidance they need to ensure that the resources they recommend are privacy-respecting. It is critical that librarians and other information professionals address this gap by developing an understanding of the surveillance mechanisms used by websites and the strategies that can be deployed to identify and even nullify these mechanisms. This same understanding is necessary to address the second goal of raising the privacy-related digital literacy of patrons. Librarians must understand tracking mechanisms and potential responses in order to integrate privacy literacy into library digital literacy initiatives that are central to the mission of libraries (American Library Association 2013).

This paper provides an introduction to behavioral tracking mechanisms and responses. The goals of this paper are to provide an overview of the risks and benefits associated with online behavioral tracking, to discuss the various surveillance mechanisms that are used to track user behavior, and to provide strategies for identifying and limiting online behavioral tracking.
We have elsewhere published analyses of behavioral tracking practices on websites recommended by information professionals (Burkell and Fortier 2015), and on practices with respect to the disclosure of tracking mechanisms (Burkell and Fortier 2015). This paper serves as an adjunct to those empirical results, providing information professionals with background that will assist them in negotiating, on the part of themselves and their patrons, the complex territory of online privacy.

Consumer attitudes toward behavioral tracking

Survey data suggest that consumers are, in general, aware of behavioral tracking practices. The 2013 US Consumer Data Privacy Study (TRUSTe 2013), for example, reveals that 80 percent of users are aware of online behavioral tracking on their desktop devices, while slightly under 70 percent are aware of tracking on mobile devices (see also Office of the Privacy Commissioner of Canada 2013). Awareness, however, does not directly translate to understanding, and recent data indicate that even relatively sophisticated Internet users are not fully informed about behavioral tracking practices (McDonald and Cranor 2010; Smit et al. 2014). Moreover, attitudes about tracking are at best ambivalent (Ur et al. 2012), and many studies indicate a predominantly negative reaction to these practices (Turow et al. 2009; McDonald and Cranor 2010; TRUSTe 2013). Although it is not universally required by regulatory frameworks, many users feel that companies should request permission before collecting behavioral tracking data (Office of the Privacy Commissioner of Canada 2013). Finally, although some users take steps to limit or even eliminate behavioral tracking, many do not. For example, while one-third to three-quarters of survey respondents indicate that they manage or refuse browser cookies (TRUSTe 2013; comScore 2007; 2011; Rainie et al. 2013), at least one quarter reported no attempts to limit behavioral tracking. This may be attributed to the difficulty of using such mechanisms (Leon et al. 2011).

Behavioral tracking and its mechanisms

Tracking mechanisms transmit non-personally identifiable information to websites for different purposes. Originally, the information collected by these mechanisms was used to enhance user experience and to make website interactions more efficient. Tracking mechanisms can record user actions on a web page and their interaction preferences. Using these data, websites can, for example, direct returning visitors to a specific location in the site, allowing those visitors to resume interaction with a website at the point where they were on the previous visit. Using the Internet Protocol (IP) address of a user, websites can display information relevant to the geographic area where a user is located. Tracking mechanisms also allow a website to remember registration details and the items users have put in their shopping basket (Harding, Reed and Gray 2001).
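To illustrate the kind of first-party personalization just described, here is a minimal sketch, assuming Python and Flask (neither is mentioned in the paper), of a site that sets a persistent cookie holding a visitor identifier and uses it to return a visitor to where they left off. The route names and the in-memory store are purely illustrative.

import uuid
from flask import Flask, request, make_response

app = Flask(__name__)
last_section_seen = {}  # visitor_id -> last section viewed (in-memory, for illustration only)

@app.route("/")
def home():
    visitor_id = request.cookies.get("visitor_id")
    if visitor_id and visitor_id in last_section_seen:
        body = "Welcome back! Resume at: " + last_section_seen[visitor_id]
    else:
        visitor_id = str(uuid.uuid4())
        body = "Welcome, new visitor."
    resp = make_response(body)
    # A persistent first-party cookie: the browser keeps it until max_age elapses.
    resp.set_cookie("visitor_id", visitor_id, max_age=60 * 60 * 24 * 365)
    return resp

@app.route("/section/<name>")
def section(name):
    visitor_id = request.cookies.get("visitor_id")
    if visitor_id:
        last_section_seen[visitor_id] = name  # remember where this visitor was
    return "Section: " + name

The cookie, read on every request, is what allows the site to recognize the same browser across visits; a third-party tracker works on the same principle, except that its cookie is set for a domain embedded on many different sites.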
Tracking mechanisms are also of great use to webmasters, supporting the optimization of website design. Thus, for example, these mechanisms can inform webmasters of users' movements on their websites: what pages are visited, how often they are visited, and in what order. They can also indicate the common entry and exit points for a specific website. This information can be leveraged in site redesign to increase user satisfaction and traffic.

Website optimization and interaction personalization have potential benefit to users. At the same time, however, the detailed profile of user activities, potentially aggregated across multiple visits to different websites, presents potential privacy risks. The information gathered through tracking mechanisms can allow a website to identify browsing and information access habits, to infer user characteristics including location and some demographics, and to know what topics or products are of particular interest to a user.

Tracking mechanisms can be first-party or third-party, and the difference has implications for the detail that can be assembled in the user profile. First-party mechanisms are set directly by the website a user is visiting, while third-party mechanisms are set by outside companies providing services, such as advertising, analysis of user patterns, and social media integration, on the primary site. First-party tracking mechanisms collect information about a site visit and visitor and deliver that information to the site itself. Using first-party tracking, websites can provide personalized interaction, integrating visit and visitor information both within a single visit and across multiple visits (Randall 1997). This information is available only to the website itself, and thus neither includes information about visits to other sites nor is accessible by other websites, unless the information is sold or leaked by the first-party site (see Narayanan 2011).

Third-party tracking mechanisms, by contrast, deliver information about a site visit and visitor to a third party. This transaction is often invisible to the user, and the information is typically transmitted without explicit user consent. Third-party tracking represents a greater menace to privacy, since third parties have a presence on multiple sites, and are able to collect information about users and their activities on all those sites and integrate that information across sites and across visits into a single detailed user profile (see Mayer and Mitchell 2012 for a discussion of privacy problems associated with third-party tracking). Research demonstrates that third-party tracking is a common and perhaps even ubiquitous practice (Gomez, Pinnick and Soltani 2009; Burkell and Fortier 2013).
It is not uncommon for websites to have trackers from more than one third party, and some websites, especially popular ones, have trackers from dozens of different organizations: Gomez, Pinnick and Soltani (2009), for example, found 100 unique web beacons on a single website. Furthermore, the same tracking companies are present on many different websites, allowing them to integrate into a single profile information about visits to each of these many sites. PrivacyChoice1, which maintains a comprehensive database of tracking companies, estimates that Google Display Network (Doubleclick), for instance, has a presence on 57 percent of websites. Thus, a user traveling the web is likely to be tracked by Doubleclick on more than half of the sites they visit, and Doubleclick has access to information about all visits to each of these many sites.

1 http://www.privacychoice.org/.

Worries about the potential privacy breaches that mechanisms for tracking a user's activities online can allow are not new. Even at their inception in the mid-1990s, HTTP cookies (also known as browser cookies) were generating controversy about the potential invasion of privacy (e.g., Randall 1997). Users, however, quickly realized that they could manage HTTP cookies using accessible browser settings that limit or even entirely disallow the practice of setting cookies. As a result, websites, advertisers, and others who benefit from web audience segmentation and behavior analytics developed newer and more obscure tracking technologies, including 'supercookies' and web beacons, and these technologies are now deployed along with HTTP cookies (Sipior, Ward and Mendoza 2011). Tracking technologies are constantly evolving in response to user behavior and advertiser demand, and keeping up to date is therefore an ongoing challenge (see, e.g., Goodwin 2011).

HTTP cookies

HTTP cookies were originally meant to help web developers "invisibly" gather information about users in order to personalize and optimize user experience (Randall 1997). These cookies are simply a few lines of text shared in an HTTP transaction, and a typical cookie might include a user ID, the time of a visit, and the IP address of the computer. Cookies are associated with a specific browser, and the information is not shared between different browsers on the same machine: thus, the cookies stored by Firefox are not accessible to Internet Explorer, and vice versa. Cookies do not usually include identifying information such as name or address, and they are able to do so if and only if the user has explicitly provided this information to the website. When users want to access a web page, their browser sends a request to the server for the specific website and the server searches the hard drive for a cookie file from this site.
If there is no cookie, a unique identifier code is assigned to the browser and a cookie file is saved on the hard drive. If there is a cookie, it is retrieved and the information is used to personalize and structure the website interaction (for a detailed description of the mechanics of cookies, see Kristol 2001, 152–155).

Some HTTP cookies, called session or transient cookies, automatically expire when the browser is closed (Barth 2011). They are mainly used to keep track of what a consumer has added to a shopping cart or to allow users to navigate on a website without having to log in repeatedly. Other HTTP cookies, called permanent, persistent, or stored cookies, are configured to keep track of users until the cookie reaches its expiration date, which can be set many years after creation (Barth 2011). Permanent HTTP cookies can be easily deleted using browser management tools (Sipior, Ward and Mendoza 2011). Studies have shown that approximately a third of users delete cookies once a month (e.g., comScore 2007; 2011). Such behavior, however, displeases advertisers, as it leads to an overestimation of the number of true unique visitors on a website and impedes user tracking (Marshall 2005; see also comScore 2007; 2011).

Flash cookies and other 'supercookies'

To counter this 'attack' on HTTP cookies, an online advertising company, United Virtualities, developed a backup system for cookies using the local shared object feature of Adobe's Flash Player plug-in: the persistent identification element (Sipior, Ward and Mendoza 2011). This type of storage, called Flash Player Local Shared Objects or, more commonly, Flash cookies, shares many similarities with HTTP cookies with regard to their tracking capabilities, storing similar non-personally identifying information. Unlike HTTP cookies, however, Flash cookies do not have an expiration date, a characteristic that makes them permanent until they are manually deleted. They are also not handled by a browser, but are stored in a location accessible to different browsers and Flash widgets, which are thus all able to access the same cookie. They can hold much more data (up to 100 KB by default, compared to 4 KB for HTTP cookies), and support more complex data types than HTTP cookies (see McDonald and Cranor 2012 for a technical comparison of HTTP and Flash cookies). Moreover, it is estimated that Adobe's Flash Player is installed on over 99 percent of personal computers (Adobe 2011), making Flash cookies usable on virtually all computers.

Flash cookies represent a more resilient technology for tracking than HTTP cookies. Erasing traditional cookies within a browser does not affect Flash cookies, which need to be erased in a separate panel (Sipior, Ward and Mendoza 2011). Flash cookies also have the ability to 'respawn' (or recreate) deleted HTTP cookies.
A website using Flash cookies can therefore track users across sessions even if the user has taken reasonable steps to avoid this type of online profiling (Soltani et al. 2009), and although it is declining in incidence, this practice is still occurring, sometimes on very popular websites (Ayenson et al. 2011; McDonald and Cranor 2012).

It should also be noted that other Internet technologies (e.g., Silverlight, JavaScript, and HTML5), which have so far attracted less attention from researchers, use local storage for similar purposes. One developer even created the 'evercookie', a very persistent cookie incorporating twelve types of storage mechanisms available in a browser that makes data persist and allows for respawning (Kamkar 2010), a method investigated by the National Security Agency to de-anonymize users of the Tor network ('Tor Stinks' presentation 2013), a network which aims at concealing the location and usage of users.

Web beacons

Users' online behavior can also be monitored by web beacons (also called web bugs, clear GIFs, or pixel tags), which are tiny image tags embedded within a document, appearing on a webpage or attached to an email, that are intended to be unnoticed (Martin, Wu and Alsaid 2003). The image tag creates a holding space for a referenced image residing on the Web, and beacons transmit information to a remote computer when the document (web page or email) is viewed. Web beacons can gather information on their own, and they can also retrieve information from a previously set cookie (Angwin 2010; see Martin, Wu and Alsaid 2003 for a description of the different technological abilities of web beacons). Such capacity means, according to the Privacy Foundation (Smith 2000; quoted in Martin, Wu and Alsaid 2003), that beacons could potentially transfer to a third party demographic data and personally identifiable information (name, address, phone number, email address, etc.) that a user has typed on a page. Unlike cookies, beacons are not tied to a specific server and can track users over multiple websites (Schoen 2009). Beacons, moreover, cannot be managed through browser settings. While blocking third-party cookies limits their range of action, it does not preclude beacons from gathering information on their own, and users have to install extensions to their browser to efficiently limit the effects of web beacons.

Strategies for identifying behavioral tracking

In order to identify privacy-respecting online resources, librarians must learn to assess the behavioral tracking activities occurring on websites. The first step is to identify and review website privacy policies.
Privacy guidelines regulating the collection, retention, and use of personal information in the online environment usually require that users be given notice of website practices (e.g., the Fair Information Practice Principles2 proposed in 1973 by the US Secretary's Advisory Committee on Automated Personal Data Systems, the Convention for the Protection of Individuals with Regard to Automatic Processing of Personal Data developed by the Council of Europe (1981), and the Organisation for Economic Co-operation and Development Guidelines on the Protection of Privacy and Transborder Flows of Personal Data3). This notice is typically provided in privacy policies that identify what information is collected, how it is used, and with whom it is shared. Regulatory frameworks, however, did not originally contemplate the collection of non-personally identifiable information. While such disclosure would seem to be consistent with the Fair Information Practice Principles, the current mode of control is in many cases self-regulatory,4,5 and full compliance with notice requirements is far from universal (Komanduri et al. 2011-2012). Thus, while disclosure of behavioral tracking practices in websites should be seen as diagnostic of the presence of these mechanisms, lack of disclosure cannot be interpreted to mean that the site does not engage in behavioral tracking (Komanduri et al. 2011-2012; Burkell and Fortier 2013b).

Furthermore, privacy policy disclosures, where they do exist, may be difficult to understand (Burkell and Fortier 2013b). Website privacy policies are often complex (Micheti, Burkell and Steeves 2010). They tend to be written with the goal of protecting a website owner against lawsuits rather than informing users (Earp et al. 2005; Pollach 2005). Pollach (2005), for example, details a variety of linguistic strategies that serve to undermine user understanding of website practices, including mitigation and enhancement, obfuscation of reality, relationship building, and persuasive appeals. Therefore, even if many websites acknowledge the collection of non-personally identifiable information, both first- and third-party, the effectiveness of this disclosure is limited, making privacy policies a relatively ineffective tool to identify behavioral tracking practices.

2 The Privacy Act of 1974, 5 U.S.C. § 552a.
3 C(80)58/FINAL, as amended on 11 July 2013 by C(2013)79.
4 For instance, the new Self-Regulatory Guidelines for Online Behavioral Advertising identify the need to provide notice to users when behavioral data is collected that allows the tracking of users across websites and over time (United States Federal Trade Commission, 2009).
5 Exceptions to this self-regulatory principle are increasing, including but not limited to the California Online Privacy Protection Act of 2003 (OPPA), and the EU Cookie Directive (2009/136/EC) of the European Parliament and of the Council.
As a result, librarians need to develop strategies and tools that allow them to assess directly the behavioral tracking practices of websites, so that these practices can be considered in making website recommendations. Different protocols can be followed in making this assessment, but they should be built around the following guiding principles (see Burkell and Fortier 2013a for a full discussion). The first important principle is that each website should be visited in an independent session to eliminate contamination. Each website under consideration should be visited in an independent session, beginning with the browser at an about:blank page, with clean data directories (no HTTP and Flash cookies, and an empty cache). The evaluator should ensure that browser settings are configured to allow cookies, that tools to track web beacons (e.g., the Ghostery6 browser extension) are installed in the browser, and that Adobe Flash, via the Website Storage Settings panel, is configured to accept data. The website should then be accessed directly by entering the domain name into the browser's navigation bar. Evaluators should mimic a typical user interaction with the website on many pages without clicking on advertisements or following links to outside sites. As they browse through the site, the evaluator should record the web beacons and trackers identified by the browser extension (e.g., Ghostery). At the end of the session, they should immediately review the contents of the browser cookie file and the Adobe Flash Panel via Website Storage Settings, recording any cookies that are present. PrivacyChoice, as well as Ghostery, maintains a database of trackers that evaluators can use to identify associated privacy risk. While all third-party trackers raise some privacy issues, some of them put users at a greater risk than others, either because of their practices or their presence on a large number of websites. Evaluators should take that into account when making a decision.

6 https://www.ghostery.com/.

Strategies for limiting behavioral tracking

Users may also take these steps to identify the presence of behavioral tracking, and digital literacy initiatives should provide this information along with tools and strategies that users can employ to limit tracking. It should be noted that elimination of all behavioral tracking may not be a desirable outcome from the perspective of users who benefit from the website personalization and optimization supported by these mechanisms. Targeted advertising can also be positive for many people, since it eliminates unwanted or 'useless' advertisements. Ultimately, a user must decide whether he or she wants to be tracked. Digital literacy initiatives should raise awareness of behavioral tracking and provide users with the tools they need to identify and control tracking should they choose to do so.
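As a rough programmatic complement to the evaluation protocol described above, the following sketch, assuming Python with Selenium and a Firefox driver installed (tools not named in this paper), visits a site from a clean browser instance and prints the HTTP cookies visible for that site. It does not replace the extension-based checks: third-party trackers, web beacons, and Flash storage still need to be reviewed with tools such as Ghostery and the Adobe Website Storage Settings panel.

from selenium import webdriver

def list_http_cookies(url):
    """Visit url in a fresh browser instance and print the cookies visible for that site."""
    driver = webdriver.Firefox()      # a new instance starts without the user's everyday profile
    try:
        driver.delete_all_cookies()   # make sure nothing carries over before the visit
        driver.get(url)
        for cookie in driver.get_cookies():
            # name, domain, and expiry help distinguish session cookies from persistent ones
            print(cookie.get("domain"), cookie.get("name"), cookie.get("expiry"))
    finally:
        driver.quit()

# Example: list_http_cookies("https://www.example.com")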
The easiest step is for users to learn how to manage HTTP cookies in every web browser that they use. Using browser settings, users can decide to refuse third-party cookies or even all cookies. The latter, however, will make the browsing experience much less efficient and may impede users from accessing some websites. Users should also learn how to delete cookies, and they should be encouraged to think about periodically emptying the cookie file of each of their browsers. Controlling Flash cookies is more complex, yet crucial considering the capabilities of Flash cookies. This is achieved through settings on the Adobe Website Storage Settings Panel. Browser extensions, such as Ghostery and Adblock Plus7, can be added to most browsers. Ghostery allows users to block trackers, either on a tracker-by-tracker basis, a site-by-site basis, or a mixture of the two. Also customizable, Adblock Plus allows users to block either all advertisements or only the ones they do not want to see. These extensions, however, may slow down Internet browsing.

Users can also change their Internet use habits. It is possible for users to use search engines that do not store any non-personally identifiable information, such as Ixquick8 and DuckDuckGo9. Ixquick returns the top ten results from multiple search engines. It only sets one cookie that remembers a user's search preferences and that is deleted after a user does not visit Ixquick for 90 days. DuckDuckGo, which returns the same search results for a given search term to all users, aims at getting information from the best sources rather than the most sources. While these search engines do not have all the functionality of the major search engines, both of them have received praise (e.g., McCracken 2011). The ultimate solution, one that allows a user to navigate online with total anonymity, is to use the Tor10 web browser, which impedes network surveillance or traffic analysis and which the U.S. National Security Agency has characterized as "the King of high secure, low latency Internet anonymity" (Schneier 2013). The anonymity afforded by Tor, however, comes at the price of reduced speed and limitations to available content.

7 https://adblockplus.org/.
8 https://www.ixquick.com/.
9 https://duckduckgo.com/.
10 www.torproject.org/torbrowser/.

CONCLUSION

It is widely understood that online privacy is at risk, threatened by the actions of governmental agencies and commercial entities. There is widespread awareness of and attention to the risks associated with the collection and use of personally identifiable information, but less attention is paid to an equally significant issue: the collection and use of information that is highly personal but nonetheless 'non-identifying'. This practice, termed 'behavioral tracking', is the focus of this paper.
Other research demonstrates that behavioral tracking is widespread (Gomez, Pinnick and Soltani 2009; Burkell and Fortier 2013a), but users demonstrate only a limited knowledge of the practice and they do little to control tracking (comScore 2007; 2011; Rainie et al. 2013; TRUSTe 2013). We argue that librarians have a dual professional responsibility with respect to this issue: first, librarians should be aware of the surveillance practices of the websites they recommend to patrons and take these practices into account in making website recommendations; second, digital literacy initiatives spearheaded by librarians should include a focus on online privacy and provide patrons with the information they need to manage their own online privacy.

This paper presents an overview of online behavioral tracking mechanisms and provides strategies for identifying and limiting online behavioral tracking. The information presented provides a basic understanding of tracking mechanisms along with practical strategies that librarians can use to evaluate websites with respect to these practices and strategies that can be used to limit online tracking. We recommend that website evaluation standards be extended to include assessment of online privacy and especially behavioral tracking. We also recommend that librarians actively promote digital literacy by engaging in public education programs that take privacy and other digital literacy issues into account (American Library Association 2013). Finally, we note that protecting online privacy is an ongoing challenge, and librarians must ensure that they continually update their understanding of online surveillance mechanisms and the approaches that can be used to monitor and limit these activities.

ACKNOWLEDGEMENT

Support for this project was provided by the Office of the Privacy Commissioner of Canada through its Contributions Program. The views expressed in this document are those of the researchers and do not necessarily reflect the views of the Office of the Privacy Commissioner of Canada.

REFERENCES

Adobe. 2011. "Adobe Flash Platform runtimes: PC penetration". http://www.adobe.com/mena_en/products/flashplatformruntimes/statistics.html.

"AdOne Classified Network and ClickOver announce strategic alliance". 1997. Business Wire, March 24.

"Affinicast unveils personalization tool". 1996. AdAge, December 4. http://adage.com/article/news/affinicast-unveils-personalization-tool/2714/.

American Library Association. 2008. Code of Ethics. http://www.ala.org/advocacy/proethics/codeofethics/codeethics.

———. 2013. Digital literacy, libraries, and public policies: Report of the Office for Information Technology Policy's Digital Literacy Task Force. http://www.districtdispatch.org/wp-content/uploads/2013/01/2012_OITP_digilitreport_1_22_13.pdf.
———. 2014. Choose Privacy Week. Accessed April 8. http://chooseprivacyweek.org.

Angwin, Julia. 2010. "The web's new gold mine: Your secrets". The Wall Street Journal, July 31. http://online.wsj.com/news/articles/SB10001424052748703940904575395073512989404.

Ayenson, Mika, Dietrich James Wambach, Ashkan Soltani, Nathan Good and Chris Jay Hoofnagle. 2011. "Flash cookies and privacy II: Now with HTML5 and ETag respawning". Social Science Research Network. http://ssrn.com/abstract=1898390.

Ball, James. 2013. "NSA stores metadata of millions of web users for up to a year, secret files show". The Guardian, September 30. http://www.theguardian.com/world/2013/sep/30/nsa-americans-metadata-year-documents.

Barth, Adam. 2011. "HTTP State Management Mechanism". Internet Engineering Task Force, RFC 6265. http://tools.ietf.org/html/rfc6265.

Burkell, Jacquelyn and Alexandre Fortier. 2013. Privacy policy disclosures of behavioural tracking on consumer health websites. Proceedings of the 76th Annual Meeting of the Association for Information Science and Technology, edited by Andrew Grove. doi: 10.1002/meet.14505001087.

Burkell, Jacquelyn and Alexandre Fortier. 2015. Could we do better? Behavioural tracking on recommended consumer health websites. Health Information and Libraries Journal 32 (3): 182–194.

Canadian Library Association. 1976. Code of Ethics. http://www.cla.ca/Content/NavigationMenu/Resources/PositionStatements/Code_of_Ethics.htm.

Castelluccia, Claude and Arvind Narayanan. 2012. Privacy considerations of online behavioural tracking. Heraklion, Greece: European Union Agency for Network and Information Security. http://www.enisa.europa.eu/activities/identity-and-trust/library/deliverables/privacy-considerations-of-online-behavioural-tracking.

comScore. 2007. The impact of cookie deletion on the accuracy of site-server and ad-server metrics: An empirical comScore study. https://www.comscore.com/fre/Insights/Presentations_and_Whitepapers/2007/Cookie_Deletion_Whitepaper.

———. 2011. The impact of cookie deletion on site-server and ad-server metrics in Latin America: An empirical comScore study. http://www.comscore.com/Insights/Presentations_and_Whitepapers/2011/Impact_of_Cookie_Deletion_on_Site-Server_and_Ad-Server_Metrics_in_Latin_America.

Council of Europe. 1981. Convention for the protection of individuals with regard to automatic processing of personal data. http://conventions.coe.int/Treaty/en/Treaties/Html/108.htm.

Earp, Julia B., Annie I. Antón, Lynda Aiman-Smith and William H. Stufflebeam. 2005. "Examining Internet privacy policies within the context of user values". IEEE Transactions on Engineering and Management 52 (2): 227–237.

Gomez, Joshua, Travis Pinnick and Ashkan Soltani. 2009. KnowPrivacy. http://ashkansoltani.files.wordpress.com/2013/01/knowprivacy_final_report.pdf.

Goodwin, Josh. 2011. Super cookies, ever cookies, zombie cookies, oh my. Ensighten, blog entry. http://www.ensighten.com/blog/super-cookies-ever-cookies-zombie-cookies-oh-my.
http://www.ensighten.com/blog/super-cookies-ever-cookies-zombie-cookies-oh-my.

Harding, William T., Anita J. Reed and Robert L. Gray. 2001. "Cookies and web bugs: What they are and how they work together". Information Systems Management 18 (3): 17–24.

Johns Hopkins University Sheridan Libraries. 2013. Evaluating information found on the Internet. http://guides.library.jhu.edu/evaluatinginformation.

Kamkar, Samy. 2010. "evercookie". http://samy.pl/evercookie/.

Kapoun, Jim. 1998. "Teaching undergrads web evaluation: A guide for library instruction". College & Research Libraries News, July/August: 522–523.

Komanduri, Saranga, Richard Shay, Greg Norcie, Blase Ur and Lorrie Faith Cranor. 2011–2012. "AdChoices? Compliance with online behavioral advertising notice and choice requirements". I/S: A Journal of Law and Policy for the Information Society 7: 603–638.

Kristol, David M. 2001. "HTTP Cookies: Standards, privacy, and politics". ACM Transactions on Internet Technology 1 (2): 151–198.

Leon, Pedro Giovanni, Blase Ur, Rebecca Balebako, Lorrie Faith Cranor, Richard Shay, and Yang Wang. 2012. "Why Johnny can't opt out: A usability evaluation of tools to limit online behavioral advertising". Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. http://dl.acm.org/citation.cfm?id=2207759.

Marshall, Matt. 2005. "New cookies much harder to crumble". The Standard-Times, May 15. http://www.southcoasttoday.com/apps/pbcs.dll/article?AID=/20050515/NEWS/305159957.

Martin, David, Hailin Wu and Adil Alsaid. 2003. "Hidden surveillance by web sites: Web bugs in contemporary use". Communications of the ACM 46 (1): 258–264.

Mayer, Jonathan R. and John C. Mitchell. 2012. "Third-party web tracking: Policy and technology". Proceedings of the 2012 IEEE Symposium on Security and Privacy. https://cyberlaw.stanford.edu/files/publication/files/trackingsurvey12.pdf.

McCracken, Harry. 2011. "50 websites that make the web great". Time, August 16. http://content.time.com/time/specials/packages/0,28757,2087815,00.html.

McDonald, Aleecia M. and Lorrie Faith Cranor. 2010. "Beliefs and behaviors: Internet users' understanding of behavioral advertising". Social Science Research Network. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1989092.

———. 2012. "A survey of the use of Adobe Flash Local Shared Objects to respawn HTTP cookies". I/S: A Journal of Law and Policy for the Information Society 7 (3): 639–687.

Micheti, Anca, Jacquelyn Burkell and Valerie Steeves. 2010. "Fixing broken doors: Strategies for drafting privacy policies young people can understand". Bulletin of Science, Technology, and Society 30 (2): 130–143.

Narayanan, Arvind. 2011. "There is no such thing as anonymous online". Blog entry, July 28. https://cyberlaw.stanford.edu/blog/2011/07/there-no-such-thing-anonymous-online-tracking.

Office of the Privacy Commissioner of Canada.
2011. Report on the 2010 Office of the Privacy Commissioner of Canada's Consultations on Online Tracking, Profiling and Targeting, and Cloud Computing. https://www.priv.gc.ca/resource/consultations/report_201105_e.pdf.

———. 2013. Survey of Canadians on privacy-related issues. http://www.priv.gc.ca/information/por-rop/2013/por_2013_01_e.pdf.

Pollach, Irene. 2005. "A typology of communicative strategies in online privacy policies: Ethics, power, and informed consent". Journal of Business Ethics 62 (3): 221–235.

Rainie, Lee, Sara Kiesler, Ruogu Kang and Mary Madden. 2013. Anonymity, privacy, and security online. Pew Research Internet Project. http://www.pewinternet.org/2013/09/05/anonymity-privacy-and-security-online/.

Randall, Neil. 1997. "The new cookie monster". PC Magazine 16 (8): 211–214.

Schneier, Bruce. 2013. "Attacking Tor: How the NSA targets users' online anonymity". The Guardian, October 4. http://www.theguardian.com/world/2013/oct/04/tor-attacks-nsa-users-online-anonymity.

Schoen, Seth. 2009. "New cookie technologies: Harder to see and remove, widely used to track you". Blog entry, September 14. https://www.eff.org/deeplinks/2009/09/new-cookie-technologies-harder-see-and-remove-wide.

Sipior, Janice C., Burke T. Ward and Ruben A. Mendoza. 2011. "Online privacy concerns associated with cookies, Flash cookies, and web beacons". Journal of Internet Commerce 10 (1): 1–16.

Smit, Edith G., Guda Van Noort and Hilde A. M. Voorveld. 2014. "Understanding online behavioural advertising: User knowledge, privacy concerns, and online coping behaviour in Europe". Computers in Human Behavior 32 (1): 15–22.

Smith, R. M. 2000. "Why are they bugging you?" Privacy Foundation. http://www.privacyfoundation.org/resources/whyusewb.asp.

Soltani, Ashkan, Shannon Canty, Quentin Mayo, Lauren Thomas and Chris Jay Hoofnagle. 2009. "Flash cookies and privacy". Social Science Research Network. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1446862.

"'Tor Stinks' presentation". 2013. The Guardian Online, October 4. http://www.theguardian.com/world/interactive/2013/oct/04/tor-stinks-nsa-presentation-document.

TRUSTe. 2013. US 2013 Consumer data privacy study – Advertising edition. http://www.truste.com/us-advertising-privacy-index-2013/.

Turow, Joseph, Jennifer King, Chris Jay Hoofnagle, Amy Bleakley and Michael Hennessy. 2009. "Americans reject tailored advertising and three activities that enable it". Social Science Research Network. http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1478214.

United States Federal Trade Commission. 2009. FTC staff report: Self-regulatory principles for online behavioral advertising. http://www.ftc.gov/os/2009/02/P085400behavadreport.pdf.

University of California, Berkeley Library. 2012. "Finding information on the Internet: A tutorial". http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/Evaluate.html.

Ur, Blase, Pedro Giovanni Leon, Lorrie Faith Cranor, Richard Shay, and Yang Wang. 2012.
"Smart, useful, scary, creepy: Perceptions of online behavioral advertising". SOUPS '12: Proceedings of the Eighth Symposium on Usable Privacy and Security. http://dl.acm.org/citation.cfm?id=2335362.

Weston, Greg, Glenn Greenwald and Ryan Gallagher. 2014. "CSEC used airport Wi-Fi to track Canadian travelers: Edward Snowden documents". CBC News, January 30. http://www.cbc.ca/news/politics/csec-used-airport-wi-fi-to-track-canadian-travellers-edward-snowden-documents-1.2517881.

"What they know". 2010. The Wall Street Journal Online. http://blogs.wsj.com/wtk/.

5576 ---- Editorial Board Thoughts: The Checklist
Mark Cyzyk
INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2014

Mark Cyzyk (mcyzyk@jhu.edu), a member of the ITAL Editorial Board, is the Scholarly Communication Architect in The Sheridan Libraries, Johns Hopkins University, Baltimore, Maryland.

At my home institution, Johns Hopkins, we have a renowned professor who made a keen insight several years ago. Dr. Peter Pronovost, professor of anesthesiology and critical care medicine, of public health policy and management, of surgery, and of nursing, took note of the fact that many careless errors were occurring in hospital intensive care units, errors that were due not to ignorance, but to overlooking simple, mundane processes and procedures that maximize the safety of patients and increase the likelihood of positive medical outcomes.

How to remedy this situation? As is often the case, the simplest solution is the most profound, the most effective, the most brilliant. Pronovost implemented straightforward checklists of processes and procedures for doctors and nurses to routinely follow in the intensive care unit (ICU). There was nothing on these checklists the medical staff did not already know; indeed, they were so basic that what was listed almost went without saying. But the brilliance of Pronovost's insight was that the very implementation of a "just saying" checklist resulted in an immediate and significant improvement in patient outcomes.

So simple. So easily done.

Might we here in the library IT world implement just such a checklist? In particular, I'm wondering whether we might begin to construct a fairly comprehensive checklist of what I'm calling "software genres" for use in libraries. It doesn't take much insight to see that software packages map to services and that groups of these packages cluster around such services. Such clusters, I'm thinking, are actually genres of software.

So, for instance, let's think about a standard library service: a service that fulfills the need of academic institutions to store, archive, and provide access to faculty research. We see software systems emerge to fulfil this need, e.g., DSpace, CONTENTdm, Fedora, EPrints, and Islandora. We can think of each of these systems as the fulfilment of similar sets of requirements and these requirements as being dictated by the satisfaction of a need. These systems individually represent concrete instantiations of clusters of similar requirements. But these systems also cluster among and between themselves insofar as the requirements and the needs they satisfy are similar. They form a genre of software.

Now, we've identified one genre of software useful in libraries: institutional repository software. What other genres might there be? How about the Granddaddy of them all: a software
system/concrete-instantiation-of-requirements fulfilling the need in libraries to enable the creation and collection of bibliographic metadata, to provide access to that metadata to library patrons, to enable the management of the circulation of physical and electronic objects mapping to that metadata, and to manage acquisition of materials including serial publications? Why, it's the venerable integrated library management system, of course! And here we'll find such open-source examples as Koha and Evergreen alongside their commercial counterparts.

Continuing to compile, we'll begin to gather a table of data similar to what's below. But what good is this? Don't we all already know about these software "genres"? Aren't we just in some sense stating the obvious?

To the extent that these software genres and the software packages classified by each are obvious, this table, this checklist, resembles the checklists that Professor Pronovost presented to doctors and nurses in the ICU. Those doctors and nurses certainly knew to wash their hands, to properly sterilize around an area of catheterization, etc. And if they were about to overlook a hand-washing or sterilization, the checklist brought this need and requirement immediately to their attention.

So too we might be aware of, say, the fact that many libraries implement systems to manage visual resources, electronic images. We may work in a library that does not. Yet scanning the list below we'll see that it is a need with requirements that cluster to such an extent that a genre has developed around it. We may say to ourselves, "We too have a significant collection of electronic images that we've just been putting into our local repository, but maybe the repository is not the right place for this content, maybe we should look into these other systems designed specifically for storing and serving up images?" More, suppose you work in a really small and resource-constrained library, yet one that has an important archival collection of materials related to local history? You now scan the table below and determine that there is indeed a genre of archival information systems. "Maybe we should set up an instance of ArchivesSpace to facilitate the cataloging and provision of access to our archival treasures?" Maybe you should!

Now some of these systems can do double duty. DSpace, for instance, could be used as a serviceable image management/data management/ETD management system. But cramming so many disparate kinds of content into a single system and expecting it to behave like a bona fide image management or data management or ETD management system is probably a bad idea. The whole reason these genres emerge in the first place is that they satisfy the idiosyncratic needs of various types of content; their functionalities are tailored to the types of services one would want to provide surrounding such disparate types of content. Using the right tool for the job is surely a best practice if ever there was one.

That said, some of these software packages, by design, cut across the different genres in much the same way that, say, the music of Ryan Adams cuts across the genres of rock and alt country or the novels of China Mieville cut across all manner of literary genres. Islandora provides a good example of this.
It can be used as a repository, but its various "solution packs" allow it to be expanded to serve well as, for example, an image management system. Islandora, interestingly, is itself built on two of the software packages listed below—Drupal and Fedora—combining the strengths and functionalities of both, resulting in a genre-busting (genre-extending?) software package.

Viewing software packages as types of genres is useful insofar as, when it comes time to assemble requirements for a new project or service, thinking of existing software systems as concrete instantiations of such requirements speeds the accomplishment of this task. Rather than begin to gather requirements ex nihilo, we should be thinking, "Here are some prime examples of actual working systems that fulfil needs similar to ours. How about we compile a list of them, install a few, and see what they do? Maybe there is something here that fits our specific need?"

So simple. So easily done. Check!

Software Genre | Software Packages
ILMS | Koha, Evergreen, Aleph, Millennium, Polaris, Symphony, Horizon, Voyager
Indexing and Discovery | Blacklight, Summon, WorldCat Local, Ebsco Discovery Services, Encore Synergy, Primo Central, LibraryFind, eXtensible Text Framework
Institutional Repository | DSpace, Fedora, EPrints, Greenstone, CONTENTdm, Islandora, Digital Commons
Image Management and Access | MDID, ARTstor Shared Shelf, Luna
Publishing: Journals | Open Journal Systems, Annotum, Digital Commons
Publishing: Monographs | Open Monograph Press, BookType
Online Exhibition Software | Omeka, Pachyderm, Open Exhibits, Collective Access
Archival Information System | Archon, Archivists Toolkit, ArchivesSpace, ICA-AtoM
Faculty Research Portfolio/Showcase | BibApp, SciVal
Personal Bibliography Management | Zotero, RefWorks, EndNote, Mendeley
Web Content Management | WordPress, Drupal, Joomla
Data Management | Data Conservancy, Archivematica, Dataverse
ETD Management | Vireo, OpenETD
Wiki | MediaWiki, Confluence, Sharepoint, Tiki Wiki
GIS | ArcGIS Server, MapServer, MapStory

5600 ---- Applying Hierarchical Task Analysis Method to Discovery Layer Evaluation
Merlen Prommann and Tao Zhang
INFORMATION TECHNOLOGY AND LIBRARIES | MARCH 2015

ABSTRACT

While usability tests have been helpful in evaluating the success or failure of implementing discovery layers in the library context, the focus of usability tests has remained on the search interface rather than the discovery process for users. The informal, site- and context-specific usability tests have offered little to test the rigor of the discovery layers against the user goals, motivations, and workflow they have been designed to support. This study proposes hierarchical task analysis (HTA) as an important complementary evaluation method to usability testing of discovery layers. Relevant literature is reviewed for the discovery layers and the HTA method. As no previous application of HTA to the evaluation of discovery layers was found, this paper presents the application of HTA as an expert-based and workflow-centered (e.g., retrieving a relevant book or a journal article) method to evaluating discovery layers.
Purdue University's Primo by Ex Libris was used to map eleven use cases as HTA charts. Nielsen's Goal Composition theory was used as an analytical framework to evaluate the goal charts from two perspectives: (a) users' physical interactions (i.e., clicks), and (b) users' cognitive steps (i.e., decision points for what to do next). A brief comparison of HTA and usability test findings is offered as a way of conclusion.

INTRODUCTION

Discovery layers are relatively new third-party software components that offer a Google-like web-scale search interface for library users to find information held in the library catalog and beyond. Libraries are increasingly utilizing these to offer a better user experience to their patrons. While popular in application, the discussion about discovery layer implementation and evaluation remains limited.[1][2]

A majority of reported case studies discussing discovery layer implementations are based on informal usability tests that involve a small sample of users in a specific context. The resulting data sets are often incomplete and the scenarios are hard to generalize.[3] Discovery layers have a number of technical advantages over the traditional federated search and cover a much wider range of library resources. However, they are not without limitations. Questions about the workflow of discovery layers and how well they help users achieve their goals have rarely been asked.

Merlen Prommann (mpromann@purdue.edu) is User Experience Researcher and Designer, Purdue University Libraries. Tao Zhang (zhan1022@purdue.edu) is User Experience Specialist, Purdue University Libraries.

Beth Thomsett-Scott and Patricia E. Reese offered an extensive overview of the literature discussing the disconnect between what library websites offer and what their users would like.[1] On the one hand, library directors deal with a great variety of faculty perceptions, in terms of what the role of the library is and how they approach research differently. The Ithaka S+R Library Survey of not-for-profit four-year academic institutions in the US suggests a real diversity of American academic libraries as they seek to develop services with sustained value.[4] For the common library website user, irrelevant search results and unfamiliar library taxonomy (e.g., call numbers, multiple locations, item formats, etc.) are the two most common gaps.[3] Michael Khoo and Catherine Hall demonstrated how users, primarily college students, have become so accustomed to the search functionalities on the Internet that they are reluctant to use library websites for their research.[5] No doubt, the launch of Google Scholar in 2005 was another driver for librarians to move from traditional federated searching to something faster and more comprehensive.[1] While literature encouraging Google-like search experiences is abundant, Khoo and Hall have warned designers not to take users' preferences towards Google at face value.
They studied users' mental models, defining a mental model as "a model that people have of themselves, others, the environment, and the things with which they interact, such as technologies," and concluded that users often do not understand the complexities of how search functions actually work or what is useful about them.[5]

A more systematic examination of the tasks that discovery layers are designed to support is needed. This paper introduces hierarchical task analysis (henceforth HTA) as an expert method to evaluate discovery layers from a task-oriented perspective. It aims to complement usability testing. For more than 40 years, HTA has been the primary methodology to study systems' sub-goal hierarchies, for it presents the opportunity to provide insights into key workflow issues. With expertise in applying HTA and being frequent users of the Purdue University Libraries website for personal academic needs, we mapped user tasks into several flow charts based on three task scenarios: (1) finding an article, (2) finding a book, and (3) finding an eBook. Jakob Nielsen's "Goal Composition" heuristics: generalization, integration, and user control mechanisms[6] were used as an analytical framework to evaluate the user experience of an Ex Libris Primo® discovery layer implemented at Purdue University Libraries. The Goal Composition heuristics focus on multifunctionality and the idea of servicing many possible user goals at once. For instance, generalization allows users to use one feature on more objects. Integration allows each feature to be used in combination with other facilities. Control mechanisms allow users to inspect and amend how the computer carries out the instructions. We discussed the key issues with other library colleagues to meet Nielsen's five-expert rule and avoid loss in the quality of insights.[7] Nielsen studied the value of participant volume in usability tests and concluded that after the fifth user, researchers are wasting their time by observing the same findings and not learning much new. A comparison to usability study findings, as presented by Fagan et al., is offered as a way of conclusion.[3]

RELATED WORK

Discovery Layers

The traditional federated search technology offers the overall benefit of searching many databases at once.[8][1] Yet it has been known to frustrate users, as they often do not know which databases to include in their search. Emily Alling and Rachel Naismith aggregated common findings from a number of studies involving the traditional federated search technology.[9] Besides slow response time, other key causes of frustrating inefficiency were limited information about search results, information overload due to the lack of filters, and the fact that results were not ranked in order of relevance (see also [2][1]).
New tools, termed "discovery," "discovery tools,"[2][10] "discovery layers," or "next generation catalogs,"[11] have become increasingly popular and have provided the hope of eliminating some of the issues with traditional federated search. Generally, they are third-party interfaces that use pre-indexing to provide speedy discovery of relevant materials across millions of records of local library collections, from books and articles to databases and digital archives. Furthermore, some systems (e.g., Ex Libris Primo Central Index) aggregate hundreds of millions of scholarly e-resources, including journal articles, e-books, reviews, legal documents and more that are harvested from primary and secondary publishers and aggregators, and from open-access repositories. Discovery layers are projected to help create the next generation of federated search engines that utilize a single search index of metadata to search the rising volume of resources available for libraries.[2][11][10][1] While not systematic yet, results from a number of usability studies on these discovery layers point to the benefits they offer.

The most noteworthy benefit of a discovery layer is its seemingly easy-to-use unified search interface. Jerry Caswell and John D. Wynstra studied the implementation of Ex Libris MetaLib centralized indexes based on the federated search technology at the University of Northern Iowa Library.[8] They confirmed how the easily accessible unified interface helped users to search multiple relevant databases simultaneously and more efficiently. Lyle Ford concluded that the Summon discovery layer by Serials Solutions fulfilled students' expectations to be able to search books and articles together.[12] Susan Johns-Smith pointed out another key benefit to users: customizability.[10] The Summon discovery layer allowed users to determine how much of the machine-readable cataloging (MARC) record was displayed. The study also confirmed how the unified interface, aligning the look and feel among databases, increased the ease of use for end-users. Michael Gorrell described how one of the key providers, EBSCO, gathered input from users and considered design features of popular websites to implement new technologies in the EBSCOhost interface.[13] Some of the features that ease the usability of EBSCOhost are a dynamic date slider, an article preview hover, and expandable features for various facets, such as subject and publication.[2]

Another key benefit of discovery systems is the speed of results retrieval. The Primo discovery layer by Ex Libris has been complimented for its ability to reduce the time it takes to conclude a search session, while maximizing the volume of relevant results per search session.[14] It was suggested that in so doing the tool helps introduce users to new content types.
Yuji Tosaka and Cathy Weng reported how records with richer metadata tend to be found more frequently and lead to more circulation.[15] Similarly, Luther and Kelly reported an increase in overall downloads, while the use of individual databases decreased.[16] These studies point to the trend of an enhanced distribution of discovery and knowledge.

With the additional metadata of item records, however, there is also the increased likelihood of inconsistencies across databases that are brought together in a centralized index. A study by Graham Stone offered a comprehensive report on the implementation process of the Summon discovery layer at the University of Huddersfield, highlighting major inconsistencies in cataloging practices and the difficulties they caused in providing consistent journal holdings and titles.[17] This casts shadows on the promise of better findability.

Jeff Wisniewski[18] and Williams and Foster[2] are among the many who espouse discovery layers as a step towards a truly single search function that is flexible while allowing needed customizability. These new tools, however, are not without their limitations. The majority of usability studies reinforce similar results and focus on the user interface. Fagan et al., for example, studied the usability of EBSCO Discovery Service at James Madison University (JMU). While most tasks were accomplished successfully, the study confirmed previous warnings that users do not understand the complexities of search and identified several interface issues: (1) users desire single search, but willingly use multiple options for search; (2) lack of visibility for the option to sort search results; and (3) difficulty in finding journal articles.[3]

Yang and Wagner offer one case where the aim was to evaluate discovery layers against a checklist of 12 features that would define a true "next generation catalogue":

(1) Single point of entry to all library information,
(2) State-of-the-art web interface (e.g. Google and Amazon),
(3) Enriched content (e.g. book cover images, ratings and comments),
(4) Faceted navigation for search results,
(5) Simple keyword search on every page,
(6) More precise relevancy (with circulation statistics a contributing factor),
(7) Automatic spell check,
(8) Recommendations to related materials (common in commercial sites, e.g. Amazon),
(9) Allowing users to add data to records (e.g. reviews),
(10) RSS feeds to allow users to follow top circulating books or topic-related updates in the library catalogue,
(11) Links to social networking sites to allow users to share their resources,
(12) Stable URLs that can be easily copied, pasted and shared.[11]

They used this list to evaluate seven open source and ten proprietary discovery layers, revealing how only a few of them can be considered true "next generation catalogs" supporting the users' needs that are common on the Web. (A minimal sketch of how such a feature audit can be tabulated follows below.)
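The kind of feature audit Yang and Wagner describe can be tabulated with a few lines of code. The sketch below is only an illustration of the idea, not their published instrument: the feature names paraphrase their twelve criteria, and the tool name and scores in the example are hypothetical.

```python
# A minimal, hypothetical sketch of a Yang-and-Wagner-style feature audit.
# The feature list paraphrases their twelve "next generation catalog" criteria;
# the example tool and its score are made up for illustration.

FEATURES = [
    "single point of entry", "state-of-the-art web interface", "enriched content",
    "faceted navigation", "keyword search on every page", "more precise relevancy",
    "automatic spell check", "recommendations", "user-contributed data",
    "RSS feeds", "social networking links", "stable URLs",
]

def audit(tool_name: str, supported: set) -> int:
    """Count how many of the twelve checklist features a discovery layer supports."""
    score = sum(1 for feature in FEATURES if feature in supported)
    print(f"{tool_name}: {score} of {len(FEATURES)} features")
    return score

# Hypothetical example: a tool that supports ten of the twelve features.
audit("ExampleDiscovery", set(FEATURES[:10]))
```

Keeping the audit in a simple, repeatable form like this is what makes it possible to compare many tools against the same baseline, which is the point of Yang and Wagner's exercise.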
All of the tools included in their study missed precision in retrieving relevant search results, e.g. based on transaction data. The authors were impressed with the open source discovery layers LibraryFind and VuFind, which had 10 of the 12 features, leaving vendors of proprietary discovery layers ranking lower (see figure 1).

Figure 1. Ranking of Discovery Layers: 17 discovery layers (x-axis) were evaluated against a checklist of 12 features expected of the next generation catalogue (y-axis); scores ranged from 9 down to 1 (Yang and Wagner 2010, 707).

Yang and Wagner theorized that the relative lack of innovation among commercial discovery layers is due to practical reasons: vendors create their new discovery layers to run alongside older ones, rather than attempting to alter the proprietary code of the Integrated Library System's (ILS) online public access catalog (OPAC). They pointed to the need for "libraries, vendors and the open source community [...] to cooperate and work together in a spirit of optimism and collegiality to make the true next generation catalogs a reality".[11] At the same time, the University of Michigan Article Discovery Working Group reported on vendors being more cooperative and allowing coordination among products, increasing the potential of web-scale discovery services.[19] How to evaluate and optimize user workflow across these coordinating products remains a practical challenge. In this study, we propose HTA as a prospectively helpful method to evaluate user workflow through these increasingly complex products.

Hierarchical Task Analysis

With roots in Taylorism*, industrial psychology, and system processes, task analyses continue to offer valuable insights into the balance of efficiency and effectiveness in human-computer interaction scenarios [20][21]. Historically, Frank and Lillian Gilbreth (1911) set forth the principle of hierarchical task analysis (HTA) when they broke down and studied the individual steps involved in laying bricks. They reduced the bricklaying process from about 18 movements down to four (in [21]). But it was John Annett and Keith D. Duncan (1967) who introduced HTA as a method to better evaluate the personnel training needs of an organization. They used it to break apart behavioral aspects of complex tasks such as planning, diagnosis, and decision-making (see in [22][21]).

* Taylorism is the application of scientific method to the analysis of work, so as to make it more efficient and cost-effective. Modern task

HTA helps break users' goals into subtasks and actions, usually in the visual form of a graphic chart. It offers a practical model for goal execution, allowing designers to map user goals to the system's varying task levels and evaluate their feasibility [23]. In so doing, HTA offers the structure with which to learn about tasks and highlight any unnecessary steps and potential errors that might occur during task performance [24][25], whether cognitive or physical.
Its strength lies in its dual approach to evaluation: on the one hand, user interface elements are mapped at an extremely low and detailed level (down to individual buttons), while on the other hand, each of these interface elements gets mapped to the user's high-level cognitive tasks (the cognitive load). This informs a rigorous design approach, where each detail accounts for the high-level user task it needs to support.

The main limitation of classical HTA is its system-centric focus that does not account for the wider context the tasks under examination exist in. The field of human-computer interaction has shifted our understanding of cognition from an individual information-processing model to a networked and contextually defined set of interactions, where the task under analysis is no longer confined to a desktop but "extends into a complex network of information and computer-mediated interactions" [26]. The task-step-focused HTA does not have the ability to account for the rich social and physical contexts that the increasingly mediated and multifaceted activities are embedded in. HTA has been reiterated with additional theories and heuristics, so as to better account for the increasingly more complete understanding of human activity.

Advanced task models and analysis methods have been developed based on the principle of HTA. Stuart K. Card, Thomas P. Moran and Allen Newell [27] proposed an engineering model of human performance – GOMS (Goals, Operators, Methods, and Selection rules) – to map how task environment features determine what and when users know about the task [20]. GOMS has been expanded to cope with rising complexities (e.g. [28][29][30]), but the models have become largely impractical in the process [20]. Instead of simplistically suggesting cognitive errors are due to interface design, Cognitive Task Analysis (CTA) attempts to address the underlying mental processes that most often give rise to errors [24]. Given the lack of our structural understanding of cognitive processes, the analysis of cognitive tasks has remained problematic to implement [20][31]. Activity Theory models people as active decision makers [20]. It explains how users convert goals into a set of motives and how they seek to execute those motives as a set of interactions in a given situational condition. These situational conditions either help or prevent the user from achieving the intended goal. Activity Theory is beginning to offer a coherent foundation to account for the task context [20], but it has yet to offer a disciplined set of methods to execute this theory in the form of a task analysis.
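To make the sub-goal hierarchy concrete, here is a minimal sketch of how an HTA chart of the kind described above could be represented programmatically, with each node carrying its hierarchical number and a flag for whether the step is physical (a click) or cognitive (an evaluation or decision). The Step class, the node labels, and the numbering are illustrative assumptions in the spirit of the charts described in the method section below, not the authors' actual charts.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One node of an HTA chart: a goal, sub-goal, or individual user step."""
    number: str             # hierarchical number, e.g. "0", "1", "1.2"
    label: str              # what the user does or decides
    kind: str = "physical"  # "physical" (a click) or "cognitive" (an evaluation)
    children: list = field(default_factory=list)

# A hypothetical fragment of a "find a book" chart.
chart = Step("0", "Find the book", children=[
    Step("1", "Browse the library website", children=[
        Step("1.1", "Open lib.purdue.edu"),
        Step("1.2", "Choose a search filter", kind="cognitive"),
    ]),
    Step("2", "Find results", children=[
        Step("2.1", "Type the title into the search box"),
        Step("2.2", "Press Search"),
        Step("2.3", "Verify the correct item in the results", kind="cognitive"),
    ]),
])

def walk(step, depth=0):
    """Print the sub-goal hierarchy with indentation, depth first."""
    print("  " * depth + f"{step.number} {step.label} ({step.kind})")
    for child in step.children:
        walk(child, depth + 1)

walk(chart)
```

Recording the physical/cognitive distinction on every node is what later allows step counts of the kind reported in table 1 to be tallied mechanically rather than by hand.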
Even though task analyses have seen much improvement, adaptation, and usage in their near-40-year-long existence, their core benefit – aiding an understanding of the tasks users need to perform to achieve their desired goals – has remained the same. Until Activity Theory, CTA, and other contextual approaches are developed into more readily applicable analysis frameworks, classical HTA with additional layers of heuristics guiding the analysis remains the practical option [21]. Nielsen's Goal Composition [6] offers one such set of heuristics applicable for the web context. It presents usability concepts such as reuse, multitasking, automated use, recovering and retrieving, to name a few, so as to systematically evaluate the HTA charts representing the interplay between an interface and the user.

Utility of HTA for evaluating discovery layers

Usability testing has become the norm in validating the effectiveness and ease of use of library websites. Yet, thirteen years ago, Brenda Battleson, Austin Booth and Jane Weintrop [32] emphasized the need to support user tasks as the crucial element of user-centered design. In comparison to usability testing, HTA offers a more comprehensive model for the analysis of how well discovery layers support users' tasks in the contemporary library context. Considering the strengths of the HTA method and the current need for vendors to simplify the workflows in increasingly complex systems, it is surprising that HTA has not yet been applied to the evaluation of discovery layers.

This paper introduces Hierarchical Task Analysis (HTA) as a solution to systematically evaluate the workflow of discovery layers as a technology that helps users accomplish specific tasks, herein retrieving relevant items from the library catalog and other scholarly collections. Nielsen's [6] Goal Composition heuristics, designed to evaluate usability in the web context, are used to guide the evaluation of the user workflow via the HTA task maps. As a process-specific (vs. context-specific) approach, HTA can help achieve a more systematic examination of the tasks discovery layers should support, such as finding an article, a book, or an eBook, and help vendors coordinate to achieve the full potential of web-scale discovery services.

METHOD: Applying HTA to Primo by Ex Libris

The object of this study was Purdue University's library website, which was re-launched with Ex Libris' Primo in January 2013 (figure 2) to serve the growing student and faculty community. Its 3.6 million indexed records are visited over 1.1 million times every year. Roughly 34% of these visits are to electronic books. According to Sharon Q. Yang and Kurt Wagner [11], who studied 17 different discovery layers, Primo ranked the best among the commercial discovery layer products, coming fourth after the open source tools LibraryFind, VuFind, and Scriblio in the overall rankings.
We will evaluate how efficiently and effectively the Primo search interface supports the tasks of Purdue Libraries users.

Figure 2. Purdue Library front page and search box.

Based on our three-year experience of user studies and usability testing of the library website, we identified finding an article, a book, and an eBook as the three major representative scenarios of Purdue Library usage. To test how Primo helps its users and how many cognitive steps it requires of them, each of the three scenarios was broken into three or four specific case studies. The case studies were designed to account for the different availability categories present in the current Primo system, e.g. "full text available," "partial availability," "restricted access," or "no access." This is because the different availabilities present users with different possible frustrations and obstacles to task accomplishment. This system-design perspective could offer a comparable baseline for discovery layer evaluation across libraries. A full list of the eleven case studies can be seen below:

Find an Article:
Case 1. The library has only a full electronic text.
Case 2. The library has the correct issue of the journal in print, which contains the article, as well as a full electronic text.
Case 3. The library has the correct issue of the journal, which contains the article, only in print.
Case 4. The library does not have the full text, either in print or electronically. A possible option is to use an interlibrary loan (henceforth ILL) request.

Find a book (print copy):
Case 5. The library has the book and the book is on the shelf.
Case 6. The library has the book, but the book is in a restricted place, such as the Hicks Repository. The user has to request the book.
Case 7. The library has the book, but it is either on the shelf or in a repository. The user would like to request the book.
Case 8. The library does not have the book. Possible options are UBorrow† or ILL.

Find an eBook:
Case 9. The library has the full text of the eBook.
Case 10. The eBook is shown in search results but the library does not have full text.
Case 11. The book is not shown in search results. A possible option is to use UBorrow or ILL.

It is generally accepted that HTA is not a complex analysis method, but since it offers general guiding principles rather than a rigorous step-by-step guide, it can be tricky to implement [24][20][21][23]. Both authors of this study have expertise in applying HTA and are frequent users of the Purdue Library's website. We are familiar with the library's commonly reported system errors; however, all of our case studies result from a randomized topic search, not from specific reported items. To achieve consistent HTA charts, one author carried out the identified use cases on a part-time basis over a two-month period. Each case was executed on the Purdue Library website, using the Primo discovery layer.
An on-campus Hewlett-Packard (HP) desktop computer with Internet Explorer and a personal MacBook laptop with Safari and Google Chrome were used to identify any possible inconsistencies between user experiences on different operating systems. As per Stanton's [21] statement that "HTA is a living documentation of the sub-goal hierarchy that only exists in the latest state of revision", mapping the HTA charts was an iterative process between the two authors.

† uBorrow is a federated catalog and direct consortial borrowing service provided by the Committee on Institutional Cooperation (CIC). uBorrow allows users to search for, and request, available books from all CIC libraries, which includes all universities in the Big Ten as well as the University of Chicago and the Center for Research Libraries.

According to David Embrey [24], "the analyst needs to develop a measure of skill [in the task] in order to analyze a task effectively" (2). This measure of skill was developed in the process of finding real examples (via a randomized topic search) from the Purdue Library catalog to match the structural cases listed above. For instance, "Case 1. The library has only the electronic full text" was turned into a case goal: "0 Find the conference proceeding on Network-assisted underwater acoustic communication". A full list of referenced case studies is below:

Find an Article:
Case 1. Find the article "Network-assisted underwater acoustic communication" (Yang and Kevin, 2012).
Case 2. Find the article "Comparison of Simple Potential Functions for Simulating Liquid Water" (Jorgensen et al., 1983).
Case 3. Find the journal Design Annual "Graphis Inc" (2008).
Case 4. Find the article "A technique for murine irradiation in a controlled gas environment" (Walb, M. C. et al., 2012).

Find a book (in print):
Case 5. Find the book Show me the numbers: Designing tables and graphs to enlighten (Few, 2004).
Case 6. Find the book The Love of Cats and place a request for it (Metcalf, 1973).
Case 7. Find the book The Prince and place a request for it (Machiavelli).
Case 8. Find the book The Design History Reader by Maffei and Houze (2010). (UBorrow or ILL.)

Find an eBook:
Case 9. Find the eBook Handbook of Usability Testing: How to Plan, Design and Conduct Effective Tests (Rubin and Chisnell, 2008).
Case 10. Find the eBook The Science of Awakening Consciousness: Our Ancient Wisdom (partly available via HathiTrust).
Case 11. Find the eBook Ancient Awakening by Matthew Bryan Laube (UBorrow).

HTA descriptions are generally diagrammatic or tabular. Since diagrams are easier to assimilate and promise the identification of a larger number of sub-goals [23], the diagrammatic description method was preferred (figure 3). Each analysis started with the establishment of sub-goals, such as "Browse the Library website" and "Retrieve the Article", and followed with the identification of individual small steps that make the sub-goal possible, e.g. "Press Search" and "Click on 2, to go to page 2" (figures 3-5). Then, additional iterations were made to include: (1) cognitive steps, where
Each  analysis  started  with  the  establishment  of  sub-­‐goals,  such   as  ‘Browse  the  Library  website’  and  ‘Retrieve  the  Article’,  and  followed  with  the  identification  of   individual  small  steps  that  make  the  sub-­‐goal  possible,  e.g.  ‘Press  Search’  and  ‘Click  on  2,  to  go  to   page  2’  (Figures  3-­‐5).  Then,  additional  iterations  were  made  to  include:  (1)  cognitive  steps,  where     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  MARCH  2015   87   users  need  to  evaluate  the  screen  in  order  to  take  the  next  step  (e.g.  identifying  the  correct  URL  to   open  from  the  initial  results  set),  and  (2)  capture  cognitive  decision  points  between  multiple   options  for  users  to  choose  from.  For  instance,  items  can  be  requested  either  via  interlibrary  loan   (ILL)  or  UBorrow,  presenting  the  user  with  an  A  or  B  option,  requiring  cognitive  effort  to  make  a   choice.  Such  parallel  paths  were  color  coded  in  yellow  (Figure  2).  Both  physical  and  cognitive   steps  were  recorded  into  XMind‡,  a  free  mind  mapping  software.  They  were  color-­‐coded  black  and   gray,  respectively,  helping  visualize  the  volume  of  cognitive  decision  points  and  steps  (i.e.   cognitive  load).     Figure  3.  Full  HTA  chart  for  'Find  a  Book'  scenario  (CASE  5).  Created  in  Xmind.         Figure  4.  Zoom  in  to  steps  1  and  2  of  the  HTA  map  for  ‘Find  a  Book’  scenario  (CASE  5).  Created  in   Xmind.                                                                                                                               ‡ XMind is a free mind mapping software that allows structured presentation of step multiple coding references, the addition of images, links and extensive notes. http://www.xmind.net/   APPLYING  HIERARCHICAL  TASK  ANALYSIS  METHOD  TO  DISCOVERY  LAYER  EVALUATION  |  PROMANN  AND   ZHANG       88     Figure  5.  Zoom  in  to  step  3  of  the  HTA  map  for  the  'Find  a  Book'  scenario  (CASE  5).  Created  in   XMind.     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  MARCH  2015   89     Figure  6.  Zoom  in  to  step  4  of  the  HTA  map  of  the  'Find  a  Book'  scenario  (CASE  5).  Created  in   XMind.   To  organize  the  decision  flow  chart,  the  original  hierarchical  number  scheme  for  HTA  that   requires  every  sub-­‐goal  to  be  uniquely  numbered  with  an  integer  in  numerical  sequence  [21],  was   strictly  followed.  Visual  (screen  captures)  and  verbal  notes  on  efficient  and  inefficient  design   factors  were  taken  during  the  HTA  mapping  process  and  linked  directly  to  the  tasks  they  applied   to.  Steps,  where  interface  design  guided  the  user  to  the  next  step,  were  marked  ‘fluent’  with  a   green  tick  (figures  3  and  4).  Steps  that  were  likely  to  mislead  users  from  the  optimal  path  to  item   retrieval  and  were  a  burden  to  user’s  workflow  were  marked  with  a  red  ‘X’  (see  figures  4  and  5).   One  major  advantage  of  the  diagram  format  is  its  visual  and  structural  representation  of  sub-­‐goals   and  their  steps  in  a  spatial  manner  (See  figures  2-­‐5).  This  is  useful  for  gaining  a  quick  overview  of   the  workflow  [21].   When  exactly  to  stop  the  analysis  has  remained  undefined  for  HTA  [21].  
It is at the discretion of the analyst to evaluate if there is the need to re-describe every sub-goal down to the most basic level, or whether the failure to perform that sub-goal is, in fact, consequential to the study results. We decided to stop evaluation at the point where the user located (a shelf number or reserve pick-up number) or received the sought item via download. Furthermore, steps that were perceived as possible when impossible in actuality were transcribed into the diagrams. Article scenario case 1 offers an example: once the desired search result was identified, its green dot for "full text available" was likely to be perceived as clickable, when in actuality it was not. The user is required to click on the title or open the tab "Find Online" to access the external digital library and download the desired article (see figure 7).

Figure 7. Article scenario (CASE 1): two search results, where the green "full text available" indicator may be perceived as clickable.

Task analysis focuses on the properties of the task rather than the user. This requires expert evaluation in place of involving users in the study. As stated above, both of the authors are working experts in the field of user experience in the library context, thoroughly aware of the tasks under analysis and how they are executed on a daily basis. A group of 12 (librarians, reference service staff, system administrators, and developers) were asked to review the HTA charts on a monthly basis. Feedback and implications of identified issues were discussed as a group. According to Nielsen [7], it takes five experts (a double specialist, in Nielsen's terms, is an expert in usability as well as in the particular technology employed by the software) to not have significant loss of findings (see figure 8). Based on this enumeration, the final versions of the HTA charts offer accurate representations of the Primo workflow in the three use scenarios of finding an article, finding a book, and finding an eBook at Purdue University Libraries.

Figure 8. Average proportion of usability problems found as a function of number of evaluators in a group performing heuristic evaluation [7].

RESULTS

The reason for mapping Primo's workflows in HTA charts was to identify key workflow and usability issues of a widely used discovery layer in the scenarios and contexts it was designed to serve. The resulting HTA diagrams offered insights into fluent steps (green ticks), as well as workflow issues (red "X") present in Primo, as applied at Purdue University Libraries. Due to space limitations, only the main findings of the HTA will be discussed. The full results are published on the Purdue University Research Repository§. Table 1 presents how many parallel routes (A vs. B route), physical steps (clicks), cognitive evaluation steps, likely errors, and well-guided steps each of the use cases had; a simple sketch of how such counts can be tallied from the charts follows below.

On average it took between 20 and 30 steps to find a relevant item within Primo. Even though no ideal step count has been identified for the library context, this is quite high in the general context of the web, where fast task accomplishment is generally expected. Paul Chojecki [33] tested how too many options impact usability on a website. He revealed that the average step count leading to higher satisfaction levels is 6 (vs. the 18.16 average steps at Purdue Libraries). In our study, the majority of the steps were physical presses of a button or filter selections; however, cognitive steps took up just under half of the steps in nearly all cases. The majority of cases flow well, as the strengths (fluent, well-guided steps) of Primo outweigh its less guided steps that easily lend themselves to the chance of error.

§ Task analysis cases and results for Ex Libris Primo. https://purr.purdue.edu/publications/1738
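To illustrate how the measures in table 1 can be derived, the following sketch tallies the leaf steps of an HTA chart by kind, reusing the hypothetical Step structure and chart fragment from the earlier sketch. It is a toy illustration of the counting, not the authors' actual tooling, and its totals come from the made-up fragment rather than from the study data.

```python
def count_steps(step):
    """Tally the leaf steps of an HTA chart by kind (physical click vs. cognitive decision)."""
    counts = {"physical": 0, "cognitive": 0}

    def visit(node):
        if not node.children:          # leaves are the individual user steps
            counts[node.kind] += 1
        for child in node.children:
            visit(child)

    visit(step)
    counts["total"] = counts["physical"] + counts["cognitive"]
    return counts

# Applied to the hypothetical "find a book" fragment defined earlier:
print(count_steps(chart))              # {'physical': 3, 'cognitive': 2, 'total': 5}
```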
B  route),  physical  steps  (clicks),  cognitive  evaluation  steps,  likely  errors  and  well  guided   steps  each  of  the  use  cases  had.     On  average  it  took  between  20  to  30  steps  to  find  a  relevant  item  within  Primo.  Even  though  no   ideal  step  count  has  been  identified  for  the  library  context,  this  is  quite  high  in  the  general  context   of  the  web,  where  fast  task  accomplishment  is  generally  expected.  Paul  Chojecki  [33]  tested  how   too  many  options  impact  usability  on  a  website.  He  revealed  that  the  average  step  count  to  lead  to   higher  satisfaction  levels  is  6  (vs.  18,16  average  steps  at  Purdue  Libraries).  In  our  study,  the   majority  of  the  steps  were  physical  pressing  of  a  button  or  filter  selection;  however,  cognitive   steps  took  up  just  under  a  half  of  the  steps  in  nearly  all  cases.  The  majority  of  cases  flow  well,  as   the  strengths  (fluent  well  guided  steps)  of  Primo  outweigh  its  less  guided  steps  that  easily  lend   themselves  to  the  chance  of  error.                                                                                                                                   § Task analysis cases and results for Ex Libris Primo. https://purr.purdue.edu/publications/1738   APPLYING  HIERARCHICAL  TASK  ANALYSIS  METHOD  TO  DISCOVERY  LAYER  EVALUATION  |  PROMANN  AND   ZHANG       92   CONTENT  TYPE   ARTICLES   BOOKS   EBOOKS   CASE  NUMBER   1   2   3   4   AVG   5   6   7   8   AVG   9   10   11   AVG   No.  of  decision  points   (between  A  &  B),  to   retrieve  an  item   5   8   4   4   5   4   5   5   2   4   6   3   2   4   Minimum  steps   possible  to  retrieve  an   item  (clicks  +  cognitive   decisions)     18   27   16   30   23   18   25   28   24   24   22   19   19   20   Of  these  minimum   steps,  how  many  were   cognitive  (information   evaluation  was  needed   to  proceed)     4   8   9   13   9   6   9   7   7   7   4   6   4   5   Maximum  steps  it  can   take  to  retrieve  an   item  (clicks  +  cognitive   decisions)   26   35   23   36   30   22   31   33   28   29   32   23   22   26   Of  these,  maximum   steps,  how  many  were   cognitive   10   17   14   15   14   10   13   16   8   12   9   8   5   7   Errors  (steps  that   mislead  from  optimal   item  retrieval)   3   15   4   8   8   2   2   4   3   3   13   1   2   5   Fluent  well  guided   steps  to  item  retrieval   11   11   9   8   10   7   8   7   5   7   6   4   3   5   Table  1.  Table  listing  each  case’s  key  task  measures,  and  each  scenario’s  averages.   Between  the  three  item  search  scenarios  –  Articles,  Books  and  Ebooks  –  the  retrieval  of  articles   was  least  guided  and  required  the  highest  amount  of  decisions  from  the  user  (5,  vs.  4  for  books   and  4  for  eBooks  on  average).  Retrieving  an  article  (between  23-­‐30  steps  on  average)  or  a  book   (24-­‐29  steps  on  average)  took  more  steps  to  accomplish  than  finding  a  relevant  eBook  (20-­‐26   steps  on  average).  The  high  volume  of  steps  (max  30  steps  on  average)  it  required  to  retrieve  an   article,  as  well  as  its  high  error  rate  (8),  were  due  to  the  higher  amount  of  cognitive  steps  (12   steps  on  average)  required  to  identify  the  correct  article  and  to  locate  a  hard  copy  (instead  of  the   relatively  easily  retrievable  online  copy).  
Between the three item search scenarios (Articles, Books, and eBooks), the retrieval of articles was the least guided and required the highest number of decisions from the user (5 on average, versus 4 for books and 4 for eBooks). Retrieving an article (23-30 steps on average) or a book (24-29 steps on average) took more steps than finding a relevant eBook (20-26 steps on average). The high number of steps required to retrieve an article (a maximum of 30 on average), as well as its high error rate (8), was due to the higher number of cognitive steps (12 on average) required to identify the correct article and to locate a hard copy (instead of the relatively easily retrievable online copy). In the book scenario the challenge was two-fold: on the one hand, it was difficult to verify the right book when there were many similar results (which explains the high number of cognitive steps, 12 on average); on the other hand, the flow for placing a request for a book was also a challenge. The latter was a key contributor to the higher number of physical steps required for retrieving a book (a maximum of 29 on average).

Common to all eleven cases, whether articles, books, or eBooks, was a four-sub-goal process: 1) browse the library website, 2) find results, 3) open the page of the desired item, and 4) retrieve, locate, or order the item. The first two offered near-identical experiences, no matter the search scenario or case. The third and fourth sub-goals, however, presented different workflow issues depending on the product searched and its availability (e.g., 'in print' or 'online'). As such, general results are presented for the first two themes, while scenario-specific overviews are provided for the latter two.

Browsing the Library Website

Browsing the library website was easy and supported different user tasks. The simple URL (lib.purdue.edu) was memorable and appeared first in the search results. The immediate availability of sub-menus, such as Databases and Catalogs, offered speedy searching for frequent users. The choice between a) the general URL or b) a sub-menu was the first key decision point that users of Primo at Purdue Libraries were presented with.

The Purdue Libraries home page (revisit figure 1) had a simple design with a clear, central, and visible search box. Just above it were search filters for articles, books, and the web. This was the second key decision point users were presented with: a) they could type into the search bar without selecting any filters, or b) they could select a filter to focus their results on a specific item type. Browsing the library website offered an efficient and fluent workflow, with eBooks being the only exception. It was hard to know whether they were grouped under the Articles or the Books & Media filter. Confusingly, at the time of the study Purdue Libraries listed eBooks with no physical copies under Articles, while eBooks that Purdue also held in physical form were listed under Books & Media. This was not explained in the interface, nor was there a readily available tooltip.

Finding Relevant Results

Figure 9. Search results for article CASE 2, 'Comparison of Simple Potential Functions for Simulating Liquid Water.'

Primo presented the search results in algorithmic order of relevance, with an additional page for every 20 items in the result set. The search bar was then minimized at the top of the page, available for easy editing.
The page was divided into two key sections: the first quarter contained filters (e.g., year of publication, resource type, author, journal), and the other three quarters were left for the search results (see figure 9). The majority of cognitive decisions across scenarios were made on this results page. This was due to the need to pick up cues to identify and verify the item being sought. The value of these cognitive steps lies in leading the user to the next physical steps. As discussed in the next section, on opening the page of the desired item, several elements succeeded, and others failed, at guiding the user to the correct search result.

Search results were considered relevant when the search presented results in the general topic area of the sought item. Most cases in most scenarios led to relevant results; however, book CASE 8 and eBook CASE 11 provided only unrelated results. Generally, books and eBooks were easy to identify as available. This was due to their typically short titles, which took less effort to read. Journal articles, on the other hand, had longer titles and required more cognitive effort to verify.

Article CASE 4, book CASE 6, and eBook CASE 10 had relevant but restricted results. A color-coding system indicated the level of availability of the presented search results with green (fully available), orange (partly available), or gray (not available) dots, each followed by an explanatory availability tag, e.g., 'Available online' or 'Full text available'. Tabs represented additional cues offering further information, e.g., 'Find in Print', and appeared where applicable. For example, if an item was not available, its dot was gray and it had neither the 'Find in Print' nor the 'Find Online' tab; instead, it had a 'Request' tab, guiding the user toward an available alternative action. Restricted-availability items, such as a book in a closed repository, had an orange indicator for partial availability. For these, Primo still offered the 'Find in Print' or 'Find Online' tab, whichever was appropriate. While the overall presentation of item availability was clear and the color-coding consistent, the mechanisms were not without their errors, as discussed below.

Opening the page of the desired item

This sub-goal comprised two main kinds of steps: 1) information-driven cognitive steps, which helped the user identify the correct item, and 2) interface-guided physical steps, which resulted in opening the page of the desired item.

Frequent strengths that helped the identification of relevant items across the scenarios were the clearly identifiable labels underneath the image icons (e.g., 'book', 'article', 'conference proceeding'), hierarchically structured information about the items (title, key details, availability), and perceivably clickable links (blue, with an underlined hover effect). The labels and hierarchically presented details (e.g., year, journal, issue, volume)
helped the workflow to remain smooth, minimizing the need to use the side filters. The immediate details reduced the need to open additional pages, cutting down the steps needed to accomplish the task. The hover effect on item titles made the link look and feel clickable, guiding the user closer to retrieving the item. Color-coding all clickable links in the same blue was also an effective design feature, even though the bolded availability labels were equally prominent and clickable. This was especially true for articles, where the 'full text available' tags corresponded to users' goal of immediately downloading the sought item (figure 9).

The most frequent causes of errors were duplicated search results. Generally, Primo grouped multiple versions of the same item into one search result and offered a 'See all results' link. In line with Graham Stone's [17] study, which highlighted the problem of cataloging inconsistencies, Primo struggled to consistently group all overlapping search result items. Both the book and the article scenarios suffered from at least one duplicate search result caused by inconsistent details. Article scenario CASE 2 offers an example: Jorgensen et al.'s "Comparison of Simple Potential Functions for Simulating Liquid Water" (1983) had two separate results for the same journal article of the same year (the first two results in figure 9). Problematically, the two results offered different details for the journal issue and page numbers, which is likely to cause referencing problems for Primo users.

Duplicated search results were also an issue for the book scenarios. The most frequent causes were instances where authors' first and last names were presented in reverse order (see also figure 9 for article CASE 2), the books had different print editions, or the editor's name was used in place of the author's. Book scenario CASE 7, Machiavelli's "The Prince", produced extremely varied results, requiring up to 33 steps (16 of them cognitive) before the desired item could be verified. This is where search filters were most handy. Problematically, in CASE 7 Machiavelli, the author, did not even appear in the author filter list, while Ebrary Inc. was listed. Again, this points to inconsistent metadata and the effects it can have on usability, as discussed by Stone [17].
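To illustrate the general mechanism (a deliberately naive sketch, not Primo's actual matching logic; the helper and field names are invented, and the second record's page number is deliberately altered, while the citation itself is the Jorgensen et al. article from CASE 2), a grouping routine that builds a match key from normalized citation fields keeps two records for the same article apart whenever any keyed field disagrees:

def match_key(record):
    """Build a naive de-duplication key from citation metadata."""
    # Normalize the first author to "last, first" in lowercase.
    author = record["author"].lower().replace(".", "").strip()
    if "," not in author:  # "William L. Jorgensen" -> "jorgensen, william l"
        parts = author.split()
        author = f"{parts[-1]}, {' '.join(parts[:-1])}"
    title = "".join(ch for ch in record["title"].lower() if ch.isalnum())
    return (author, title, record["year"], record["start_page"])

a = {
    "author": "Jorgensen, William L.",
    "title": "Comparison of Simple Potential Functions for Simulating Liquid Water",
    "year": 1983,
    "start_page": 926,
}
b = {
    "author": "William L. Jorgensen",
    "title": "Comparison of Simple Potential Functions for Simulating Liquid Water",
    "year": 1983,
    "start_page": 79,  # inconsistent page data in the second record
}

print(match_key(a) == match_key(b))  # False: the two records are not grouped

Name-order differences can be normalized away, as above, but conflicting issue or page data cannot, which is why the cataloging inconsistencies highlighted by Stone [17] translate directly into duplicate results.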
Other workflow issues were presented by design details such as the additional information boxes underneath the item information, e.g., 'Find in Print', 'Details', and 'Find Online'. They opened a small scrollable box that maintained the overall page view but was difficult to scroll: the arrow kept slipping outside of the box, scrolling the entire site's page instead of the content inside the box. In addition, the information boxes did not work well with Chrome. This was especially problematic on a MacBook, where after a couple of searches the boxes failed to list the details and left the user with an unaccomplished task. By comparison, Safari on a Mac and Internet Explorer on a PC never had such issues.

Retrieving the items (call number or downloading the PDF)

The last sub-goal was to retrieve the item of interest. This often involved multiple decision points: whether to retrieve the PDF online, identify a call number for the physical copy, or place a request by ordering the item via Interlibrary Loan (ILL) or UBorrow. Each option is briefly discussed below.

EBooks and articles that were available online could be retrieved efficiently. If an article was identified for retrieval, there were two options for reaching the link to the database (e.g., 'View this record in ACM'): a) via the full view of the item, or b) via the small 'Find Online' preview box discussed above. Where more than one database was available, information about the publication range the library holds helped identify the right link for downloading the PDF on the link-resolver page. One of the key benefits of having links from within Primo to the full texts was that they opened in new browser windows or tabs, without interfering with other ongoing searches. While a few of the PDF links to downloadable texts were difficult to find through some external database sites, once found they all opened in Adobe Reader with easy options to either 'Save' or 'Print' the material.

EBooks were available via the Ebrary or EBL libraries. While the latter offers some novel features, such as audio (read-aloud), neither of the two platforms was easy to use. While reading online was possible, downloading an eBook was challenging. The platforms seemed to offer good options: a) download by chapter, b) download by page numbers, or c) download the full book for 14 days. In practice, however, none of these options was usable. EBook CASE 9 had chapters longer than the 60-page-per-day limit. Page numbers proved difficult to use, as the book's page numbers did not match the PDF's page numbers, which made it hard to keep track of what had been downloaded and where one had left off to continue later (because of the imposed time limits). The 14-day full-access option was only available in Adobe Digital Editions (an e-book reader from Adobe Systems built with Adobe Flash), which was not available on most campus computers or on personal laptops.

The least demanding and most fluent of all retrieval options was the process of identifying the location and call number of physical copies. Inconsistent metadata, however, posed some challenges. Book CASE 5 offered a merged search result of two books but listed them with different call numbers in the 'Find in Print' tab. Libraries often hold many physical copies of the same book, and checking whether the call numbers are consistent is a cognitive step that helps verify whether two results refer to the same item.
The different call numbers raised doubts about which item to choose, slowing the workflow and increasing the number of cognitive steps required to accomplish the task.

Compared to books, finding an article in print format was far from straightforward. The main cause of error when looking up hard copies of journals was the fact that individual journal issues did not have individual call numbers at Purdue Libraries. Instead, there was one call number per periodical, so the entire journal series shared a single call number. Article CASE 2, for example, offered the call number 530.5 J821 in the 'Find in Print' tab. In general, the tab suffered from too much information, poor layout, and an unhelpful information hierarchy, all of which slowed the cognitive task of verifying whether an item was relevant. It listed 'Location' and 'Holdings range' as the first pieces of information, yet 'Holdings range' included not just hard-copy information but digital items as well, even though this tab was for the physical version of the item. To illustrate, article CASE 2 claimed holdings for 1900-2013, whereas hard copies were only available for 1900-2000 and digital copies for 2001-2013.

Each scenario had one or two cases where neither physical nor digital options were available. This sub-goal commonly involved a decision between three options: a) placing a request, b) ordering the item via Interlibrary Loan (ILL), or c) ordering the item via UBorrow. While the 'Signing in to request' option and ILL were easy to use, with few required steps, there was a lack of guidance on how to choose between the three options. Frequently, ILL and UBorrow appeared as equal options adjacent to one another, leaving the next step unguided. Of all three, placing a request via UBorrow was the hardest to accomplish. It often failed to present any relevant results on the first results page of the UBorrow system, requiring the use of the advanced search and filters. For instance, book CASE 6 was 'not requestable' via UBorrow. When UBorrow did list the sought-for item in its search results, it looped back to Purdue's own closed repository (which remained unavailable).

DISCUSSION

The goal of this study was to use HTA to examine the workflow of the Primo discovery layer at Purdue University Libraries. Nielsen's [6] Goal Composition heuristics were used to extend the task-based analysis and understand the tasks in the context of discovery layers in libraries. Three key usability domains (generalization, integration, and user control mechanisms) were used as an analytical framework to draw usability conclusions about how well, if at all, Primo supported successful completion of the three scenarios. The next three sub-sections evaluate these three usability domains and offer design solutions.
Overall, this study confirmed Primo's ability to reduce the workload required for users to find their materials. Primo is flexible and intuitive, permitting efficient search and successful retrieval of library materials while offering the possibility of many search sessions at once [14]. A comparison with usability test results is offered by way of conclusion.

Generalization Mechanisms

Primo can be considered a flexible discovery layer, as it helps users achieve many goals with a minimum number of steps. It makes use of several generalization mechanisms that allow the work done on one task to serve many goals at once. For instance, the library website's result in Google offers not only the main URL but also seven sub-links to specialist library site locations, such as opening hours and databases. This makes Primo accessible and relevant to a broader array of people who are likely to have different goals. For instance, some may seek to enter a specific database directly, instead of having to open Primo's landing page and enter search terms; another may wish to use 'Find', which guides the user, one step at a time, through a process of elimination toward the item they are looking for; and yet another may simply want to know the opening hours.

Similarly, the Primo search function saves already-typed information, both on its landing page and on its results page. This facilitates search by requiring query entry only once, while allowing end users to click on different filters to narrow the results in different ways. Because part of the work done toward one search can be reused for another (e.g., by content, journal, or topic type), the system can ease the effort required of users. This is further supported by the system saving already-typed keywords when the user returns to the main search page from the search results, allowing a fluid search experience in which the user adjusts a keyword thread with minimal typing until they find what they are looking for.

A key problem for Primo is its inability to manage inconsistent metadata. The tendency to group different versions of the same item into one search result is helpful, as it reduces information noise. In an effort to speed up the evaluation of the relevancy of search results, the system seeks to highlight any differences in the metadata. If inconsistencies in metadata cause the same item to appear as separate results, this adds cognitive steps and therefore increases the workload and reduces the efficiency with which the user can identify the item.

It is clear from previous studies that if discovery layers are to become the next generation of catalogs [11], and are to enhance the speed of knowledge distribution as has been hoped by Tosaka and Weng [15] and Luther and Kelly [16], then mutual agreement is needed on how metadata from disparate sources is handled [17].
Understanding  that  users’  cognitive  workload  should  be   minimized  (by  offering  fewer  options  and  more  directive  guidance)  for  more  efficient  decision-­‐ making,  library  items  should  have  accurate  details  in  their  meta-­‐data,  e.g.  consistent  and  thorough   volume,  issue  and  page  numbers  for  journal  articles,  correct  print  and  reprint  years  for  books,  and   item  type  (conference  proceeding  vs.  journal  article).   Integration  Mechanisms   The  discovery  layer’s  ability  to  increase  the  number  of  search  sessions  [14]  at  any  one  time  is   possible  due  to  its  flexibility  to  support  multitasking.  Primo  achieves  this  with  its  own  individual   features  used  in  combination  with  other  system  facilities  and  external  sources.  For  instance,   Primo’s  design  allows  users  to  review  and  compare  several  search  results  at  once  via  the  ‘Find  in   Print’  or  ‘Details’  tabs.  Although  not  perfect,  since  the  small  boxes  are  hard  to  scroll  within,  the   information  can  save  the  user  the  need  and  additional  steps  of  opening  many  new  windows  and   having  to  click  between  them  just  for  reviewing  search  results.  Instead,  many  ‘detail’  boxes  of   similar  results  may  be  opened  and  viewed  at  once,  allowing  for  effective  visual  comparison.  This   integration  mechanism  allows  a  fluent  transition  from  skimming  the  search  results  to  another   temporary  action  of  gaining  insight  about  the  relevance  of  an  item.  Most  importantly,  this  is   accomplished  without  requiring  the  user  to  open  a  new  browser  page  or  tab,  where  they  would   have  to  break  from  their  overall  search  flow  and  remember  the  details  (instead  of  visually   comparing  them),  making  it  hard  to  resume  from  where  they  left  off.     A  contrary  integration  mechanic  that  Primo  makes  use  of  is  its  smooth  automated  connectivity  to   external  sites,  such  as  databases,  Ebrary,  ILL,  etc.  New  browser  pages  are  used  to  allow  the   continuation  of  a  task  outside  of  Primo  itself  without  forcing  the  user  out  of  the  system  to  the     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  MARCH  2015   99   library  service  or  full  text.  Primo  users  can  skim  search  results,  identify  relevant  resources  and   open  them  in  new  browser  pages  for  later  reviewing.     What  is  missing,  however,  is  the  opportunity  to  easily  save  and  resume  a  search.  Retrieving  the   search  result  or  saving  it  under  ones’  login  details  would  benefit  users  who  recall  items  of  interest   from  previous  searches  and  would  like  to  repeat  the  results  without  having  to  remember  the   keywords  or  search  process  they  used.  It  is  not  obvious  how  to  locate  the  save  search  session   option  in  Primo’s  interface.   User  Control  Mechanisms   Yang  and  Wagner  [11]  ranked  Primo  highest  among  the  vendors,  primarily  for  its  good  user   control  mechanisms,  which  allow  users  to  inspect  and  change  the  search  functions  on  an  ongoing   basis.  Primo  does  a  good  job  at  presenting  search  results  in  a  quick  and  organized  manner.  It   allows  for  the  needed  ‘undo’  functionality  and  continued  attachment  and  removal  of  filters,  while   saving  the  last  keywords  when  clicking  the  back  button  from  search  results.  
The  continuously   available  small  search  box  also  offers  the  flexibility  for  the  user  to  change  search  parameters   easily.  In  summary,  Primo  offers  agile  searching,  while  accounting  for  a  few  different  discovery   mental  models.     However,  if  Primo  wants  to  preserve  its  current  effectiveness  and  make  the  jump  towards  a  single   search  function  that  is  truly  flexible  and  allows  for  much  needed  customizability  [18][2],  it  needs   to  allow  for  several  similar  user  goals  to  be  easily  executable  without  confusion  about  the  likely   outcome.  The  most  prominent  current  system  error  for  Primo,  as  it  has  been  applied  in  the  Purdue   Libraries,  is  its  inability  to  differentiate  eBooks  from  journal  articles  or  books.  It  would  support   users  goals  to  be  able  to  start  and  finish  an  eBook  related  tasks  at  the  home  page’s  search  box.   Currently,  users  have  the  cognitive  burden  to  consider  whether  eBooks  are  more  likely  to  be   found  under  ‘Books  &  Media’  or  ‘Journals’.  Currently,  Primo,  as  applied  to  its  implementation  at   Purdue  Libraries  at  the  time  of  this  study,  does  not  support  goals  to  search  for  content  type,  e.g.  an   eBook.  This  however,  is  increasingly  popular  among  the  student  population  who  want  eBooks  on   their  tablets  and  phones  instead  of  carrying  heavy  books  in  their  backpacks.     Another  key  pain-­‐point  for  current  users  is  the  identification  of  specific  journals  in  physical  form,   say  for  archival  research.  Currently,  each  journal  issue  is  listed  individually  in  the  ‘find  in  print’   section,  even  though  the  journals  only  have  one  call  number.  Listing  all  volumes  and  issues  of  each   periodical  overwhelms  the  user  with  too  much  information  and  prevents  the  effective   accomplishment  of  the  task  of  locating  a  specific  journal  issue.  Since  there  is  only  one  call  number   available  for  the  entire  journal  sequence,  it  may  lead  to  better  clarity  and  usability  if  the   information  was  reduced.  Instead  of  listing  all  possible  journal  issues,  a  range  or  ranges  (if   incomplete  set  of  issues)  that  the  library  has  physically  present  should  be  listed.  In  Article  CASE  2,   for  instance,  there  are  five  items  for  the  year  1983.  Why  lead  the  user  to  look  at  a  range  where   there  is  no  possible  option?     APPLYING  HIERARCHICAL  TASK  ANALYSIS  METHOD  TO  DISCOVERY  LAYER  EVALUATION  |  PROMANN  AND   ZHANG       100   Comparing  HTA  to  a  Usability  Test   Usability  tests  benefit  from  the  invaluable  direct  input  from  the  end  user.  At  the  same  time   usability  studies,  as  constructed  conditions,  offer  limited  opportunities  to  learn  about  users’  real   motivations  and  goals  and  how  the  discovery  layers  support  or  fail  to  support  their  tasks.  Fagan   et  al  [3]  conducted  a  usability  test  with  eight  students  and  two  faculty  members  to  learn  about   usability  issues  and  user  satisfaction  with  discovery  layers.  They  measured  time,  accuracy  and   completion  rate  for  nine  specific  tasks,  and  obtained  insights  from  task  observations  and  post-­‐test   surveys.  
They reported issues with users not following directions ([3], p. 93), the prevalence of timeouts, users skipping tasks, and variable task times. These results all point to a mismatch between the users' goals and the study tasks, and they offer an incomplete picture of the system's ability to support user goals that are accomplished via specific tasks.

The expert-evaluation-based HTA method does not require users' direct input. HTA offers a way to achieve a relatively complete evaluation of how low-level interface facets support users' high-level cognitive tasks. HTA measures the quality of the system design in supporting a specific task needed to accomplish a user goal. Instead of measuring time, it counts physical and cognitive tasks in numbers of steps; instead of accuracy and completion rate, it counts fluent workflow steps and mistaken steps. The two methods offer opposite strengths, making them good complements. Given HTA's system-centric approach, it can better inform which tasks would be useful in usability testing.

Comparing our research findings with usability test results, Fagan et al. [3] confirmed some previously established findings: that journal titles are difficult to locate via the library home page (versus databases), that filters are handy when they are needed, and that users' mental models favor a Google-like single search box. For instance, students, and even librarians, struggle to understand what is being searched in each system and how results are ranked (see also [5]). The HTA method applied in this study was also able to confirm that journal titles are more difficult to identify than books and eBooks, that filters offer a flexibility benefit, and that the single search box is a fluent element of the system design. Since HTA does not rely on the user to explain why these results occur, the method, as applied in this study, helped the expert evaluators understand the reasons for these findings through self-directed execution and later discussion with colleagues. Depending on the task design, either usability testing or HTA can identify cases such as confusion about how to start an eBook search in Primo. Taking a system design approach to task design offers a path to a systematic understanding of discovery layer usability, which lends itself to easier comparison and external validity.

In terms of specific interface features, usability tests are good for evaluating the visibility of specific features. For example, Fagan et al. [3] asked their participants to (1) search on speech pathology, (2) find a way to limit the search results to audiology, and then (3) limit their search results to peer-reviewed items (task 3 in [3], p. 95).
By measuring completion rate, they were able to identify the relative failure of the 'peer-reviewed' filter compared with the 'audiology' filter, but they were left "unclear [about] why the remaining participants did not attempt to alter the search results to 'peer reviewed,'" failing to accomplish the task [3]. In comparison, HTA, as an analytical rather than an observational methodology, leads to more synthesized results. In addition to insights into possible gaps between the system design and users' mental models, HTA, as a goal-oriented approach, concerns itself with issues of workflow (how well the system guides the user toward accomplishing the task) and efficiency (minimizing the number of steps required to finish a task). These are less obvious to identify with usability tests, where participants are not influenced by their routine goals and time pressures and may consequently be more patient.

The application of HTA helped identify key workflow issues and map them to specific design elements. For instance, the lack of an eBooks search filter meant that the system supported content-form-based searching for only two main forms, articles and books. Compared to usability tests, which focus on specific fabricated search processes, HTA aims to map all possible routes the system's design offers for accomplishing a goal, allowing for their parallel existence during the analysis. This system-centered approach to task evaluation, we argue, is the key benefit HTA can offer toward a more systematic evaluation of discovery layers, where different user groups have varying levels of need for assistance. HTA allows for the nuanced understanding that results can differ as the context of use differs; that applies even to the contextual difference between user-test participants and routine library users.

CONCLUSION

Discovery layers are advancing the search experiences libraries can offer. With increased efficiency, greater ease of use, and more relevant results, scholarly search has become a far less frustrating experience. While Google is still perceived as the holy grail of discovery experiences, in reality it may not be quite what scholarly users are after [5]. The application of discovery layers has focused on eliminating the limitations that plagued traditional federated search and on improving search index coverage and performance. Usability studies have been effective in verifying these benefits and key interface issues. Moving forward, studies on discovery layers should focus more on the impact of discovery layers on the user experience.

This study presents the expert-evaluation-based HTA method as a complementary way to systematically evaluate popular discovery layers. It is this system-design- and goal-oriented evaluation approach that offers the prospect of a more thorough body of research on discovery layers than usability testing alone.
Using  HTA  as  a  systematic  preliminary  study  guiding  formal  usability   testing  offers  one  way  to  achieve  more  comparable  study  results  on  applications  of  discovery   layers.  It  is  through  comparisons  that  the  discussion  of  discovery  and  user  experience  can  gain  a   more  focused  research  attention.  As  such,  HTA  can  help  vendors  to  achieve  the  full  potential  of   web-­‐scale  discovery  services.     To  better  understand  and  ultimately  design  to  their  full  potential,  systematic  studies  are  needed   on  discovery  layers.  This  study  is  the  first  attempt  to  apply  HTA  towards  systematically  analyzing   user  workflow  and  interaction  issues  on  discovery  layers.  The  authors  hope  to  see  more  work  in     APPLYING  HIERARCHICAL  TASK  ANALYSIS  METHOD  TO  DISCOVERY  LAYER  EVALUATION  |  PROMANN  AND   ZHANG       102   this  area,  with  the  hope  of  achieving  true  next  generation  catalogs  that  can  enhance  knowledge   distribution.         REFERENCES   [1]   Beth  Thomsett-­‐Scott  and  Patricia  E.  Reese,  “Academic  Libraries  and  Discovery  Tools:  A   Survey  of  the  Literature,”  College  &  Undergraduate  Libraries  19,  no.  2–4  (April  2012):  123– 143.  http://dx.doi.org/10.1080/10691316.2012.697009.     [2]     Sarah  C.  Williams  and  Anita  K.  Foster,  “Promise  Fulfilled?  An  EBSCO  Discovery  Service   Usability  Study,”  Journal  of  Web  Librarianship  5,  no.  3  (Jul.  2011):  179–198.   http://dx.doi.org/10.1080/19322909.2011.597590.     [3]     Jody  Condit  Fagan,  Meris  A.  Mandernach,  Carl  S.  Nelson,  Jonathan  R.  Paulo,  and  Grover   Saunders,  “Usability  Test  Results  for  a  Discovery  Tool  in  an  Academic  Library,”  Information   Technology  and  Libraries  31,  no.  1  (Mar.  2012):  83–112,  Mar.  2012.   http://dx.doi.org/10.6017/ital.v31i1.1855.   [4]     Roger  C.  Schonfeld  and  Matthew  P.  Long,  “Ithaka  S+R  US  Library  Survey  2013,”  Ithaka  S+R,     survey  2,  Mar.  2014.  http://sr.ithaka.org/research-­‐publications/ithaka-­‐sr-­‐us-­‐library-­‐survey-­‐ 2013.   [5]     Michael  Khoo  and  Catherin  Hall,  “What  Would  ‘Google’  Do?  Users’  Mental  Models  of  a  Digital   Library  Search  Engine,”  in  Theory  and  Practice  of  Digital  Libraries,  ed.  Panayiotis  Zaphiris,   George  Buchanan,  Edie  Rasmussen,  and  Fernando  Loizides,  1-­‐12  (Berlin  Heidelberg,   Springer:  2012).  http://dx.doi.org/10.1007/978-­‐3-­‐642-­‐33290-­‐6_1.   [6]     Jakob  Nielsen,  “Goal  Composition:  Extending  Task  Analysis  to  Predict  Things  People  May   Want  to  Do,”  Goal  Composition:  Extending  Task  Analysis  to  Predict  Things  People  May  Want  to   Do,  01-­‐Jan-­‐1994.  http://www.nngroup.com/articles/goal-­‐composition/.   [7]     Jakob  Nielsen,  “Finding  Usability  Problems  Through  Heuristic  Evaluation,”  in  Proceedings  of   the  SIGCHI  Conference  on  Human  Factors  in  Computing  Systems,  373-­‐380  (New  York,  NY,   ACM:  1992).  http://dx.doi.org/10.1145/142750.142834.   [8]     Jerry  V.  Caswell  and  John  D.  Wynstra,  “Improving  the  search  experience:  federated  search   and  the  library  gateway,”  Library  Hi  Tech  28,  no.  3  (Sep.  2010):  391–401.   http://dx.doi.org/10.1108/07378831011076648.   [9]     Emily  R.  Alling  and  Rachael  Naismith,  “Protocol  Analysis  of  a  Federated  Search  Tool:   Designing  for  Users,”  Internet  Reference  Services  Quarterly  12,  no.  
1/2,  (2007):  195–210.   http://dx.doi.org/10.1300/J136v12n01_10.   [10]   Susan  Johns-­‐Smith,  “Evaluation  and  Implementation  of  a  Discovery  Tool,”  Kansas  Library   Association  College  and  University  Libraries  Section  Proceedings  2,  no.  1  (Jan.  2012):  17–23.     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  MARCH  2015   103   [11]   Sharon  Q.  Yang  and  Kurt  Wagner,  “Evaluating  and  comparing  discovery  tools:  how  close  are   we  towards  next  generation  catalog?,”  Library  Hi  Tech  28,  no.  4  (Nov.  2010):  690–709.   http://dx.doi.org/10.1108/07378831011096312.   [12]   Lyle  Ford,  “Better  than  Google  Scholar?,”  presentation,  Advance  Program  for  Internet   Librarian  2010,  Monterey,  California,  25-­‐Oct-­‐2010.   [13]   Michael  Gorrell,  “The  21st  Century  Searcher:  How  the  Growth  of  Search  Engines  Affected  the   Redesign  of  EBSCOhost,”  Against  the  Grain  20,  no.  3  (2008):  22,  24.   [14]   Sian  Harris,  “Discovery  services  sift  through  expert  resources,”  Research  Information,  no.  53,  ,   (Apr.  2011):  18–20.   http://www.researchinformation.info/features/feature.php?feature_id=315.   [15]   Yuji  Tosaka  and  Cathy  Weng,  “Reexamining  Content-­‐Enriched  Access:  Its  Effect  on  Usage  and   Discovery,”  College  &  Research  Libraries  72,  no.  5  (Sep.  2011):  pp.  412–427.   http://dx.doi.org/10.5860/.   [16]   Judy  Luther  and  Maureen  C.  Kelly,  “The  Next  Generation  of  Discovery,”  Library  Journal  136,   no.  5  (March  15,  2011):  66-­‐71.     [17]   Graham  Stone,  “Searching  Life,  the  Universe  and  Everything?  The  Implementation  of   Summon  at  the  University  of  Huddersfield,”  LIBER  Quarterly  20,  no.  1  (2010):  25–51.   http://liber.library.uu.nl/index.php/lq/article/view/7974.   [18]   Jeff  Wisniewski,  “Web  Scale  Discovery:  The  Future’s  So  Bright,  I  Gotta  Wear  Shades,”  Online   34,  no.  4  (Aug.  2010):  55–57.   [19]   Gaurav  Bhatnagar,  Scott  Dennis,  Gabriel  Duque,  Sara  Henry,  Mark  MacEachern,  Stephanie   Teasley,  and  Ken  Varnum,  “University  of  Michigan  Library  Article  Discovery  Working  Group   Final  Report,”  University  of  Michigan  Library,  Jan.  2010,   http://www.lib.umich.edu/files/adwg/final-­‐report.pdf     [20]   Abe  Crystal  and  Beth  Ellington,  “Task  analysis  and  human-­‐computer  interaction:  approaches,   techniques,  and  levels  of  analysis”  in  AMCIS  2004  Proeedings,  Paper  391,   http://aisel.aisnet.org/amcis2004/391.    [21]  Neville  A.  Stanton,  “Hierarchical  task  analysis:  Developments,  applications,  and  extensions,”   Applied  Ergonomics  37,  no.  1  (2006):  55–79.   [22]   John  Annett  and  Neville  A.  Stanton,  eds.  Task  Analysis,  1  edition.  London ;  New  York:  CRC   Press,  2000.   [23]   Sarah  K.  Felipe,  Anne  E.  Adams,  Wendy  A.  Rogers,  and  Arthur  D.  Fisk,  “Training  Novices  on   Hierarchical  Task  Analysis,”  Proceedings  of  the  Human  Factors  and  Ergonomics  Society   Annual  Meeting  54,  no.  23,  (Sep.  2010):  2005–2009,   http://dx.doi.org/10.1177/154193121005402321.     APPLYING  HIERARCHICAL  TASK  ANALYSIS  METHOD  TO  DISCOVERY  LAYER  EVALUATION  |  PROMANN  AND   ZHANG       104   [24]   D.  Embrey,  “Task  analysis  techniques,”  Human  Reliability  Associates  Ltd,  vol.  1,  2000.   [25]   J.  Reason,  “Combating  omission  errors  through  task  analysis  and  good  reminders,”  Quality  &   Safety  Health  Care  11,  no.  1  (Mar.  
2002):  40–44,  http://dx.doi.org/10.1136/qhc.11.1.40.   [26]   James  Hollan,  Edwin  Hutchins,  and  David  Kirsh,  “Distributed  Cognition:  Toward  a  New   Foundation  for  Human-­‐computer  Interaction  Research,”  ACM  Trans.  Comput.-­‐Hum.  Interact   7,  no.  2  (Jun.  2000):  174–196,    http://dx.doi.org/10.1145/353485.353487.   [27]   Stuart  K.  Card,  Allen  Newell,  and  Thomas  P.  Moran,  The  Psychology  of  Human-­‐Computer   Interaction.  Hillsdale,  NJ,  USA:  L.  Erlbaum  Associates  Inc.,  1983.   [28]   Stephen  J.  Payne  and  T.  R.  G.  Green,  “The  structure  of  command  languages:  an  experiment  on   task-­‐action  grammar,”  International  Journal  of  Man-­‐Machine  Studies  30,  no.  2  (Feb.  1989):   213–234.   [29]   Bonnie  E.  John  and  David  E.  Kieras,  “Using  GOMS  for  User  Interface  Design  and  Evaluation:   Which  Technique?,”  ACM  Transactions  on  Computer-­‐Human  Interactions  3,  no.  4  (Dec.  1996):   287–319,  http://dx.doi.org/10.1145/235833.236050.   [30]   David  E.  Kieras  and  David  E.  Meyer,  “An  Overview  of  the  EPIC  Architecture  for  Cognition  and   Performance  with  Application  to  Human-­‐computer  Interaction,”  Human-­‐Computer   Interaction  12,  no.  4  (Dec.  1997):  391–438,   http://dx.doi.org/10.1207/s15327051hci1204_4.   [31]   Laura  G.  Militello  and  Robert  J.  Hutton,  “Applied  cognitive  task  analysis  (ACTA):  a   practitioner’s  toolkit  for  understanding  cognitive  task  demands,”  Ergonomics  41,  no.  11   (Nov.  1998):    1618–1641,  http://dx.doi.org/10.1080/001401398186108.   [32]   Brenda  Battleson,  Austin  Booth,  and  Jane  Weintrop,  “Usability  testing  of  an  academic  library   Web  site:  a  case  study,”  The  Journal  of  Academic  Librarianship  27,  no.  3  (May  2001):  188– 198.   [33]   Paul  Chojecki,  “How  to  increase  website  usability  with  link  annotations,”  in  20th   International  Symposium  on  Human  Factors  in  Telecommunication.  6th  European  Colloquium   for  User-­‐Friendly  Product  Information.  Proceedings,  2006,  p.  8.             Case  Study  References:     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  MARCH  2015   105     Find  an  Article:   Case  1.  Yang,  T.  C.,  and  Kevin  D.  Heaney.  "Network-­‐Assisted  Underwater  Acoustic   Communications."  In  Proceedings  of  the  Seventh  ACM  International  Conference  on  Underwater   Networks  and  Systems,  p.  37.  ACM,  2012.   Case  2.  Jorgensen,  William  L.,  Jayaraman  Chandrasekhar,  Jeffry  D.  Madura,  Roger  W.  Impey,  and   Michael  L.  Klein.  "Comparison  of  Simple  Potential  Functions  for  Simulating  Liquid  Water."  The   Journal  of  Chemical  Physics  79  (1983):  926.   Case  3.  “Design  Annual.”  Graphis  Inc.,  2008   Case  4.  Walb,  M.  C.,  J.  E.  Moore,  A.  Attia,  K.  T.  Wheeler,  M.  S.  Miller,  and  M.  T.  Munley.  "A   Technique  for  Murine  Irradiation  in  a  Controlled  Gas  Environment."  Biomedical  Sciences   Instrumentation  48  (2012):  470.   Find  a  Book  (physical):   Case  5.  Few,  Stephen.  Show  Me  the  Numbers:  Designing  Tables  and  Graphs  to  Enlighten.  Vol.  1,   no.  1.  Oakland,  CA:  Analytics  Press,  2004.   Case  6.  Metcalf,  Christine.  The  Love  of  Cats.  Crescent  Books,  1973.   Case  7.  Machiavelli,  Niccolò,  and  Leo  Paul  S.  De  Alvarez.  1989.  The  Prince.  Prospect  Heights,  Ill:   Waveland  Press.   Case  8.  Lees-­‐Maffei,  Grace,  and  Rebecca  Houze,  eds.  The  Design  History  Reader. 
Berg, 2010.

Find an eBook:

Case 9. Rubin, Jeffrey, and Dana Chisnell. Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests. Wiley Technical Communication Library, 2008.

Case 10. Rubin, Jeffrey, and Dana Chisnell. Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests. Wiley Technical Communication Library, 2008.

Case 11. Laube, Matthew Bryan. Ancient Awakening. 2010.

Measuring Journal Linking Success from a Discovery Service

Kenyon Stuart, Ken Varnum, and Judith Ahronheim

ABSTRACT

Online linking to full text via third-party link-resolution services, such as Serials Solutions 360 Link or Ex Libris' SFX, has become a popular method of access for users in academic libraries. This article describes several attempts made over the course of the past three years at the University of Michigan to gather data on linkage failure: the method used, the limiting factors, the changes made in methods, an analysis of the data collected, and a report of steps taken locally because of the studies. It is hoped that the experiences at one institution may be applicable more broadly and, perhaps, produce a stronger data-driven effort at improving linking services.

INTRODUCTION

Online linking via vended services has become a popular method of access to full text for users in academic libraries. But not all user transactions result in access to the desired full text. Maintaining the information that allows the user to reach full text is a shared responsibility among assorted vendors, publishers, aggregators, local catalogers, and electronic access specialists. The collection of information used in getting to full text can be thought of as a supply chain. To maintain this chain, libraries need to enhance the basic information about the contents of each vendor package (a collection of journals bundled for sale to libraries) with added details about local licenses and holdings. These added details need to be maintained over time. Since links, platforms, contracts, and subscriptions change frequently, this can be a time-consuming process. When links are constructed unsuccessfully within each system, considerable troubleshooting of a very complex process is required to determine where the problem lies. Because so much of the transaction is invisible to the user, linking services have come to be taken for granted by the community, and performance expectations are very high. Failure to reach full text reflects poorly on the institutions that offer the links, so there is considerable interest for, and value to, the institution in improving performance.

Kenyon Stuart (kstuart@umich.edu) is Senior Information Resources Specialist, Ken Varnum (kvarnum@umich.edu) is Web Systems Manager, and Judith Ahronheim (jaheim@umich.edu) is Head, Electronic Resource Access Unit, University of Michigan Library, Ann Arbor, Michigan.
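The introduction above describes full-text linking as a metadata supply chain. As a concrete illustration of the kind of link that chain produces (a hypothetical sketch for orientation only; the resolver hostname, citation values, and field selection are invented and not taken from this study), an OpenURL 1.0 (ANSI/NISO Z39.88-2004) request simply encodes citation metadata as key/value pairs for the link resolver to match against its knowledge base:

from urllib.parse import urlencode

# Hypothetical resolver base URL; a real installation would use the
# library's own 360 Link or SFX endpoint.
BASE = "https://resolver.example.edu/openurl"

# Citation metadata expressed as OpenURL 1.0 key/value pairs.
citation = {
    "url_ver": "Z39.88-2004",
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    "rft.genre": "article",
    "rft.atitle": "An Example Article Title",
    "rft.jtitle": "Journal of Examples",
    "rft.date": "2011",
    "rft.volume": "12",
    "rft.issue": "3",
    "rft.spage": "45",  # the starting page is one of the elements critical to article-level linking
    "rft.issn": "0000-0000",
}

print(BASE + "?" + urlencode(citation))

The resolver's job is to match these fields against its knowledge base of licensed holdings and construct a target link, and it is in that matching and construction that the failures examined below occur.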
MEASURING  JOURNAL  LINKING  SUCCESS  FROM  A  DISCOVERY  SERVICE  |  STUART,  VARNUM,  AND   AHRONHEIM   53   Improving  the  success  rate  for  users  can  best  be  achieved  by  acquiring  a  solid  understanding  of   the  nature  and  frequency  of  problems  that  inhibit  full-­‐text  retrieval.  While  anecdotal  data  and   handling  of  individual  complaints  can  provide  incremental  improvement,  larger  improvement   resulting  from  systematic  changes  requires  more  substantial  data,  data  that  characterizes  the   extent  of  linking  failure  and  the  categories  of  situations  that  inhibit  it.   LITERATURE  REVIEW   OpenURL  link  resolvers  are  “tool[s]  that  helps  library  users  connect  to  their  institutions’   electronic  resources.  The  data  that  drives  such  a  tool  is  stored  in  a  knowledge  base.”1  Since  the   codification  of  the  OpenURL  as  an  ANSI/NISO  standard  in  2004,2  OpenURL  has  become,  in  a  sense,   the  glue  that  holds  the  infrastructure  of  traditional  library  research  together,  connecting  citations   and  full  text.  It  is  well  recognized  that  link  resolution  is  an  imperfect  science.  Understanding  what   and  how  OpenURLs  fail  is  a  time-­‐consuming  and  labor-­‐intensive  process,  typically  conducted   through  analysis  of  log  files  recording  attempts  by  users  to  access  a  full-­‐text  item  via  OpenURL.   Research  has  been  conducted  from  the  perspective  of  OpenURL  providers,  showing  which   metadata  elements  encoded  in  an  OpenURL  were  most  common  and  most  significant  in  leading  to   an  appropriate  full-­‐text  version  of  the  article  being  cited.  In  2010,  Chandler,  Wiley,  and  LeBlanc   reported  on  a  systematic  approach  they  devised,  as  part  of  a  Mellon  grant,  to  review  the  outbound   OpenURLs  from  L’Année  Philologique.3  They  began  with  an  analysis  of  the  metadata  elements   included  in  each  OpenURL  and  compared  this  to  the  standard.  They  found  that  elements  critical  to   the  delivery  of  a  full-­‐text  item,  such  as  the  article’s  starting  page,  were  never  included  in  the   OpenURLs  generated  by  L’Année  Philologique.4  Their  work  led  to  the  creation  of  the  Improving   OpenURLs  Through  Analytics  (IOTA)  working  group  within  the  National  Information  Standards   Organization  (NISO).   IOTA,  in  turn,  was  focused  on  improving  OpenURL  link  quality  at  the  provider  end.  “The  quality  of   the  data  in  the  link  resolver  knowledge  base  itself  is  outside  the  scope  of  IOTA;  this  is  being   addressed  through  the  NISO  KBART  initiative.”5,6  Where  IOTA  provided  tools  to  content  providers   for  improving  their  outbound  OpenURLs,  KBART  provided  tools  to  knowledge  base  and  linking   tool  providers  for  improving  their  data.  Pesch,  in  a  study  to  validate  the  IOTA  process,  discovered   that  well-­‐formed  OpenURLs  were  generally  successful,  however:   The  quality  of  the  OpenURL  links  is  just  part  of  the  equation.  Setting  the  proper  expectations   for  end  users  also  need  to  be  taken  into  consideration.  
Librarians  can  help  by  educating  their   users  about  what  is  expected  behavior  for  a  link  resolver  and  end  user  frustrations  can  also  be   reduced  if  librarians  take  advantage  of  the  features  most  content  providers  offer  to  control   when  OpenURL  links  display  and  what  the  links  say.  Where  possible  the  link  text  should   indicate  to  the  user  what  they  will  get  when  they  click  it.7       INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  MARCH  2015   54   Missing  from  the  standards-­‐based  work  described  above  is  the  role  of  the  OpenURL  middleman,   the  library.  Price  and  Trainor  describe  a  method  for  reviewing  OpenURL  data  and  identifying  the   root  causes  of  failures.  8  Through  testing  of  actual  OpenURLs  in  each  of  their  systems,  they  arrived   at  a  series  of  steps  that  could  be  taken  by  other  libraries  to  proactively  raise  OpenURL  resolution   success  rates.  Several  specific  recommendations  include  “optimize  top  100  most  requested   journals”  and  “optimize  top  ten  full  text  target  providers.”9  That  is,  make  sure  that  OpenURLs   leading  to  content  from  the  most  frequently  used  journals  and  content  sources  are  tested  and  are   functioning  correctly.  Chen  describes  a  similar  analysis  of  broken  link  reports  derived  from   Bradley  University  library’s  SFX  implementation  over  four  years,  with  a  summary  of  the  common   reasons  links  failed.10  Similarly,  O’Neill  conducted  a  small  usability  study  whose  recommendations   included  providing  “a  system  of  support  accessible  from  the  page  where  users  experience   difficulty,”11  although  her  recommendations  focused  on  inline,  context-­‐appropriate  help  rather   than  error-­‐reporting  mechanisms.   Not  found  in  the  literature  are  several  systematic  approaches  that  a  library  can  take  to  proactively   collect  problem  reports  and  manage  the  knowledge  base  accordingly.   METHOD   We  have  taken  a  two-­‐pronged  approach  to  improving  link  resolution  quality,  each  relying  on  a   different  kind  of  input.  The  first  uses  problem  reports  submitted  by  users  of  our  SummonTM-­‐ powered  article  discovery  tool,  ArticlesPlus.12  The  second  focuses  on  the  most  commonly-­‐accessed   full-­‐text  titles  in  our  environment,  based  on  reports  from  360  Link.  We  have  developed  this  dual   approach  in  the  expectation  that  we  will  catch  more  problems  on  lesser-­‐used  full-­‐text  sources   through  the  first  approach,  and  problems  whose  resolution  will  benefit  the  most  individuals   through  the  second.   User  Reports   The  University  of  Michigan  Library  uses  Summon  as  the  primary  article  discovery  tool.  When  a   user  completes  a  search  and  clicks  the  “MGet  It”  button  (see  figure  1)—MGet  It  is  our  local  brand   for  the  entire  full-­‐text  delivery  process—the  user  is  directed  to  the  actual  document  through  one   of  two  mechanisms:   1. Access  to  the  full-­‐text  article  through  a  Summon  Index-­‐Enhanced  Direct  Link.  (Some  of   Summon’s  full-­‐text  content  providers  contribute  a  URL  to  Summon  for  direct  access  to  the   full  text.  This  is  known  as  an  Index-­‐Enhanced  Direct  Linking  [Direct  Linking].)   2. 
Access to the full-text article through the University Library's link resolver, 360 Link. At this point, one of two things can happen:

a. The University Library has configured a number of full-text sources as "direct to full text" links. When a citation leads to one of these sources, the user is directed to the article, or as close to it as the content provider's site allows: sometimes to an issue table of contents, sometimes to a list of items in the appropriate volume, and occasionally to the journal's front page. The last outcome is rare in our environment because the University Library prefers full-text links that get closer to the article and has configured 360 Link for that outcome.

b. For those full-text sources that do not have direct-to-article links, 360 Link is configured to provide a range of possible delivery mechanisms, including journal-, volume-, or issue-level entry points, document-delivery options (for cases where the library does not license any full-text online sources), the library catalog (for identifying print holdings for a journal), and so on.

From the user perspective, mechanisms 1 and 2a are essentially identical. In both cases, a click on the MGet It icon takes the user to the full text in a new browser window. If the link does not lead to the correct article for any reason, there is no way in the new window for the library to collect that information. Users may consider item 2b results a failure because the article is not immediately perceptible, even if the article is actually available in full text after two or more subsequent clicks. Because of this user perception, we interpreted 2b results as "failures."

Figure 1. Sample Citation from ArticlesPlus

In an attempt to understand this type of problem, following the advice given by O'Neill and Chen, we provide a problem-reporting link in the ArticlesPlus search-results interface each time the full-text icon appears (see the right side of figure 1). When the user clicks this problem-reporting link, they are taken to a Qualtrics survey form that asks for several basic pieces of information from the user and also captures the citation information for the article the user was trying to reach (see figure 2).

Figure 2.
Survey  Questionnaire  for  Reporting  Linking  Problems   This  survey  instrument  asks  the  user  to  characterize  the  type  of  delivery  failure  with  one  of  four   common  problems,  along  with  an  “other”  text  field:   • There  was  no  article   • I  got  the  wrong  article   • I  ended  up  at  a  page  on  the  journal's  web  site,  but  not  the  article   • I  was  asked  to  log  in  to  the  publisher's  site   • Something  else  happened  (please  explain):   The  form  also  asks  for  any  additional  comments  and  requires  that  the  user  provide  an  email   address  so  that  library  staff  can  contact  the  user  with  a  resolution  (often  including  a  functioning   full-­‐text  link)  or  to  ask  for  more  information.   In  addition  to  the  information  requested  from  the  user,  hidden  fields  on  this  form  capture  the   Summon  record  ID  for  the  article,  the  IP  address  of  the  user’s  computer  (to  help  us  identify  if  the   problem  could  be  a  related  to  our  EZProxy  configuration),  a  time  and  date  stamp  of  the  report’s   submission,  and  the  brand  and  version  of  web  browser  being  used.       MEASURING  JOURNAL  LINKING  SUCCESS  FROM  A  DISCOVERY  SERVICE  |  STUART,  VARNUM,  AND   AHRONHEIM   57   The  results  of  the  form  are  sent  by  email  to  the  University  Library’s  Ask  a  Librarian  service,  the   library’s  online  reference  desk.  Ask  a  Librarian  staff  make  sure  that  the  problem  is  not  associated   with  the  individual  user’s  account  (that  they  are  entitled  to  get  full  text,  that  they  were  accessing   the  item  from  on  campus  or  via  the  proxy  server  or  VPN,  etc.).  When  user-­‐centric  problems  are   ruled  out,  the  problem  is  passed  on  to  the  library’s  Electronic  Access  Unit  in  Technical  Services  for   further  analysis  and  resolution.   Random  Sampling   User-­‐reported  problems  are  only  one  picture  of  issues  in  the  linking  process.  We  were  concerned   that  user  reports  might  not  be  the  complete  story.  We  wanted  to  ensure  that  our  samples   represented  the  full  range  of  patron  experiences,  not  just  that  of  the  ones  who  reported.  So,  to  get   a  different  perspective,  we  instituted  a  series  of  random  sample  testing  using  logs  of  document   requests  from  the  link  resolver,  360  Link.   2011  Linking  Review   Our  first  large-­‐scale  review  of  linking  from  ArticlesPlus  was  conducted  in  2011.  This  first  approach   was  based  on  a  log  of  the  Summon  records  that  had  appeared  in  patron  searches  and  for  which   our  link  resolver  link  had  been  clicked.  For  this  test  we  chose  a  slice  of  the  log  covering  the  period   from  January  30–February  12,  2011.  This  period  was  chosen  because  it  was  well  into  the  academic   term  and  before  Spring  Break,  so  it  would  provide  a  representative  sample  of  the  searches  people   had  performed.  The  resulting  slice  contained  13,161  records.  For  each  record  the  log  contained   the  Summon  ID  of  the  record.  We  used  this  to  remove  duplicate  records  from  the  log  to  ensure  we   were  not  testing  linking  for  the  same  record  more  than  once,  leaving  us  with  a  spreadsheet  of   10,497  records,  one  record  per  row.  
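This deduplication step, together with the random selection described next, can be approximated in a few lines of code. The sketch below is illustrative only: the actual work was done with Research Randomizer and a spreadsheet, and the file name, column name, and CSV layout shown here are assumptions rather than a description of our systems.

```python
import csv
import random

# Illustrative sketch of the deduplicate-then-sample approach described in
# the text. The real process used Research Randomizer and a spreadsheet; the
# file name "resolver_click_log.csv" and the column "summon_id" are assumptions.
SAMPLE_SIZE = 685  # 685 for the 2011 review; the later monthly reviews drew 600

def load_unique_records(path):
    """Read the click log, keeping the first row seen for each Summon ID."""
    seen = set()
    unique_rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["summon_id"] not in seen:
                seen.add(row["summon_id"])
                unique_rows.append(row)
    return unique_rows

def draw_sample(rows, k):
    """Pick k rows at random without replacement, equivalent to generating a
    nonduplicating list of random row numbers and matching them to the sheet."""
    return random.sample(rows, k)

if __name__ == "__main__":
    records = load_unique_records("resolver_click_log.csv")
    sample = draw_sample(records, SAMPLE_SIZE)
    print(f"{len(records)} unique records; sampled {len(sample)} for testing")
```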
From the remaining records we chose a sample of 685 records using a random number generator tool, Research Randomizer (http://www.randomizer.org/form.htm), to produce a random, nonduplicating list of 685 numbers with values from 1 to 10,497. Each of the 685 numbers produced was matched to the corresponding row in the spreadsheet, counting from the first record listed in the spreadsheet. For each record we collected the data in figure 3.

1. The Summon ID of the record.
2. The raw OpenURL provided with the record.
3. A version of the OpenURL that may have been locally edited to put dates in a standard format.
4. The final URL provided to the user for linking to the resource. This would usually be the OpenURL from #3 containing the metadata used by the link resolver to build its full-text links. Currently it is an intermediary URL provided by the Summon API. This URL may lead to an OpenURL or to a Direct Link to the resource in the Summon record.
5. The classification of the link in the Summon record. This was either "Full Text Online" or "Citation-Only."
6. The date the link resolver link was clicked.
7. The page of the Summon search results on which the link resolver link was found.
8. The position within the page of search results where the link resolver link was located.
9. The search query that produced the search results.

Figure 3. Data Points Collected

The results from this review were somewhat disappointing, with only 69.5% of the citations tested leading directly to full text. At the time Direct Linking did not yet exist, so "direct to full text" linking was only available through the 1-Click feature of 360 Link, which attempts to lead patrons directly to the full text of a resource without first going through the 360 Link menu. 1-Click was used for 579 (84.5%) of the citations tested, with 15.3% leading to the 360 Link menu. Of the citations that used 1-Click, 476 (82.2%) led directly to full text, so 1-Click was rather successful when it was used. Links for about 30.5% of the citations led either to a failed attempt to reach full text through 1-Click or directly to the 360 Link menu. The 2011 review also included looking at the full-text links that 360 Link indicated should lead directly to the full text, as opposed to the journal, volume, or issue level. When we reviewed all of the "direct to full text" links generated by 360 Link, not only the ones used by 1-Click, we found a variety of reasons why those links did not succeed in leading to the full text. The top five reasons found for linking failures were the following:

1. incomplete target collection
2. incorrect syntax in the article/chapter link generated by 360 Link
3. incorrect metadata in the Summon OpenURL
4. article not individually indexed
5. target error in targetURL translation

Collectively, these reasons were associated with the failure of 71.5% of the "direct to full text" links. As we will show later, these problems were also noted in our most recent review of linking quality.

Move to Quarterly Testing

After the 2011 review, we decided to perform quarterly testing of linking so we would have current data on linking quality. This would give us information on the effectiveness of any changes we and ProQuest had made independently to improve the linking, and we could see where linking problems found in previous testing had been resolved and where new ones might exist.

However, we needed to change how we created our sample. While the data gathered in 2011 provided much insight into the workings of 360 Link, testing the 685 records produced 2,210 full-text links. Gathering the data for such a large number of links required two months of part-time effort by two staff members as well as an additional month of part-time effort by one staff member for analysis. This would not be workable for quarterly testing. As an alternative we decided to test two records from each of the 100 serials most accessed through the link resolver. This gave us a sample we could test and analyze within a quarter, based on serials that our patrons were using; we felt that we could gather data for such a sample within three to four weeks instead of two months. The list was generated using the "Click-Through Statistics by Title and ISSN (Journal Title)" usage report available through the ProQuest Client Center administration GUI. We searched for each serial title within Summon using the serial's ISSN, or the serial's title when the ISSN was not available.

We ordered the results by date, with the newest records first. We wanted an article within the first two to three pages of results so we would have a recent article, but not one so new that it was not yet available through the resources that provide access to the serial. Then we reordered the results to show the oldest records first and chose an article from the first or second page of results. In each case our goal was to choose an article essentially at random, ignoring the actual content of the article, so as not to introduce a selection bias by publisher or journal. Another area where our sample was not truly random involved supplement issues of journals: the samples collected contained few items from supplemental issues. Linking to articles in supplements is particularly difficult because of the different ways supplement information is represented among different databases. To attempt to capture linking information in this case we added records for articles in supplemental issues. Those records were chosen from journals found in earlier testing to contain supplemental issues.
We searched Summon for articles within those supplemental issues and selected one or two to add to our sample.

One notable development was the introduction of Direct Linking in our Summon implementation between the reviews for the first and second quarters of 2012. ProQuest developed Direct Linking to improve linking to resources (including but not limited to the full text of articles) through Summon. Instead of using an OpenURL, which must be sent to a link resolver, Direct Linking uses information received from the providers of the records in Summon to create links directly to resources through those providers. Ideally, since these links use information from those providers, Direct Linking would not have the problems found with OpenURL linking through a link resolver such as 360 Link. Not all links from Summon use Direct Linking, and as a result we had to take into account the possibility that any link we clicked could use either OpenURL linking or Direct Linking.

Current Review: Back to Random Sampling

While the sampling method above produced useful data, we found it had some limitations. When we performed the review for the second quarter of 2012, we found a significant increase in the effectiveness of 360 Link since the first quarter 2012 review. (This is further described in the findings section of this article.) We were able to trace some of this improvement to changes ProQuest had made to 360 Link and to the OpenURLs produced from Summon. However, we were unable to fully trace the cause of the improvement and were unable to determine whether it represented a real improvement that would persist. To resolve these problems, we returned to using a random sample in our latest review, but with a change in methods.

Current Review: Determining the Sample Size

We wanted to perform a review that would be statistically relevant and could help us determine whether any changes in linking quality were persistent and not just a one-time event. Instead of testing a single sample each quarter, we decided to test a sample each month over a period of months. One concern with this was the sample size: we wanted a sample that would be statistically valid but not so large that we could not test it within a single month. We determined that a sample size of 300 would be sufficient to determine whether any month-to-month changes represented a real change. However, in previous testing we had learned that, because of re-indexing of the Summon records, Summon IDs that were valid when a patron performed a search might no longer be valid by the time of our testing. We wanted a sample of 300 still-valid records, so we selected a random sample larger than that amount: we decided to test 600 records each month to determine whether the Summon IDs were still valid.

Current Review: Methods

When generating each month's sample we used the same method as in 2011. We asked our Web Systems group for the logs of full-text requests from the library's Summon interface for the period of November 2012–February 2013.13 We processed each month's log file within two months of the user interactions. To generate the 600-record sample, after removing records with duplicate Summon IDs, we used a random number generator tool, Research Randomizer, to produce a random, nonduplicating list of 600 numbers with values from 1 to the number of unique records. Each of the 600 numbers produced was matched to the corresponding row in the spreadsheet of records with unique Summon IDs. Once the 600 records were tested and we had a subset with valid Summon IDs, we generated a list of 300 random, nonduplicating numbers with values from 1 to the number of records with valid Summon IDs. Each of the 300 numbers produced was matched to the corresponding row in a spreadsheet of the subset of records with valid Summon IDs. This gave us the 300-record sample for analysis.

Testing was performed by two people, a permanent staff member and a student hired to assist with the testing. The staff member was already familiar with the data gathering and recording procedure and trained the student on this procedure. The student was introduced to the library's Summon implementation and shown how to recognize and understand the different possible linking types: Summon Direct Linking, 360 Link using 1-Click, and 360 Link leading to the 360 Link menu. Once this background was provided, the student was introduced to the procedure for gathering and recording data. The student was given suggestions on how to find the article on the target site if the link did not lead directly to the article and how to perform some basic analysis to determine why the link did not function as expected. The permanent staff member reviewed the analysis of the links that did not lead to full text and applied a code to describe the reason for the failure.

Based on our 2011 testing, we expected to see one of two general results in the current round:

1. 360 Link would attempt to connect directly to the article, because we activated the 1-Click feature of 360 Link when we implemented the link resolver. With 1-Click, 360 Link attempts to lead patrons directly to the full text of a resource without first having to go through the link resolver menu. Even with 1-Click active, we provide patrons a link leading to the full 360 Link menu, which may have other options for reaching the full text as well as links to search for the journal or book in our catalog.
2. The link from Summon would lead directly to the 360 Link menu.

Once Direct Linking was implemented, after we began this round, a third result became possible: a direct link from Summon to the full text.

For each record we collected the data shown in figure 4.
1. Date the link from the Summon record was tested.
2. The URL of the Summon record.
3. *The OpenURL generated by clicking the link from Summon. This was the URL in the address bar of the page to which the link led. This is not available when Direct Linking is used.
4. The ISSN of the serial or ISBN of the book.
5. The DOI of the article/book chapter if it was available.
6. The citation for the article as shown in the 360 Link menu, or in the Summon record if Direct Linking was used.
7. *Each package (collection of journals bundled together in the knowledgebase) for which 360 Link produced an electronic link for that citation.
8. *The order in the list of electronic resources in which the package in #7 appeared in the 360 Link menu.
9. *The Linking Level assigned to the link by 360 Link. This level indicates how close to the article the link should lead the patron, with article-level or chapter-level links ideally taking the patron directly to the article/book chapter. The linking levels recorded in our testing, starting with the closest to full text, were article/book chapter, issue, volume, journal/book, and database.
10. *For article-level links, the URL that 360 Link used to attempt to connect to the article.
11. For all full-text links in the 360 Link menu, the URL to which the links led. This was the link displayed in the browser address bar.
12. A code assigned to that link describing the results.
13. A note indicating if full text was available on the site to which the link led. This was only an indicator of whether or not full text could be accessed on that site, not an indicator of the success of 1-Click/Direct Linking or the article-level link.
14. A note if this was the link used by 1-Click.
15. A note if Direct Linking was used.
16. A note if the link was for a citation where 1-Click was not used and clicking the link in Summon led directly to the 360 Link menu.
17. Notes providing more detail for the results described by #12. This included error messages, search strings shown on the target site, and any unusual behavior. The notes also included conclusions reached regarding the cause(s) of any problems.

* Collected only if the link resolver was used.

Figure 4. Data Collected from Sample

Each link was categorized based on whether it led to the full text. Then the links that failed were further categorized on the basis of the reason for failure (see figure 5 for failure categories).

1. Incorrect metadata in the Summon OpenURL.
2. Incomplete metadata in the Summon OpenURL.
3. Difference in the metadata between Summon and the target. In this case we were unable to determine which site had the correct metadata.
4. Inaccurate data in the knowledgebase. This includes incorrect URL and incorrect ISSN/ISBN.
5. Incorrect coverage in the knowledgebase.
6. Link resolver insufficiency. The link resolver has not been configured to provide deep linking. This may be something that we could configure or something that would require changes in 360 Link.
7. Incorrect syntax in the article/chapter link generated by 360 Link.
8. Target site does not appear to support linking to article/chapter level.
9. Article not individually indexed. This often happens with conference abstracts and book reviews, which are combined in a single section.
10. Translation error of the "targetURL" by target site.
11. Incomplete target collection. Site is missing full text for items that should be available on the site.
12. Incorrect metadata on the target site.
13. Citation-Only record in Summon. Summon indicates only the citation is available, so access to full text is not expected.
14. Error indicating cookie could not be downloaded from target site. This sometimes happened with 1-Click, but the same link would work from the 360 Link menu.
15. Item does not appear to have a DOI. The 360 Link menu may provide an option to search for the DOI. Sometimes these searches fail and we are unable to find a DOI for the item.
16. Miscellaneous. Results that do not fall into the other categories. Generally used for links in packages for which 360 Link only provides journal/book-level linking, such as the Directory of Open Access Journals (DOAJ).
17. Unknown. The link failed with no identifiable cause.

Figure 5. List of Failure Categories Assigned

User-Reported Problems

In March 2012, we began recording the number of full-text clicks in ArticlesPlus search results (using Google Analytics events). For each month, we calculated the number of problems reported per 1,000 searches and per 1,000 full-text clicks. Graphed over time, the number of problem reports in both categories shows an overall decline. See figures 6 and 7.

Figure 6. Problems Reported per 1,000 ArticlesPlus Searches (June 2011–April 2014)

Figure 7. Problems Reported per 1,000 ArticlesPlus Full-Text Clicks (March 2012–April 2014)

Our active work to update the Summon and 360 Link knowledge bases began in June 2011. The change to Summon Direct Linking happened on February 27, 2012, at a time when we were particularly dissatisfied with the volume of problems reported. We felt the poor quality of OpenURL resolution was a strong argument in favor of activating Summon Direct Linking. We believe this change led to a noticeable improvement in the number of problems reported per 1,000 searches (see figure 6). We do not have data for clicks on the full-text links in our ArticlesPlus interface prior to March 2012, but we do know that reports per 1,000 full-text clicks have been on the decline as well (see figure 7).
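The per-1,000 figures behind figures 6 and 7 are simple ratios of monthly counts. As a minimal illustration of the arithmetic (using the January 1–29, 2013 counts reported later in table 10), a sketch might look like the following; the variable names are ours and purely illustrative:

```python
# Problems reported per 1,000 searches and per 1,000 full-text clicks,
# illustrated with the January 1-29, 2013 counts reported in table 10.
problem_reports = 100
searches = 44204
full_text_clicks = 34692

per_1000_searches = problem_reports / searches * 1000        # ~2.26 (0.226%)
per_1000_clicks = problem_reports / full_text_clicks * 1000  # ~2.88 (0.288%)

print(f"{per_1000_searches:.2f} problems per 1,000 searches")
print(f"{per_1000_clicks:.2f} problems per 1,000 full-text clicks")
```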
FINDINGS

Summary of Random-Sample Testing of Link Success

In early 2013 we tested linking from ArticlesPlus to gather data on the effectiveness of the linking and to attempt to determine whether there were any month-to-month changes in effectiveness that could indicate persistent changes in linking quality. In this section we review the data collected from the four samples used in this testing. We discuss the different paths to full text, Direct Linking versus OpenURL linking through 360 Link, and their relative effectiveness. We also discuss the reasons we found that links did not lead to full text.

Paths to Full-Text Access

As shown below (see table 1), most of the records tested in Summon used Direct Linking to attempt to reach the full text. The percentage varied with each sample tested, ranging from 61% to 70%. The remaining records used 360 Link to attempt to reach the full text. Most of the time when 360 Link was used, 1-Click was also used to reach the full text. Between Direct Linking and 1-Click, an attempt was made to lead users directly to the full text of the article, without first going through the 360 Link menu, about 93% to 94% of the time.

Linking type | Sample 1, November 2012 | Sample 2, December 2012 | Sample 3, January 2013 | Sample 4, January 2013
Direct Linking | 205 (68.3%) | 210 (70.0%) | 184 (61.3%) | 190 (63.3%)
360 Link/1-Click | 77 (25.7%) | 70 (23.3%) | 98 (32.7%) | 87 (29.0%)
360 Link/360 Link Menu | 18 (6.0%) | 20 (6.7%) | 18 (6.0%) | 23 (7.7%)
Total | 300 | 300 | 300 | 300

Table 1. Type of Linking

Attempts to reach the full text through Direct Linking and 1-Click were rather successful. In the testing, we were able to reach full text through those methods from 79% to about 84% of the time (see table 2). The remaining cases were situations where Direct Linking/1-Click did not lead directly to the full text or where we reached the 360 Link menu.

Linking type | Sample 1, November 2012 | Sample 2, December 2012 | Sample 3, January 2013 | Sample 4, January 2013
Direct Linking | 197 (65.7%) | 204 (68.0%) | 173 (57.7%) | 185 (61.7%)
360 Link/1-Click | 45 (15.0%) | 47 (15.7%) | 64 (21.3%) | 55 (18.3%)
Total out of 300 | 242 (80.7%) | 251 (83.7%) | 237 (79.0%) | 240 (80.0%)

Table 2. Percentage of Citations Leading Directly to Full Text

Table 3 contains the same data but adjusted to remove results that Summon correctly indicated were citation-only. Instead of calculating the percentages based on the full 300-citation samples, they are calculated based on the sample minus the citation-only records. The last row shows the number of records excluded from the full samples.

Linking type | Sample 1, November 2012 | Sample 2, December 2012 | Sample 3, January 2013 | Sample 4, January 2013
Direct Linking | 197 (65.9%) | 204 (69.2%) | 173 (59.0%) | 185 (62.5%)
360 Link/1-Click | 45 (15.1%) | 47 (15.9%) | 64 (21.8%) | 55 (18.6%)
Total | 242 (80.9%) | 251 (85.1%) | 237 (80.9%) | 240 (81.1%)
Records excluded | 1 | 5 | 7 | 4

Table 3. Percentage of Citations Leading Directly to Full Text, Excluding Citation-Only Results

Link Failures with Summon Direct Linking and 360 Link 1-Click

The next two tables show the results of linking for records that used Direct Linking and for citations that used 1-Click through 360 Link. Records that used Direct Linking were more likely to lead testers to full text than 360 Link with 1-Click. For the four samples, Direct Linking led to full text more than 90% of the time, while 1-Click led to full text from about 58% to about 67% of the time.

For those records using Direct Linking where Direct Linking did not lead directly to the text, the result was usually a page that did not have a link to full text (see table 4).

Result | Sample 1, Nov. 2012 (n = 205) | Sample 2, Dec. 2012 (n = 210) | Sample 3, Jan. 2013 (n = 184) | Sample 4, Jan. 2013 (n = 190)
Full Text/Page with full-text link | 197 (96.1%) | 204 (97.1%) | 173 (94.0%) | 185 (97.4%)
Abstract/Citation Only | 6 (2.9%) | 5 (2.4%) | 6 (3.3%) | 5 (2.6%)
Unable to access full text through available full-text link | 1 (0.5%) | 1 (0.5%) | 3 (1.6%) | 0 (0.0%)
Error and no full-text link on target | 1 (0.5%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
Listing of volumes/issues | 0 (0.0%) | 0 (0.0%) | 1 (0.5%) | 0 (0.0%)
Wrong article | 0 (0.0%) | 0 (0.0%) | 1 (0.5%) | 0 (0.0%)
Minor results14 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)

Table 4. Results with Direct Linking

For 360 Link with 1-Click, the results that did not lead to full text were more varied (see table 5). The top reasons for failure included the link leading to an error indicating the article was not available even though full text for the article was available on the site, the link leading to a list of search results, and the link leading to the table of contents for the journal issue or book. In the last case, most of those results were book chapters where 360 Link only generated a link to the main page for the book instead of a link to the chapter.

Result | Sample 1, Nov. 2012 (n = 77) | Sample 2, Dec. 2012 (n = 70) | Sample 3, Jan. 2013 (n = 98) | Sample 4, Jan. 2013 (n = 87)
Full Text/Page with full-text link | 45 (58.4%) | 47 (67.1%) | 64 (65.3%) | 55 (63.2%)
Table of Contents | 12 (15.6%) | 6 (8.6%) | 10 (10.2%) | 6 (6.9%)
Error but full text available | 6 (7.8%) | 11 (15.7%) | 10 (10.2%) | 18 (20.7%)
Results list | 6 (7.8%) | 2 (2.9%) | 10 (10.2%) | 4 (4.6%)
Error and no full-text link on target | 6 (7.8%) | 1 (1.4%) | 2 (2.0%) | 2 (2.3%)
Wrong article | 1 (1.3%) | 1 (1.4%) | 1 (1.0%) | 2 (2.3%)
Other | 1 (1.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
Abstract/Citation Only | 0 (0.0%) | 0 (0.0%) | 1 (1.0%) | 0 (0.0%)
Unable to access full text through available full-text link | 0 (0.0%) | 1 (1.4%) | 0 (0.0%) | 0 (0.0%)
Search box | 0 (0.0%) | 1 (1.4%) | 0 (0.0%) | 0 (0.0%)
Minor results15 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)

Table 5. Results with 360 Link: Citations Using 1-Click

Link Analysis for All 360 Link Clicks

Unlike the above tables, which show the results on a citation basis, the table below shows the results for all links produced by 360 Link (see table 6). This includes the following:

1. links used for 1-Click
2. links in the 360 Link menu that were not used for 1-Click when 360 Link attempted to link to full text using 1-Click
3. links in the 360 Link menu where clicking the link in Summon led directly to the 360 Link menu instead of using 1-Click

Result | Sample 1, Nov. 2012 (n = 167) | Sample 2, Dec. 2012 (n = 158) | Sample 3, Jan. 2013 (n = 184) | Sample 4, Jan. 2013 (n = 172)
Full Text/Page with full-text link | 81 (48.5%) | 84 (53.2%) | 103 (56.0%) | 87 (50.6%)
Abstract/Citation Only | 0 (0.0%) | 0 (0.0%) | 1 (0.5%) | 0 (0.0%)
Unable to access full text through available full-text link | 0 (0.0%) | 1 (0.6%) | 0 (0.0%) | 1 (0.6%)
Error but full text available | 9 (5.4%) | 14 (8.9%) | 17 (9.2%) | 23 (13.4%)
Error and full text not accessible through full-text link on target | 1 (0.6%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
Error and no full-text link on target | 10 (6.0%) | 1 (0.6%) | 6 (3.3%) | 5 (2.9%)
Failed to find DOI through link in 360 Link menu | 3 (1.8%) | 5 (3.2%) | 5 (2.7%) | 8 (4.7%)
Main journal page | 22 (13.2%) | 24 (15.2%) | 17 (9.2%) | 15 (8.7%)
Other | 2 (1.2%) | 0 (0.0%) | 1 (0.5%) | 2 (1.2%)
360 Link menu with no full-text links | 0 (0.0%) | 2 (1.3%) | 3 (1.6%) | 3 (1.7%)
Results list | 9 (5.4%) | 4 (2.5%) | 10 (5.4%) | 3 (1.7%)
Search box | 6 (3.6%) | 7 (4.4%) | 5 (2.7%) | 8 (4.7%)
Table of Contents | 12 (7.2%) | 6 (3.8%) | 10 (5.4%) | 9 (5.2%)
Listing of volumes/issues | 9 (5.4%) | 9 (5.7%) | 5 (2.7%) | 6 (3.5%)
Wrong article | 3 (1.8%) | 1 (0.6%) | 1 (0.5%) | 2 (1.2%)

Table 6. Results with 360 Link: All Links Produced by 360 Link

In addition to recording what happened, we attempted to determine why links failed to reach full text. Even though Direct Linking is very effective, it is not 100% effective in linking to full text. Excluding records that indicated that only the citation, not full text, would be available through Summon, most of the problems were due to incorrect information in Summon (see table 7): either the link produced by Summon incorrectly led to an error or an abstract when full text was available on the target site, or Summon incorrectly indicated that access to full text was available.

Reason for failure | Sample 1, Nov. 2012 (n = 8) | Sample 2, Dec. 2012 (n = 6) | Sample 3, Jan. 2013 (n = 11) | Sample 4, Jan. 2013 (n = 5)
Citation-Only record in Summon | 1 (12.5%) | 3 (50.0%) | 4 (36.4%) | 1 (20.0%)
Incomplete target collection | 1 (12.5%) | 0 (0.0%) | 1 (9.1%) | 1 (20.0%)
Incorrect coverage in knowledgebase | 0 (0.0%) | 0 (0.0%) | 2 (18.2%) | 0 (0.0%)
Summon has incorrect link | 3 (37.5%) | 1 (16.7%) | 2 (18.2%) | 2 (40.0%)
Summon incorrectly indicating available access to full text | 3 (37.5%) | 2 (33.3%) | 2 (18.2%) | 1 (20.0%)

Table 7. Reasons for Failure to Link to Full Text through Direct Linking

Table 8 shows the reasons links generated by 360 Link and used for 1-Click did not lead to full text. Most of the failures were caused by three general problems:

1. incorrect metadata in Summon
2. incorrect syntax in the article/chapter link generated by 360 Link
3. target site does not support linking to the article/chapter level

Reason for failure | Sample 1, Nov. 2012 (n = 32) | Sample 2, Dec. 2012 (n = 23) | Sample 3, Jan. 2013 (n = 34) | Sample 4, Jan. 2013 (n = 32)
Incorrect metadata in the Summon OpenURL | 2 (6.3%) | 4 (17.4%) | 3 (8.8%) | 4 (12.5%)
Incomplete metadata in the Summon OpenURL | 0 (0.0%) | 2 (8.7%) | 0 (0.0%) | 0 (0.0%)
Difference in metadata between Summon and the target | 1 (3.1%) | 5 (21.7%) | 0 (0.0%) | 2 (6.3%)
Inaccurate data in knowledgebase | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 (3.1%)
Incorrect coverage in knowledgebase | 0 (0.0%) | 0 (4.3%) | 0 (0.0%) | 0 (0.0%)
Link resolver insufficiency | 2 (6.3%) | 0 (0.0%) | 1 (2.9%) | 0 (0.0%)
Incorrect syntax in the article/chapter link generated by 360 Link | 6 (18.8%) | 3 (13.0%) | 10 (29.4%) | 7 (21.9%)
Target site does not support linking to article/chapter level | 11 (34.3%) | 4 (17.4%) | 5 (14.7%) | 6 (18.8%)
Article not individually indexed | 0 (0.0%) | 1 (4.3%) | 3 (8.8%) | 5 (15.6%)
Target error in targetURL translation | 0 (0.0%) | 0 (0.0%) | 5 (14.7%) | 3 (9.4%)
Incomplete target collection | 8 (25.0%) | 1 (4.3%) | 1 (2.9%) | 3 (9.4%)
Incorrect metadata on the target site | 0 (0.0%) | 1 (4.3%) | 0 (0.0%) | 1 (3.1%)
Citation-Only record in Summon | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
Cookie | 2 (6.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
Item does not appear to have a DOI | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
Miscellaneous | 0 (0.0%) | 0 (0.0%) | 4 (0.0%) | 0 (0.0%)
Unknown | 0 (0.0%) | 1 (4.3%) | 2 (5.9%) | 0 (0.0%)

Table 8. Reasons for Failure to Link to Full Text through 1-Click

Broadening our view of 360 Link to include all links generated by 360 Link during the testing, not only the ones used by 1-Click (see table 9), we see more causes of failure than with 1-Click. Most of the failures were caused by five general problems:

1. Incorrect metadata in Summon.
2. Link resolver insufficiency. We mostly used this classification when 360 Link only provided links to the main journal page or database page instead of links to the article and we thought it might have been possible to generate a link to the article. Sometimes this was due to configuration changes that we could have made, and sometimes it was because 360 Link would only create article links if particular metadata was available, even if other sufficient identifying metadata was available.
3. Incorrect syntax in the article/chapter link generated by 360 Link.
4. Target site does not support linking to the article/chapter level.
5. Miscellaneous. Most of the links that fell in this category were ones that were intended to go to the main journal page by design. These were for journals that are not in vendor-specific packages in the knowledgebase but in large general packages with many journals on different platforms. Because there is no common linking syntax, article-level linking is not possible. This includes packages containing open-access titles, such as the Directory of Open Access Journals (DOAJ), and packages of subscription titles that are not listed in vendor-specific packages in the knowledgebase.

Reason for failure | Sample 1, Nov. 2012 (n = 86) | Sample 2, Dec. 2012 (n = 74) | Sample 3, Jan. 2013 (n = 81) | Sample 4, Jan. 2013 (n = 89)
Incorrect metadata in the Summon OpenURL | 9 (10.5%) | 5 (6.8%) | 4 (4.9%) | 8 (9.0%)
Incomplete metadata in the Summon OpenURL | 0 (0.0%) | 2 (2.7%) | 1 (1.2%) | 3 (3.4%)
Difference in metadata between Summon and the target | 1 (1.2%) | 6 (8.1%) | 2 (2.5%) | 2 (2.2%)
Inaccurate data in knowledgebase | 0 (0.0%) | 0 (0.0%) | 1 (1.2%) | 5 (5.6%)
Incorrect coverage in knowledgebase | 3 (3.5%) | 1 (1.4%) | 2 (2.5%) | 1 (1.1%)
Link resolver insufficiency | 20 (23.3%) | 15 (20.3%) | 9 (11.1%) | 8 (9.0%)
Incorrect syntax in the article/chapter link generated by 360 Link | 7 (8.1%) | 3 (4.1%) | 10 (12.3%) | 11 (12.4%)
Target site does not support linking to article/chapter level | 17 (19.8%) | 6 (8.1%) | 9 (11.1%) | 10 (11.2%)
Article not individually indexed | 0 (0.0%) | 1 (1.4%) | 3 (3.7%) | 5 (5.6%)
Target error in targetURL translation | 1 (1.2%) | 3 (4.1%) | 7 (8.6%) | 3 (3.4%)
Incomplete target collection | 11 (12.8%) | 2 (2.7%) | 5 (6.2%) | 5 (5.6%)
Incorrect metadata on the target site | 0 (0.0%) | 1 (1.4%) | 0 (0.0%) | 1 (1.1%)
Citation-Only record in Summon | 0 (0.0%) | 2 (2.7%) | 3 (3.7%) | 3 (3.4%)
Cookie | 2 (2.3%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%)
Item does not appear to have a DOI | 2 (2.3%) | 4 (5.4%) | 5 (6.2%) | 7 (7.9%)
Miscellaneous | 13 (15.1%) | 22 (29.7%) | 18 (22.2%) | 17 (19.1%)
Unknown | 0 (0.0%) | 1 (1.4%) | 2 (2.5%) | 0 (0.0%)

Table 9. Reasons for Failure to Link to Full Text for All 360 Link Links

Comparison of User Reports and Random Samples

When we look at user-reported problems during the same period over which we conducted our manual process (November 1, 2012–January 29, 2013), we see that users reported a problem roughly 0.2% of the time (0.187% of searches resulted in a problem report, while 0.228% of full-text clicks resulted in a problem report). See table 10.

Sample Period | Problems Reported | ArticlesPlus Searches | MGet It Clicks | Problems Reported per Search | Problems Reported per MGet It Click
11/1/2012–11/30/2012 | 225 | 111,062 | 95,218 | 0.203% | 0.236%
12/1/2012–12/31/2012 | 105 | 74,848 | 58,346 | 0.140% | 0.180%
1/1/2013–1/29/2013 | 100 | 44,204 | 34,692 | 0.226% | 0.288%
Overall | 430 | 230,114 | 188,256 | 0.187% | 0.228%

Table 10. User Problem Reports During the Sample Period

The number of user-reported errors is significantly lower than what we found through our systematic sampling (see table 2). Where the error rate based on user reports would be roughly 0.2%, the more systematic approach showed a 20% error rate. Relying solely on user reports of errors to judge the reliability of full-text links dramatically underreports true problems, by a factor of 100.

CONCLUSIONS AND NEXT STEPS

Comparison of user reports to random-sample testing indicates a significant underreporting of problems on the part of users. While we have not conducted similar studies across other vendor databases, we suspect that user-generated reports likewise significantly lag behind true errors. Future research in this area is recommended.

The number of problems discovered in full-text items that are linked via an OpenURL is discouraging; however, the ability of the Summon Discovery Service to provide accurate access to full text is an overall positive because of its direct link functionality. More than 95% of direct-linked articles in our research led to the correct resource (table 3). One-click (OpenURL) resolution was noticeably poorer, with about 60% of requests leading directly to the correct full-text item. More alarming, we found that, of full-text requests linked through an OpenURL, a large portion—20%—fail. The direct links (the result of publisher-discovery service negotiations) are much more effective. This discourages us from feeling any complacency about the effectiveness of our OpenURL link resolution tools. The effort spent maintaining our link resolution knowledge base does not make a long-term difference in the link resolution quality.

Based on the data we have collected, it would appear that more work needs to be done if OpenURL is to continue as a working standard. While our data shows that direct linking offers improved service for the user as an immediate reward, we do feel some concern about the longer-term effect of closed and proprietary access paths on the broader scholarly environment. From the library's perspective, the trend to direct linking creates the risk of vendor lock-in because the vendor-created direct links will not work after the library's business relationship with the vendor ends. An OpenURL is less tightly bound to the vendor that provided it. This lock-in increases the cost of changing vendors.
The emergence of direct links is a two-edged sword: users gain reliability but libraries lose flexibility and the ability to adapt.

The impetus for improving OpenURL linking must come from libraries because vendors do not have a strong incentive to take the lead in this effort, especially when it interferes with their competitive advantage. We recommend that libraries collaborate more actively on identifying patterns of failure in OpenURL link resolution and remedies for those issues so that OpenURL continues as a viable and open method for full-text access. With more data on the failure modes for OpenURL transactions, libraries and content providers may be able to implement systematic improvements in standardized linking performance. We hope that the methods and data we have presented form a helpful beginning step in this activity.

ACKNOWLEDGEMENT

The authors thank Kat Hagedorn and Heather Shoecraft for their comments on a draft of this manuscript.

REFERENCES

1. NISO/UKSG KBART Working Group, KBART: Knowledge Bases and Related Tools, January 2010, http://www.uksg.org/sites/uksg.org/files/KBART_Phase_I_Recommended_Practice.pdf.
2. National Information Standards Organization (NISO), "ANSI/NISO Z39.88 - The OpenURL Framework for Context-Sensitive Services," May 13, 2010, http://www.niso.org/kst/reports/standards?step=2&project_key=d5320409c5160be4697dc046613f71b9a773cd9e.
3. Adam Chandler, Glen Wiley, and Jim LeBlanc, "Towards Transparent and Scalable OpenURL Quality Metrics," D-Lib Magazine 17, no. 3/4 (March 2011), http://dx.doi.org/10.1045/march2011-chandler.
4. Ibid.
5. National Information Standards Organization (NISO), Improving OpenURLs through Analytics (IOTA): Recommendations for Link Resolver Providers, April 26, 2013, http://www.niso.org/apps/group_public/download.php/10811/RP-21-2013_IOTA.pdf.
6. NISO/UKSG KBART Working Group, KBART: Knowledge Bases and Related Tools.
7. Oliver Pesch, "Improving OpenURL Linking," Serials Librarian 63, no. 2 (2012): 135–45, http://dx.doi.org/10.1080/0361526X.2012.689465.
8. Jason Price and Cindi Trainor, "Chapter 3: Digging into the Data: Exposing the Causes of Resolver Failure," Library Technology Reports 46, no. 7 (October 2010): 15–26.
9. Ibid., 26.
10. Xiaotian Chen, "Broken-Link Reports from SFX Users," Serials Review 38, no. 4 (December 2012): 222–27, http://dx.doi.org/10.1016/j.serrev.2012.09.002.
11. Lois O'Neill, "Scaffolding OpenURL Results," Reference Services Quarterly 14, no. 1–2 (2009): 13–35, http://dx.doi.org/10.1080/10875300902961940.
12. http://www.lib.umich.edu/. See the ArticlesPlus tab of the search box.
13. One problem we had in testing was that log data for February 2013 was not preserved. This would have been used to build the sample tested in April 2013. To get around this we decided to take two samples from the January 2013 log.
14. The "Minor results" row is a combination of all results that did not represent at least 0.5% of the records using Direct Linking for at least one sample. This includes the following results: Error but full text available, Error and full text not accessible through full text link on target, Main journal page, 360 Link menu with no full text links, Results list, Search box, Table of Contents, and Other.
15. The "Minor results" row is a combination of all results that did not represent at least 0.5% of the records using 360 Link for at least one sample. This includes the following results: Error and full text not accessible through full text link on target, Main journal page, 360 Link menu with no full text links, Listing of volumes/issues.

Building Library Community Through Social Media

Scott W. H. Young and Doralyn Rossmann

ABSTRACT

In this article academic librarians present and analyze a model for community-building through social media. Findings demonstrate the importance of strategy and interactivity via social media for generating new connections with library users. Details of this research include successful guidelines for building community and developing engagement online with social media. By applying intentional social media practices, the researchers' Twitter user community grew 100 percent in one year, with a corresponding 275 percent increase in user interactions. Using a community analysis approach, this research demonstrates that the principles of personality and interactivity can lead to community formation for targeted user groups. Discussion includes the strategies and research approaches that were employed to build, study, and understand user community, including user-type analysis and action-object mapping. From this research a picture of the library as a member of an active academic community comes into focus.

INTRODUCTION

This paper describes an academic library's approach to building community through Twitter. Much of the literature offers guidance to libraries on approaches to using social media as a marketing tool. The research presented here reframes that conversation to explore the role of social media as it relates to building community. The researchers' university library formed a social media group and implemented a social media guide to bring an intentional, personality-rich, and interaction-driven approach to its social media activity. Quantitative analyses reveal a significant shift and increase in Twitter follower population and interactions, and suggest promising opportunities for social media to strengthen the library's ties with academic communities.

LITERATURE REVIEW

Research in libraries has long brought a critical analysis to the value, purpose, and practical usage of social media.
Glazer  asked  of  library  Facebook  usage,  “Clever  outreach  or  costly  diversion?”1   Three  years  later,  Glazer  presented  a  more  developed  perspective  on  Facebook  metrics  and  the   nature  of  online  engagement,  but  social  media  was  still  described  as  “puzzling  and  poorly   defined.”2  Vucovich  et  al.  furthermore  notes  that  “the  usefulness  of  [social  networking  tools]  has   often  proven  elusive,  and  evaluating  their  impact  is  even  harder  to  grasp  in  library  settings.”3     Scott  W.  H.  Young  (swyoung@montana.edu)  is  Digital  Initiatives  Librarian  and     Doralyn  Rossmann  (doralyn@montana.edu)  is  Head  of  Collection  Development,                 Montana  State  University  Library,  Bozeman.     BUILDING  LIBRARY  COMMUNITY  THROUGH  SOCIAL  MEDIA  |  YOUNG  AND  ROSSMANN   21   Li  and  Li  similarly  observe  that  there  “seems  to  be  some  confusion  regarding  what  exactly  social   media  is.”4  Social  media  has  been  experimented  with  and  identified  variously  as  a  tool  for   enhancing  the  image  of  libraries,5  as  a  digital  listening  post,6  or  as  an  intelligence  gathering  tool.7   With  such  a  variety  of  perspectives  and  approaches,  the  discussion  around  social  media  in   libraries  has  been  somewhat  disjointed.     If  there  is  a  common  thread  through  library  social  media  research,  however,  it  ties  together  the   broadcast-­‐based  promotion  and  marketing  of  library  resources  and  services,  what  Li  calls  “the   most  notable  achievement  of  many  libraries  that  have  adopted  social  media.”8  This  particularly   common  approach  has  been  thoroughly  examined.9,10,11,12,13,14,15  In  evaluating  the  use  of  Facebook   at  Western  Michigan  University’s  Waldo  Library,  Sachs,  Eckel,  and  Langan  found  that  promotion   and  marketing  was  the  only  “truly  successful”  use  for  social  media.16  A  survey  of  Estonian   librarians  revealed  that  Facebook  “is  being  used  mainly  for  announcements;  it  is  reduplicating   libraries’  web  site[s].  Interestingly  librarians  don’t  feel  a  reason  to  change  anything  or  to  do   something  differently.”17  With  this  widespread  approach  to  social  media,  much  of  the  library   literature  is  predominated  by  exploratory  descriptions  of  current  usage  and  implementation   methods  under  the  banner  of  promoting  resources  by  meeting  users  where  they  are  on  social   media.18,19,20,21,22,23,24,25,26,27  This  research  is  effective  at  describing  how  social  media  is  used,  but  it   often  does  not  extend  the  discussion  to  address  the  more  difficult  and  valuable  question  of  why   social  media  is  used.     The  literature  of  library  science  has  not  yet  developed  a  significant  body  of  research  around  the   practice  of  social  media  beyond  the  broadcast-­‐driven,  how-­‐to  focus  on  marketing,  promotion,  and   public-­‐relations  announcements.  This  deficiency  was  recognized  by  Saw,  who  studied  social   networking  preferences  of  international  and  domestic  Australian  students,  concluding  “to  date,   the  majority  of  libraries  that  use  social  networking  have  used  it  as  a  marketing  and  promotional   medium  to  push  out  information  and  announcements.  
Our  survey  results  strongly  suggest  that   libraries  need  to  further  exploit  the  strengths  of  different  social  networking  sites.”28  From  this   strong  emphasis  on  marketing  and  best  practices  emerges  an  opportunity  to  examine  social  media   from  another  perspective—community  building—which  may  represent  an  untapped  strength  of   social  networking  sites  for  libraries.     While  research  in  library  and  information  science  has  predominantly  developed  around  social   media  as  marketing  resource,  a  small  subset  has  begun  to  investigate  the  community-­‐building   capabilities  of  social  media.29,30,31,32  By  making  users  feel  connected  to  a  community  and   increasing  their  knowledge  of  other  members,  “sites  such  as  Facebook  can  foster  norms  of   reciprocity  and  trust  and,  therefore,  create  opportunities  for  collective  action.”33  Lee,  Yen,  and   Hsiao  studied  the  value  of  interaction  and  information  sharing  on  social  media:  “A  sense  of   belonging  is  achieved  when  a  friend  replies  to  or  ‘Likes’  a  post  on  Facebook.”34  Lee  found  that   Facebook  users  perceived  real-­‐world  social  value  from  shared  trust  and  shared  vision  developed   and  expressed  through  information-­‐sharing  on  social  media.  Research  from  Oh,  Ozkaya,  and     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  MARCH  2015      22   LaRose  indicated  that  users  who  engaged  in  a  certain  quality  of  social  media  interactivity   perceived  an  enhanced  sense  of  community  and  life  satisfaction.35   Broader  discussion  of  social  media  as  a  tool  for  community-­‐building  has  been  advanced  within  the   context  of  political  activity,  where  social  media  is  identified  as  a  method  for  organizing  civic  action   and  revolutionary  protests.36,37,38  Related  research  focuses  on  the  online  social  connections  and   “virtual  communities”  developed  around  common  interests  such  as  religion,39  health,40   education,41  social  interests  and  norms,42  politics,43  web-­‐video  sharing,44  and  reading.45  In  these   analyses,  social  media  is  framed  as  an  online  instrument  utilized  to  draw  together  offline  persons.   Hofer  notes  that  communities  formed  online  through  social  media  activity  can  generate  a  sense  of   “online  bonding  social  capital.”46  Further  marking  the  online/offline  boundary,  research  from   Grieve  et  al.  investigates  the  value  of  social  connectedness  in  online  contexts,  suggesting  that   social  connectedness  on  Facebook  “is  a  distinct  construct  from  face-­‐to-­‐face  social   connectedness.”47  Grieve  et  al.  
acknowledge that the research design was predicated on the assumed existence of an online/offline divide, noting “it is possible that such a separation does not exist.”48

Around this online/offline separation has developed “digital dualism,” a theoretical approach that interrogates the false boundary drawn between an online world and an offline world.49,50 Sociologist Zeynep Tufekci expressed this concisely: “In fact, the Internet is not a world; it’s part of the world.”51 A central characteristic of community building through social media is that the “online” experience is so connected and interwoven with the “offline” experience as to create a single seamless experience. This concept is related to a foundational study from Ellison, Steinfield, and Lampe, who identified Facebook as a valuable subject of research because of its “heavy usage patterns and technological capacities that bridge online and offline connections.”52 They conclude, “Online social network sites may play a role different from that described in early literature on virtual communities. Online interactions do not necessarily remove people from their offline world but may indeed be used to support relationships.”53

This paper builds on existing online community research while drawing on the critical theory of “digital dualism” to argue that communities built through social media do not reside in a separate “online” space, but rather are one element of a much more significant and valuable form of holistic connectedness. Our research represents a further step in shifting the focus of library social media research and practice from marketing to community building, recasting library-led social media as a tool that enables users to join together and share in the commonalities of research, learning, and the university community. As library social media practice advances within the framework of community, it moves from a one-dimensional online broadcast platform to a multidimensional socially connected space that creates value for both the library and library users.

METHOD

In May 2012, Montana State University Library convened a social media group (SMG) to guide our social media activity. The formation of SMG marked an important shift in our social media activity and was crucial in building a strategic and programmatic focus around social media. This internal committee, comprising three librarians and one library staff member, aimed to build a community of student participants around the Twitter platform. SMG then created a social media guide to provide structure for our social media program. This guide outlines eight principal components of social media activity (see table 1).
Social Media Guide Component | Twitter Focus
Audience focus | Undergraduate and graduate students
Goals | Connect with students and build community
Values | Availability, care, scholarship
Activity focus | Information sharing; social interaction
Tone and tenor | Welcoming, warm, energetic
Posting frequency | Daily, with regular monitoring of subsequent interactions
Posting categories | Student life, local community
Posting personnel | 1 librarian, approximately 0.10 FTE

Table 1. Social Media Activity Components

Prior to the formation of SMG, our Twitter activity featured automated posts that lacked a sense of presence and personality. After the formation of SMG, our Twitter activity featured hand-crafted posts that possessed both presence and personality. To measure the effectiveness of our social media program, we divided our Twitter activity into two categories based on the May 2012 date of SMG’s formation: Phase 1 (Pre-SMG) and Phase 2 (Post-SMG). Phase 1 user data included followers 1–514, those users who followed the library between November 2008, when the library joined Twitter, and April 2012, the last month before the library formed SMG. Phase 2 included followers 515–937, those users who followed the library between May 2012, when the library formed SMG, and August 2013, the end date of our research period. Using dates corresponding to our user analysis, Phase 1 tweet data included the library’s tweets 1–329, which were posted between November 2008 and April 2012, and Phase 2 included the library’s tweets 330–998, which were posted between May 2012 and August 2013 (table 2). For the purposes of this research, Phase 1 and Phase 2 users and tweets were evaluated as distinct categories so that all corresponding tweets, followers, and interactions could be compared in relation to the formation date of SMG. Within Twitter, “followers” are members of the user community, “tweets” are messages to the community, and “interactions” are the user behaviors of favoriting, retweeting, or replying. Favoriting a tweet most commonly indicates that a user likes the tweet and can signal approval. A user may also share another user’s tweet with their own followers by “retweeting.”

Phase | Followers | Tweets | Duration
Phase 1 | 1–514 | 1–329 | Nov. 2008–April 2012
Phase 2 | 515–937 | 330–998 | May 2012–August 2013

Table 2. Comparison of Phase 1 and 2 Twitter Activity

We employed three approaches for evaluating our Twitter activity: user type analysis, action-object mapping, and interaction analysis. User type analysis aims to understand our community from a broad perspective by creating categories of users following the library’s Twitter account. After reviewing the account of each member of our user community, we assigned each to one of the following nine groups: alumni, business, community member, faculty, library, librarian, other, spam, and student.
Categorization was based on a manual review of information in each user’s biographical profile, tweet content, and account name, and a comparison against campus directories.

Action-object mapping is a quantitative method that describes the relationship between the performance of an activity—the action—and an external phenomenon—the object. Action-object mapping aims to describe the interaction process between a system and its users.54,55,56,57 Within the context of our study, the system is Twitter, the object is an individual tweet, and the action is the user behavior in response to the object, i.e., a user marking a tweet as a favorite, retweeting a tweet, or replying to a tweet. We collected our library’s tweets into sixteen object categories: blog post, book, database, event, external web resource, librarian, library space, local community, other libraries/universities, photo from archive, topics—libraries, service, students, think tank, hortative, and workshop.

Interaction analysis serves as an extension of action-object mapping and aims to provide further details about the level of interaction between a system and its users. For this study we created an associated metric, “interaction rate,” that measures the rate at which the tweets in each object category received an action. Within the context of our study, we have treated the “action” of action-object mapping and the “interaction” of Twitter as equivalents. To identify the interaction rate, we used the following formula: “number of tweets within an object category that received an action” divided by “total number of tweets within that object category.” Interaction rate was calculated for each object category and for all tweets in Phase 1 and in Phase 2.

RESULTS

The changes in approach to the library’s Twitter presence through SMG and the Social Media Guide are evident in this study’s results (figure 1). An analysis of user types in Phase 1 reveals that a large portion, 48 percent, were business followers. In comparison, the business percentage decreased to 30 percent in Phase 2. The student percentage increased from 6 percent in Phase 1 to 28 percent in Phase 2, representing a 366 percent increase in student users. As noted earlier, the Social Media Guide’s “audience focus” for Twitter is “undergraduate and graduate students,” and its goals include “connect with students and build community” (table 1). The increase in the percentage of students in the follower population and the decrease in the business percentage of the population suggest progress towards this goal.

Figure 1. Comparison of Twitter Users by Type

The object categorization for Phase 1 shows a heavily skewed distribution of tweets in certain areas, while Phase 2 has a more even and targeted distribution reflecting implementation of the Social Media Guide components (figure 2). In Phase 1, workshops was the most tweeted category, accounting for 36 percent of all posts.
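For concreteness, the per-category interaction-rate calculation described in the Method section can be sketched as follows. This is a minimal illustration under the assumption that the interaction rate is the share of tweets in a category that received at least one favorite, retweet, or reply (the reading consistent with the percentages reported in this section); the data, names, and tooling are hypothetical and are not the authors’ actual workflow.

```typescript
// Hypothetical sketch: computing a per-category interaction rate.
interface Tweet {
  category: string;     // one of the sixteen object categories, e.g. "workshop"
  interactions: number; // favorites + retweets + replies received
}

// Interaction rate = percentage of tweets (optionally within one category)
// that received at least one interaction.
function interactionRate(tweets: Tweet[], category?: string): number {
  const pool = category ? tweets.filter(t => t.category === category) : tweets;
  if (pool.length === 0) return 0;
  const withAction = pool.filter(t => t.interactions > 0).length;
  return (withAction / pool.length) * 100;
}

// Hypothetical Phase 1-style sample: frequent workshop tweets drawing no actions.
const sample: Tweet[] = [
  { category: "workshop", interactions: 0 },
  { category: "workshop", interactions: 0 },
  { category: "workshop", interactions: 0 },
  { category: "library space", interactions: 0 },
  { category: "student life", interactions: 2 },
];

console.log(interactionRate(sample, "workshop")); // 0
console.log(interactionRate(sample));             // 20 (1 of 5 tweets drew an action)
```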
Library space represents 18 percent of tweets, while library events is third with 17 percent. The remaining 13 categories range from 5 percent to a fraction of a percent of tweets. Phase 2 shows a more balanced and intentional distribution of tweets across all object categories, with a strong focus on the Social Media Guide “posting category” of “student life,” which accounted for 25 percent of tweets. Library space makes up 11 percent of tweets, and external web resource makes up 9 percent of tweets. The remaining categories range from 8 percent to 1 percent of tweets.

Figure 2. Comparison of Tweets by Content Category

Interaction rates were low in most object categories in Phase 1 (see figure 3). Given that the Social Media Guide has an “activity focus” of “social interaction,” a tweet category with a high percentage of posting and a low interaction rate suggests a disconnect between tweet posting and meeting stated goals. For example, workshops represented a large percentage (36 percent) of the tweets but yielded a 0 percent interaction rate. Library space was 18 percent of tweets but had only a 2 percent interaction rate. Eleven of the 16 categories in Phase 1 had no associated actions and thus a 0 percent interaction rate. The interaction rate for Phase 1 was 12.5 percent. In essence, our action-object data and interaction rate data show us that during Phase 1 we created content most frequently about topics of low interest to our community while we tweeted less frequently about topics of high interest to our community.

Figure 3. Interaction Rates, Phase 1

In contrast to Phase 1, Phase 2 demonstrates an increase in interaction rate across nearly every object category (figure 4, figure 5), especially student life and local community.

Figure 4. Interaction Rates, Phase 2

Figure 5. Interaction Rate Comparison

The local community category of tweets had the highest interaction rate at 68 percent. The student life category had the second highest interaction rate at 62 percent. Only 2 of the 16 categories in Phase 2 had no associated actions and thus a 0 percent interaction rate. The interaction rate for Phase 2 was 46.8 percent, which represented an increase of 275 percent from Phase 1. In essence, our action-object data and interaction rate data show us that during Phase 2 we created content most frequently about topics of higher interest to our community while we tweeted less frequently about topics of low interest to our community.

DISCUSSION

This research suggests a strong community-building capability of social media at our academic library. The shift in user types from Phase 1 to Phase 2, notably the increase in student Twitter followers, indicates that the shape of our Twitter community was directly affected by our social media program.
Likewise, the marked increase in interaction rate between Phase 1 and Phase 2 suggests the effectiveness of our programmatic, community-focused approach.

The Montana State University Library social media program was fundamentally formed around an approach described by Glazer: “Be interesting, be interested.”58 Our Twitter user community has thrived since we adopted this axiom. We have interpreted “interesting” as sharing original, personality-rich content with our community and “interested” as regularly interacting with and responding to members of our community. The twofold theme of personality-rich content and interactivity-based behavior has allowed us to shape our Phase 2 user community.

Prior to the formation of SMG, social media at the MSU Library was a rather drab affair. The library Twitter account during that time was characterized by automated content, low responsiveness, no dedicated personnel, and no strategic vision. Our resulting Twitter community was composed mostly of businesses, at 47 percent of followers, with students representing just 6 percent of our followers. The resulting interaction rate of 12.5 percent reflects the broadcast-driven approach, personality-devoid content, and disengaged community that together characterized Phase 1. Following the formation of SMG, the library Twitter account benefitted from original and unique content, high responsiveness, dedicated personnel, and a strategic, goal-driven vision. Our Phase 2 Twitter community underwent a transformation, with business representation decreasing to 30 percent and student representation increasing to 28 percent. The resulting interaction rate of 46.8 percent reflects our refocused community-driven program, personality-rich content, and engaged community of Phase 2.

Figure 6. Typical Phase 1 Tweet

Figure 6 illustrates a typical Phase 1 tweet. The object category for this tweet is database, and it yielded no actions. The announcement of a new database trial was auto-generated from our library blog, a common method for sharing content during Phase 1. This tweet is problematic for community building for two primary reasons: its style and content lack the sense and personality of a human author, and it does not offer compelling opportunities for interaction.

Figure 7. Typical Phase 2 Tweet

Figure 7 illustrates a typical Phase 2 tweet. The object category for this tweet is student life, and it yielded 6 actions (2 retweets and 4 favorites). The content relates to a meaningful and current event for our target student user community, and it is fashioned in such a way as to invite interaction by providing a strong sense of relevance and personality. Figure 8 further demonstrates the community effect of Phase 2. In this example we have reminded our Twitter community of the services available through the library, and one student user has replied.
During our Phase 2 Twitter activity, we prioritized responsiveness, availability, and scholarship with the goal of connecting with students and building a sense of community. In many ways the series of tweets shown in figure 8 encapsulates our social media program. We were able to deliver resources to this student, who then associated these interactions with a sense of pride in the university. This example illustrates the overall connectedness afforded by social media. In contacting the library Twitter account, this user asked a real-world research question. Neither his inquiry nor our response was located strictly within an online world. While we pointed this user to an online resource, his remarks indicated “offline” feelings of satisfaction with the interaction. Lee and Oh found that social media interactivity and information sharing can create a shared vision that leads to a sense of community belonging.59,60 By creating personality-rich content that invites two-way interaction, our strategic social media program has helped form a holistic community of users around our Twitter activity.

Figure 8. Phase 2 Example, Community Effect

Currently our work addresses the formation of community through social media. A next step will introduce a wider scope by addressing the value of community formed through social media. There is a rich area of study around the relationship between social media activity, perceived sense of community and connectedness, and student success.61,62,63,64,65 Further research along this line will allow us to explore whether a library-led social media community can serve as an aid to undergraduate academic performance and graduation rates. Continued and extended analysis will allow us to increase the granularity of results, for example by mapping user types to action-object pairs and identifying the interaction rate for particular user groups such as students and faculty.

CONCLUSION

In articulating and realizing an intentional and strategic social media program, we have generated results that demonstrate the community-building capability of social media. Over the course of one year, we transformed our social media activity from personality-devoid one-way broadcasting to personality-rich two-way interacting. The research that followed this fundamental shift provided new information about our users that enabled us to tailor our Twitter activity and shape our community around a target population of undergraduate students. In so doing, we have formed a community that has shown new interest in social media content published by the library. Following the application of our social media program, our student user community grew by 366 percent and the rate of interaction with our community grew by 275 percent.
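As a quick arithmetic check of the interaction-rate growth just cited, assuming growth is measured as the relative change between the two overall interaction rates reported above (12.5 percent in Phase 1 and 46.8 percent in Phase 2):

\[
\frac{46.8\% - 12.5\%}{12.5\%} \approx 2.74,
\]

which is consistent with the reported increase of approximately 275 percent.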
Our research demonstrates the value of social media as a community-building tool, and our model can guide social media in libraries toward this purpose.

REFERENCES

1. Harry Glazer, “Clever Outreach or Costly Diversion? An Academic Library Evaluates Its Facebook Experience,” College & Research Libraries News 70, no. 1 (2009): 11, http://crln.acrl.org/content/70/1/11.full.pdf+html.
2. Harry Glazer, “‘Likes’ are Lovely, but Do They Lead to More Logins? Developing Metrics for Academic Libraries’ Facebook Pages,” College & Research Libraries News 73, no. 1 (2012): 20, http://crln.acrl.org/content/73/1/18.full.pdf+html.
3. Lee A. Vucovich et al., “Is the Time and Effort Worth It? One Library’s Evaluation of Using Social Networking Tools for Outreach,” Medical Reference Services Quarterly 32, no. 1 (2013): 13, http://dx.doi.org/10.1080/02763869.2013.749107.
4. Xiang Li and Tang Li, “Integrating Social Media into East Asia Library Services: Case Studies at University of Colorado and Yale University,” Journal of East Asian Libraries 157, no. 1 (2013): 24, https://ojs.lib.byu.edu/spc/index.php/JEAL/article/view/32663/30799.
5. Colleen Cuddy, Jamie Graham, and Emily G. Morton-Owens, “Implementing Twitter in a Health Sciences Library,” Medical Reference Services Quarterly 29, no. 4 (2010), http://dx.doi.org/10.1080/02763869.2010.518915.
6. Steven Bell, “Students Tweet the Darndest Things about Your Library—And Why You Need to Listen,” Reference Services Review 40, no. 2 (2012), http://dx.doi.org/10.1108/00907321211228264.
7. Robin R. Sewell, “Who is Following Us? Data Mining a Library’s Twitter Followers,” Library Hi Tech 31, no. 1 (2013), http://dx.doi.org/10.1108/07378831311303994.
8. Li and Li, “Integrating Social Media into East Asia Library Services,” 25.
9. Remi Castonguay, “Say It Loud: Spreading the Word with Facebook and Twitter,” College & Research Libraries News 72, no. 7 (2011), http://crln.acrl.org/content/72/7/412.full.pdf+html.
10. Dianna E. Sachs, Edward J. Eckel, and Kathleen A. Langan, “Striking a Balance: Effective Use of Facebook in an Academic Library,” Internet Reference Services Quarterly 16, nos. 1–2 (2011), http://dx.doi.org/10.1080/10875301.2011.572457.
11. Christopher Chan, “Marketing the Academic Library with Online Social Network Advertising,” Library Management 33, no. 8 (2012), http://dx.doi.org/10.1108/01435121211279849.
12. Melissa Dennis, “Outreach Initiatives in Academic Libraries, 2009–2011,” Reference Services Review 40, no. 3 (2012), http://dx.doi.org/10.1108/00907321211254643.
13. Melanie Griffin and Tomaro I. Taylor, “Of Fans, Friends, and Followers: Methods for Assessing Social Media Outreach in Special Collections Repositories,” Journal of Web Librarianship 7, no. 3 (2013), http://dx.doi.org/10.1080/19322909.2013.812471.
14. Lili Luo, “Marketing via Social Media: A Case Study,” Library Hi Tech 31, no. 3 (2013), http://dx.doi.org/10.1108/LHT-12-2012-0141.
15. Li and Li, “Integrating Social Media into East Asia Library Services,” 25.
16.
Sachs, Eckel, and Langan, “Striking a Balance,” 48.
17. Jaana Roos, “Why University Libraries Don’t Trust Facebook Marketing?,” Proceedings of the 21st International BOBCATSSS Conference (2013): 164, http://bobcatsss2013.bobcatsss.net/proceedings.pdf.
18. Noa Aharony, “Twitter Use in Libraries: An Exploratory Analysis,” Journal of Web Librarianship 4, no. 4 (2010), http://dx.doi.org/10.1080/19322909.2010.487766.
19. A. R. Riza Ayu and A. Abrizah, “Do You Facebook? Usage and Applications of Facebook Page among Academic Libraries in Malaysia,” International Information & Library Review 43, no. 4 (2011), http://dx.doi.org/10.1016/j.iilr.2011.10.005.
20. Alton Y. K. Chua and Dion H. Goh, “A Study of Web 2.0 Applications in Library Websites,” Library & Information Science Research 32, no. 3 (2010), http://dx.doi.org/10.1016/j.lisr.2010.01.002.
21. Andrea Dickson and Robert P. Holley, “Social Networking in Academic Libraries: The Possibilities and the Concerns,” New Library World 111, nos. 11/12 (2010), http://dx.doi.org/10.1108/03074801011094840.
22. Valerie Forrestal, “Making Twitter Work: A Guide for the Uninitiated, the Skeptical, and the Pragmatic,” Reference Librarian 52, nos. 1–2 (2010), http://dx.doi.org/10.1080/02763877.2011.527607.
23. Gang Wan, “How Academic Libraries Reach Users on Facebook,” College & Undergraduate Libraries 18, no. 4 (2011), http://dx.doi.org/10.1080/10691316.2011.624944.
24. Dora Yu-Ting Chen, Samuel Kai-Wah Chu, and Shu-Qin Xu, “How Do Libraries Use Social Networking Sites to Interact with Users,” Proceedings of the American Society for Information Science and Technology 49, no. 1 (2012), http://dx.doi.org/10.1002/meet.14504901085.
25. Rolando Garcia-Milian, Hannah F. Norton, and Michele R. Tennant, “The Presence of Academic Health Sciences Libraries on Facebook: The Relationship between Content and Library Popularity,” Medical Reference Services Quarterly 31, no. 2 (2012), http://dx.doi.org/10.1080/02763869.2012.670588.
26. Elaine Thornton, “Is Your Academic Library Pinning? Academic Libraries and Pinterest,” Journal of Web Librarianship 6, no. 3 (2012), http://dx.doi.org/10.1080/19322909.2012.702006.
27. Katie Elson Anderson and Julie M. Still, “Librarians’ Use of Images on LibGuides and Other Social Media Platforms,” Journal of Web Librarianship 7, no. 3 (2013), http://dx.doi.org/10.1080/19322909.2013.812473.
28. Grace Saw, “Social Media for International Students—It’s Not All about Facebook,” Library Management 34, no. 3 (2013): 172, http://dx.doi.org/10.1108/01435121311310860.
29. Ligaya Ganster and Bridget Schumacher, “Expanding Beyond Our Library Walls: Building an Active Online Community through Facebook,” Journal of Web Librarianship 3, no. 2 (2009), http://dx.doi.org/10.1080/19322900902820929.
30. Sebastián Valenzuela, Namsu Park, and Kerk F. Kee, “Is There Social Capital in a Social Network Site? Facebook Use and College Students’ Life Satisfaction, Trust, and Participation,” Journal of Computer-Mediated Communication 14, no.
4 (2009), http://dx.doi.org/10.1111/j.1083-6101.2009.01474.x.
31. Nancy Kim Phillips, “Academic Library Use of Facebook: Building Relationships with Students,” Journal of Academic Librarianship 37, no. 6 (2011), http://dx.doi.org/10.1016/j.acalib.2011.07.008.
32. Tina McCorkindale, Marcia W. Distaso, and Hilary Fussell Sisco, “How Millennials Are Engaging and Building Relationships with Organizations on Facebook,” Journal of Social Media in Society 2, no. 1 (2013), http://thejsms.org/index.php/TSMRI/article/view/15/18.
33. Valenzuela, Park, and Kee, “Is There Social Capital in a Social Network Site?,” 882.
34. Maria R. Lee, David C. Yen, and C. Y. Hsiao, “Understanding the Perceived Community Value of Facebook Users,” Computers in Human Behavior 35 (February 2014): 355, http://dx.doi.org/10.1016/j.chb.2014.03.018.
35. Hyun Jung Oh, Elif Ozkaya, and Robert LaRose, “How Does Online Social Networking Enhance Life Satisfaction? The Relationships among Online Supportive Interaction, Affect, Perceived Social Support, Sense of Community, and Life Satisfaction,” Computers in Human Behavior 30 (2014), http://dx.doi.org/10.1016/j.chb.2013.07.053.
36. Rowena Cullen and Laura Sommer, “Participatory Democracy and the Value of Online Community Networks: An Exploration of Online and Offline Communities Engaged in Civil Society and Political Activity,” Government Information Quarterly 28, no. 2 (2011), http://dx.doi.org/10.1016/j.giq.2010.04.008.
37. Mohamed Nanabhay and Roxane Farmanfarmaian, “From Spectacle to Spectacular: How Physical Space, Social Media and Mainstream Broadcast Amplified the Public Sphere in Egypt’s ‘Revolution,’” Journal of North African Studies 16, no. 4 (2011), http://dx.doi.org/10.1080/13629387.2011.639562.
38. Nermeen Sayed, “Towards the Egyptian Revolution: Activists Perceptions of Social Media for Mobilization,” Journal of Arab & Muslim Media Research 4, nos. 2–3 (2012): 273–98, http://dx.doi.org/10.1386/jammr.4.2-3.273_1.
39. Morton A. Lieberman and Andrew Winzelberg, “The Relationship between Religious Expression and Outcomes in Online Support Groups: A Partial Replication,” Computers in Human Behavior 25, no. 3 (2009), http://dx.doi.org/10.1016/j.chb.2008.11.003.
40. Christopher E. Beaudoin and Chen-Chao Tao, “Benefiting from Social Capital in Online Support Groups: An Empirical Study of Cancer Patients,” Cyberpsychology & Behavior: The Impact of the Internet, Multimedia and Virtual Reality on Behavior and Society 10, no. 4 (2007), http://dx.doi.org/10.1089/cpb.2007.9986.
41. Manuela Tomai et al., “Virtual Communities in Schools as Tools to Promote Social Capital with High Schools Students,” Computers & Education 54, no. 1 (2010), http://dx.doi.org/10.1016/j.compedu.2009.08.009.
42. Edward Shih-Tse Wang and Lily Shui-Lien Chen, “Forming Relationship Commitments to Online Communities: The Role of Social Motivations,” Computers in Human Behavior 28, no. 2 (2012), http://dx.doi.org/10.1016/j.chb.2011.11.002.
43.
Pippa Norris and David Jones, “Virtual Democracy,” Harvard International Journal of Press/Politics 3, no. 2 (1998), http://dx.doi.org/10.1177/1081180X98003002001.
44. Xu Cheng, Jiangchuan Liu, and Cameron Dale, “Understanding the Characteristics of Internet Short Video Sharing: A YouTube-Based Measurement Study,” IEEE Transactions on Multimedia 15, no. 5 (2013), http://dx.doi.org/10.1109/TMM.2013.2265531.
45. Nancy Foasberg, “Online Reading Communities: From Book Clubs to Book Blogs,” Journal of Social Media in Society 1, no. 1 (2012), http://thejsms.org/index.php/TSMRI/article/view/3/4.
46. Matthias Hofer and Viviane Aubert, “Perceived Bridging and Bonding Social Capital of Twitter: Differentiating between Followers and Followees,” Computers in Human Behavior 29, no. 6 (2013): 2137, http://dx.doi.org/10.1016/j.chb.2013.04.038.
47. Rachel Grieve et al., “Face-to-Face or Facebook: Can Social Connectedness be Derived Online?,” Computers in Human Behavior 29, no. 3 (2013): 607, http://dx.doi.org/10.1016/j.chb.2012.11.017.
48. Ibid., 608.
49. Nathan Jurgenson, “When Atoms Meet Bits: Social Media, the Mobile Web and Augmented Revolution,” Future Internet 4, no. 1 (2012), http://dx.doi.org/10.3390/fi4010083.
50. R. Stuart Geiger, “Bots, Bespoke Code and the Materiality of Software Platforms,” Information, Communication & Society 17, no. 3 (2014), http://dx.doi.org/10.1080/1369118X.2013.873069.
51. Zeynep Tufekci, “The Social Internet: Frustrating, Enriching, but Not Lonely,” Public Culture 26, no. 1, iss. 72 (2013): 14, http://dx.doi.org/10.1215/08992363-2346322.
52. Nicole B. Ellison, Charles Steinfield, and Cliff Lampe, “The Benefits of Facebook ‘Friends’: Social Capital and College Students’ Use of Online Social Network Sites,” Journal of Computer-Mediated Communication 12, no. 4 (2007): 1144, http://dx.doi.org/10.1111/j.1083-6101.2007.00367.x.
53. Ibid., 1165.
54. Roger Brown, A First Language: The Early Stages (Cambridge: Harvard University Press, 1973).
55. Mimi Zhang and Bernard J. Jansen, “Using Action-Object Pairs as a Conceptual Framework for Transaction Log Analysis,” in Handbook of Research on Web Log Analysis, edited by Bernard J. Jansen, Amanda Spink, and Isak Taksa (Hershey, PA: IGI, 2008).
56. Bernard J. Jansen and Mimi Zhang, “Twitter Power: Tweets as Electronic Word of Mouth,” Journal of the American Society for Information Science & Technology 60, no. 11 (2009), http://dx.doi.org/10.1002/asi.v60:11.
57. Sewell, “Who is Following Us?”
58. Glazer, “‘Likes’ are Lovely,” 20.
59. Lee, Yen, and Hsiao, “Understanding the Perceived Community Value of Facebook Users.”
60. Oh, Ozkaya, and LaRose, “How Does Online Social Networking Enhance Life Satisfaction?”
61. Reynol Junco, Greg Heiberger, and Eric Loken, “The Effect of Twitter on College Student Engagement and Grades,” Journal of Computer Assisted Learning 27, no. 2 (2011), http://dx.doi.org/10.1111/j.1365-2729.2010.00387.x.
62. Susannah K. Brown and Charles A.
Burdsal, “An Exploration of Sense of Community and Student Success Using the National Survey of Student Engagement,” Journal of General Education 61, no. 4 (2012), http://dx.doi.org/10.1353/jge.2012.0039.
63. Jill L. Creighton et al., “I Just Look It Up: Undergraduate Student Perception of Social Media Use in Their Academic Success,” Journal of Social Media in Society 2, no. 2 (2013), http://thejsms.org/index.php/TSMRI/article/view/48/25.
64. David C. DeAndrea et al., “Serious Social Media: On the Use of Social Media for Improving Students’ Adjustment to College,” The Internet and Higher Education 15, no. 1 (2012), http://dx.doi.org/10.1016/j.iheduc.2011.05.009.
65. Rebecca Gray et al., “Examining Social Adjustment to College in the Age of Social Media: Factors Influencing Successful Transitions and Persistence,” Computers & Education 67 (2013), http://dx.doi.org/10.1016/j.compedu.2013.02.021.

What’s In A Word? Rethinking Facet Headings in a Discovery Service

David Nelson and Linda Turney

ABSTRACT

The emergence of discovery systems has been well received by libraries, which have long been concerned with offering a smorgasbord of databases that require either individual searching of each database or the problematic use of federated searching. The ability to search across a wide array of subscribed and open-access information resources via a centralized index has opened up access for users to a library’s wealth of information resources. This capability has been particularly praised for its “Google-like” search interface, which conforms to user expectations for information searching. Yet all discovery services also include facets as a search capability and thus provide faceted navigation, a search feature for which Google is not particularly well suited. Discovery services thus provide a hybrid search interface. An examination of e-commerce sites clearly shows that faceted navigation is an integral part of their discovery systems. Many library OPACs are also now being developed with faceted navigation capabilities. However, the discovery services’ faceted structures suffer from a number of problems that inhibit their usefulness and their potential. This article examines several of these issues and offers suggestions for improving the discovery search interface. It also argues that vendors and libraries need to work together to more closely analyze the user experience of the discovery system.

INTRODUCTION

The emergence of Google as the premier search engine1 has had a profound effect on searcher expectations regarding information.2 By virtue of its simplicity and the remarkably powerful search algorithms that enable its highly relevant results, the simple search box of Google has clearly triumphed as the preferred way to find information.

But is the Google search model really the panacea that libraries need to resolve their search interface requirements?
The nature of search engine and search interface design is a very complex issue. Unfortunately for academic libraries, Google has dominated discussions and thinking about search engine interfaces: “Just Google it!” Is a simple Google search box really the preferred vehicle with which libraries should be delivering their content, both licensed and unlicensed?

David Nelson (David.Nelson@mtsu.edu) is Chair, Collection Development and Management, and Linda Turney (Linda.Turney@mtsu.edu) is Cataloging Librarian, James E. Walker Library, Middle Tennessee State University, Murfreesboro, Tennessee.

The assumption librarians make to justify the use of a Google model is that library users are essentially Google users,3 or that they have the same information searching needs.4 This is a flawed assumption. As an academic library, we are tasked with making discoverable not simply digital-only information, but information objects with discrete characteristics that often constitute the object of a search, e.g., an audio book, a film, or even a book on a shelf. Google has put the emphasis on the keyword, with remarkably gratifying results for the average lay user. However, a recent Project Information Literacy study concluded that “Google-centric search skills that freshmen bring from high school only get them so far—but not far enough—with finding and using the trusted sources they need for fulfilling college research assignments.”5 Until now, the library web development focus on providing a “Google-like” search has, unfortunately, diverted attention from an appreciation of the developments in other areas of the Internet world, such as e-commerce, where searching for information is an integral component of the buyer–seller relationship.

Commercial entities have a vested interest in developing their websites to enable each user to have a successful search outcome. While the search interfaces routinely encountered at various e-commerce sites may seem obvious, it is important to remember that one is looking at a series of deliberate decisions made with regard to the interface organization and structure. For companies, the search interface represents millions of dollars in investment, and the design is part of their search engine optimization strategy.6 In this way, companies and other organizations create robust search interfaces that enable visitors to effectively and efficiently find what they want in the company “knowledgebase.”

It is clear that the product search industry has arrived at some very significant conclusions about user search behavior, and that they strive to optimize their interfaces to accommodate those conclusions. Three features stand out: (1) the importance of facets as a key component in the search design; (2) the personalization of the text that instructs the user; and (3) the intelligibility of facet labels.
In a blog article on facets in e-commerce websites, Scharnell advises that, when determining what the facets are, there are two rules to follow: (1) keep it simple; and (2) create an intuitive structure.7

The primary goal of a commercial website is to bring about what is called conversion—that is, getting someone to the site (driving traffic) and, ultimately, making a sale (the conversion). Companies have discovered that facets are the key to enabling their potential customers to locate discrete pieces of information (e.g., a product) almost intuitively. Broughton observes that “there is an evident faceted approach to product information in many commercial websites.”8 An important characteristic of the faceted structure is that it enables the user to browse a collection. Thus the goals of a commercial site successfully employing faceted navigation are not that different from the objectives that a library discovery layer seeks to accomplish. While the literature on information literacy is now vast, very few articles deal with the role that facets play in the discovery process for student searchers. Fagan is an author who has addressed this issue of facets.9 Ramdeen and Hemminger discuss the role of facets in the library catalog.10 To date, reviews of discovery systems or catalog interfaces tend to place emphasis on helping patrons to search our demonstrably flawed systems rather than considering the interfaces as the actual source of problems for users.11

It can be argued that comparing an academic site and a commercial site compares apples and oranges, there being little connection between complex, open-ended subject or research questions and searching a company’s inventory of goods. However, there are elements of commonality at the higher level of the information need that drives an individual to perform any kind of information search. In both the subject/topic search and the product search there is a need to evaluate results as they appear and to make various decisions while going through a search process to limit and narrow a search. That is, for the information that libraries seek to make discoverable, its extratextual characteristics are often every bit as important as the content itself.

This leads to a discussion of facets, the various attributes by which we can further describe the “manifestation” and the “expression” (using the FRBR sense here) of an intellectual creation. We need to pay more attention to the importance of facets as a critical component of the search process. That is, we must begin to move away from the mantra that our single search box will provide a successful result without additional considerations, and from the idea that the facets are of secondary, even tertiary, importance.
Badke observes “that users of Google actually need a deeper level of information literacy because Google offers so little opportunity to nuance or facet results.”12

Yet facets are a key part of our discovery interface design. A full and successful exploitation of their possibilities has been significantly hobbled by the use of jargon-heavy terminology that assumes users will immediately and instinctively grasp the concept of a faceted term. Even a superficial study of many successful commercial websites quickly leads the thoughtful observer to the conclusion that their web developers and designers have been making excellent use of focus groups and surveys to make the search process as easy as possible. While businesses have an obvious monetary incentive to make sure their users do not leave a site because the site itself presented a problem, libraries have the same interest in making sure our users are equally able to easily search our site. A library’s site should not, by its assumptions about the user, present obstacles to their search success.13

With the growing use of discovery systems,14 academic libraries are entering into a new phase of search engine deployment.15 By making use of a preindexed database rather than the more restrictive federated search process, the discovery service interface allows a user to search for content in a wide variety of publication and media types (e.g., journals, books, dictionaries, audio books, videos, manuscripts, newspapers, images, etc.). To assist searchers, discovery systems provide faceted navigation along with the search box interface. Several studies have shown that the use of facets in the library environment has proven effective in assisting searchers.16 However, it is equally clear that library vendors have not thought deeply about the facet category labels, and libraries, which can do a certain amount of customization, tend toward unquestioning acceptance of the vendor-supplied labels. This is a critical area involving both the user interface and the user experience; libraries and vendors need to spend far more time and effort on ensuring the intelligibility of the facet labels and on finding effective ways to encourage their use.

The presence of facets is a standard feature for all library discovery systems.17 However, as we will show below, facet labels are not easily understandable for the average user, and our search systems tend toward treating our users as “anonymous service recipient(s).”

What are facets? A review of various discussions of facets in information retrieval literature reveals the elasticity of the term, along with related terms.18 Will observes that “what a facet is has been stretched . . .
and the term is used loosely to mean any system in which terms are selected from pre-defined groups at the time of searching.”19 It is probably easiest to understand the use of the term facet in information retrieval systems as categories derived from the universe of objects that one is seeking to discover, whether we are dealing with manufactured products at Home Depot or Greek manuscripts in a library collection.20 What adds to the problem of definition is the number of synonyms: “The term facet is commonly considered as analogous to category, attribute, class and concept.”21 How objects are grouped would most logically determine the facets that are necessary for the classification scheme. It is the objects that are under a facet that present a problem in understanding. NISO Z39.19 defines facets as “attributes of content objects encompassing various non-semantic aspects of a document,”22 thus including such things as author, language, format, etc. The terms that are indexed are not the facets but rather concepts that exist in a unique relationship to the facet. “Homer” is indexed under the facet “author,” but indexing the term author is meaningless.

Another source of confusion is the failure to distinguish between facets and filters, both of which are used to refine or narrow a search.23 When a search interface states that it is using “faceted navigation,” usually both facets and filters are present.

Because both a facet and a filter are part of retrieval, it is often difficult to separate the two. Once again, we encounter a terminological problem. For example, one can speak of how a facet itself is used to filter a search in the sense that it refines or narrows a search to a smaller segment of the universe of objects. Here the term filter refers to the process of narrowing a search. But we also have filters that deal with ranges. Thus the filter “date” can cover anything from a single month or year to a span over a specific period of time. The same can be seen for the filter “price,” used to specify a single amount, say $5, or a range from $100 to $299. The critical difference between a facet and a range filter is that the terms found in a facet are indexed, while a range filter (e.g., date or price) is not an indexed term. It is important to maintain a clear distinction between a facet and a range filter because the underlying metadata is different. A range filter sorts the content in a specific way and at the same time narrows the results.

Our examples, along with the closer analysis of the EBSCO EDS discovery system below, will amply demonstrate that facets and filters are extremely effective in information retrieval systems.
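To make the facet/range-filter distinction just described concrete, the sketch below shows how the two kinds of refinement might be expressed in a search request. The field names and request shape are hypothetical illustrations, not any particular vendor’s API.

```typescript
// A facet refines by an indexed term; a range filter restricts by a span of
// values (such as dates or prices) that is not itself an indexed term.
interface FacetSelection {
  field: string; // e.g. "author", "format"
  term: string;  // an indexed value under that facet, e.g. "Homer"
}

interface RangeFilter {
  field: string; // e.g. "publicationDate", "price"
  from: string;
  to: string;
}

interface SearchRequest {
  query: string;
  facets: FacetSelection[];
  rangeFilters: RangeFilter[];
}

// Hypothetical request: a keyword search narrowed by two facets and one range filter.
const request: SearchRequest = {
  query: "odyssey translation",
  facets: [
    { field: "author", term: "Homer" },
    { field: "format", term: "Audio book" },
  ],
  rangeFilters: [{ field: "publicationDate", from: "2000-01", to: "2015-12" }],
};

console.log(JSON.stringify(request, null, 2));
```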
The challenge that libraries face is the need to make sure that users are aware of the presence of facets and filters on a search interface rather than relying exclusively on keywords and the algorithmically based result.24 The value of the faceted/filtered search is the ability to lead the searcher quickly and efficiently to the desired result, a result that will too often elude the user even with a powerful Google search, unless that user gets most of the terms exactly right.

We chose various e-commerce websites because they have extremely large numbers of site visits or because they were smaller specialty sites that reflected a more highly optimized use of facets. A wide range of product types was in the selection. The frequency of visits indicates that large numbers of users are exposed to a search page structure and terminology, which in turn establishes a standard for a set of user expectations. Best Buy, Target, and Home Depot are among the top one hundred most-accessed websites, a fact richly indicative of the type of influence they will have in setting user search expectations. An examination of these websites reveals an underlying set of best practices for making use of faceted navigation with text searching.

Linguistic Personalization

With the advent of Web 2.0 there are several forms of interaction an individual can have with a website. These can be considered forms of personalization of websites.25 Usually, personalization is “largely about filtering content to satisfy an individual’s particular” information needs.26 We see personalization at its most complex in the algorithmically adjusted results of a search based on previous searches. There we find the feature of suggestions that are offered to an individual on the basis of search results, a feature offered by Amazon and Netflix. While we will not be able to personalize our discovery services in a manner similar to Netflix or Amazon, we can improve the quality of the interaction in other areas of “personalization.” We should be seeking ways we can more directly speak to individual searchers, for example, by selecting words and phrases that speak directly to a person’s needs.

Our examination of many e-commerce sites reveals a robust use of linguistically personalized features as an intrinsic part of their website design and enhancement. That is, e-commerce sites use the interface itself to communicate directly with their customers through linguistic features that library sites can easily adopt. Combined with faceted searching, adding certain linguistic features should prove effective in encouraging the use of the facets, and in the process improve both the search results and the user experience. This constitutes the fundamental challenge for the academic library—to help shape the mental model with regard to the universe of content that we provide through our search interface.
Finally, there is what we can consider a form of linguistic personalization in which language is used to “speak” more directly to a searcher. It is this third feature of linguistic personalization that libraries can more easily control and customize within their discovery services, as well as at other places on the library website.

There is, of course, the personalization that is intended primarily for those who register and then set up their own accounts. However, there is also personalization in terms of text communication, in which the website uses both pronouns and verb forms that directly address the searcher. This is seen in the use of the second-person pronoun, either the subject or the possessive, “you” or “your,” and, for verbal forms, the use of the second-person imperative (usually the same as the infinitive in English). This type of personalization is a web design decision. The search box now frequently contains text, ranging from simple noun lists to sentences, all of which is intended to encourage the user to make use of the search capabilities. After a search has occurred, the results are also indicated with text that speaks directly to a person through the use of pronouns and verbs. We find the following interesting examples in table 1:

Pronoun | Site | Notes
What are you looking for today? | Kroger | Search box
What can we help you find? | Home Depot | Search box
What are you looking for? | Lowe’s | Search box
Your selections | Target | Post-search
We found x results for [search term] | Target | Post-search
Narrow your results | Tigerdirect | Post-search

Table 1. Linguistic Personalization Examples

In examining the features that are found at these e-commerce sites, it is interesting to note the use of either of two words for the facet instructions: Refine or Narrow, two words our users will routinely encounter in nonlibrary searching.

The various sites all have the following elements:

1. search box
2. search results outcome clearly shown
3. facet instruction [“refine,” “narrow,” “show”]
4. facets

Major Problems with Library Discovery Interfaces

We can identify three important areas that need to be considered with the discovery interface design:

1. the search box itself
2. the facet labels and their intelligibility
3. getting the user to the facets area

The Library Search Box

The search box makes an excellent point of departure for implementing improvements to the library’s discovery interfaces. Note that companies do not assume prior search knowledge on the part of their potential market; they explicitly tell people what they can do in the search box. As we see in table 1, many companies (e.g., Home Depot and Lowe’s) are choosing to use entire sentences, not merely clipped phrases or strings of nouns.
Many libraries are beginning to populate the search box with text. However, that text is often simply a noun list of format types, e.g., articles, books, media, etc. It is important to point out that there is an implicit expectation of an action present in a search box. Too often, when our library websites supply a list of nouns, we assume that we are answering the question in the mind of the searcher—they are looking for a subject or topic—and yet we supply a string of nouns that enumerate formats. So right from the beginning we find a mismatch between the user's purpose in coming to a library's search box and our arbitrary enumeration not of topics but of types of information sources.

Once we recognize this problem, we have some very good options for personalizing the search box in a way that is more analogous to what Home Depot and Lowe's offer:

1. What are you looking for?

This sentence is colloquial; it is exactly what a person would expect to hear when approaching a reference librarian or a service counter in a variety of settings.

2. What are you searching for?

This is a more complex concept because it includes what can be considered a technical term ("search"), a word now commonly understood within the context of searching for information and not only applicable to a lost dog or a strayed notebook.

This simple adjustment matches the user's intent with a clearly stated purpose in the search box. There are additional ways we can enrich the search box that will assist users in their queries.

Both examples use the pronoun you so that the sentence speaks directly to the individual searcher. There is, of course, the option to just use a verb in the imperative: "Search for . . ." or "Enter [keywords, terms, etc.]". However, the added feature of the pronoun you promotes the involvement of the participant-searcher. See also the interesting article by Thompson on the use of personal pronouns in social media communications by university students.27
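How such wording actually reaches the search box varies by product (an administrative setting, a customized search widget, or a local script), so the following is only an illustrative sketch of one way to keep these user-facing strings in a single, reviewable place. The dictionary keys and the helper function are hypothetical, not part of any vendor's configuration.

```python
# Illustrative only: centralizing the personalized interface strings discussed
# above so that wording changes can be reviewed and tested in one place.
# The keys and the helper below are hypothetical, not drawn from any product.
SEARCH_UI_TEXT = {
    "search_placeholder": "What are you looking for?",        # option 1 above
    "search_placeholder_alt": "What are you searching for?",  # option 2 above
    "results_summary": 'We found {count} results for "{query}"',  # cf. table 1
    "facet_instruction": "Narrow your results",                   # cf. table 1
}

def results_summary(count: int, query: str) -> str:
    """Build a post-search message that speaks directly to the searcher."""
    return SEARCH_UI_TEXT["results_summary"].format(count=count, query=query)

if __name__ == "__main__":
    print(SEARCH_UI_TEXT["search_placeholder"])
    print(results_summary(1342, "qualitative research methods"))
```

Keeping the strings together also makes it easier to test alternative wordings against one another rather than changing them ad hoc.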
Facets Column

All library discovery services make use of facets. Since the facets column constitutes a far more challenging area of linguistic personalization for the discovery interface, specific design features should be employed to draw the user's attention to it immediately. This is a very complicated area that involves user behavior, interface design, and more. How do we direct the user's attention to the facets column, or even make the user aware of the facets on the left-hand side? We can add a note after a search that says something to the effect of "Too many results/hits? Try narrowing your search with the facets below." Although this involves difficult interface design issues, it is very important that we begin to think more seriously about ways to draw our users into the search process more intuitively and effectively. If we don't, we will find continued underutilization of an incredibly powerful searching feature.

We also know that users routinely ignore advertising banners, so much so that the literature has christened this tendency "banner blindness"; in the same way, if our facet labels are meaningless, they will be overlooked.28 We condemn the discovery service interface to the same fate if we are not careful to choose meaningful labels that make sense when the "average" student or faculty user encounters them. Currently, we are assuming knowledge on the part of our users that is clearly misplaced, or we anticipate much greater success with instruction than is usually warranted. Several studies show the disparity between searchers' self-assessment and the actual skill they possess.29

One of the main problems users experience with search engines is their inability to narrow their searches, especially now that we are dealing with such a large array of information source types.30 This is where the use of facets comes into its own. As we seek to make the discovery interface the first, and eventually perhaps the primary, interface to our selected resources, the user needs to know how to easily find a video or a sound recording as well as a pertinent article. This should be done through an easily accessible and understandable search interface. The success of the e-commerce sites in making effective and profitable use of facets amply demonstrates the value of facets even for complex research questions and topics.
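To make the mechanics concrete: a facet is simply a grouping of the current result set by one metadata field, with a count for each value that the user can then apply as a filter. The sketch below is illustrative only; the record structure and field names are hypothetical placeholders rather than any discovery product's schema.

```python
from collections import Counter

# Hypothetical result records; field names are placeholders, not a vendor schema.
results = [
    {"title": "Qualitative Research Methods", "source_type": "Academic Journal", "language": "English"},
    {"title": "Marketing Week Roundup", "source_type": "Trade Publication", "language": "English"},
    {"title": "Métodos de investigación", "source_type": "Academic Journal", "language": "Spanish"},
]

def facet_counts(records, field):
    """Group the current result set by one metadata field and count each value."""
    return Counter(r[field] for r in records)

def apply_facet(records, field, value):
    """Narrow the result set to the records matching the chosen facet value."""
    return [r for r in records if r[field] == value]

print(facet_counts(results, "source_type"))
# Counter({'Academic Journal': 2, 'Trade Publication': 1})
narrowed = apply_facet(results, "source_type", "Academic Journal")
print(len(narrowed))  # 2
```

The filtering itself is trivial; what the rest of this section argues is that the labels attached to these groupings, and the cues that draw users to them, determine whether the mechanism is used at all.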
This brings up the matter of naming conventions for the facets. It is clear that, despite the newness of discovery services, the facet labels simply continue the naming conventions used in databases. We know from usability studies that library jargon is a stumbling block for our users.31 When we do not pay close attention to the appropriateness of each facet category label, we simply continue to use a terminology that is foreign to the understanding of many of our users,32 undermining a powerful searching feature merely because users do not know the terms. An honest appraisal of the discovery interface brings us immediately face-to-face with one of our primary legacy library problems, our heavily jargon-laden vocabulary. In fact, we are dealing simultaneously with two problems—the facet labels that are chosen and the complexity of the information universe that discovery systems expose.

At a presentation on discovery services at the 2014 ALA Annual Conference, one speaker went so far as to say that facets are not used in discovery searches.33 This underscores the unpleasant reality that we are dealing with both a design problem and an intelligibility problem, not the failure of facets as a navigational feature. At a recent LOEX presentation, one school had already thrown in the towel and will concentrate on teaching Academic Search Premier over the discovery service Primo.34 Again, this reveals that users are having a problem with the interface and its display content.

Suggestions for Improving Facets and the Facet Labels

Currently, the facet labels in library discovery service interfaces are limited to a list of nouns that designate the facets that can be used for narrowing or limiting a search. However, the labels that we use may not be meaningful to our users; they are simply a list of nouns that are, by and large, not really understood.35 Second, a facet label is also intended to have the user do something; hence a verb of action is implied. In standard classification taxonomies, the facet is used for organizing and grouping the objects that will be included in the facet. In a discovery system, the facet is there to lead the searcher to content on the basis of the content's differing characteristics as expressed through a facet. One has to ask: exactly why would a student do something simply because that student sees a noun on the left-hand side? We need to provide more context during the search process.

Below we make recommendations that we think will enhance the intelligibility and the usability of facets.36 It will be important for libraries and vendors to do substantial user-experience investigations into the various options that are available for use on a discovery page. Our goal is to draw attention to the current inadequacies in how facets have been implemented in discovery services and to encourage a more systematic approach to this important area of our library information delivery capabilities.

1. As observed above, on the e-commerce sites the facet is indicated by the presence of an icon marker that allows the facet to expand and contract. In our sample of sites, there was parity between using the +/- sign and a triangle (a full triangle, not a right or downward chevron). EDS made the decision to go with the chevron symbol. This is a user interface issue and one that needs further examination and testing. We think that the +/- sign is a more suitable visual indicator for prompting a user to take a specific action. The + and - signs also carry a value that says "yes" (+) or "no" (-) to the user, thereby signaling the user to expand (+) or contract (-) a list. We want to attract users to the facets and prompt them to take an action.
2. Make sure that only facets and filters are collapsible and expandable and that the interface design makes this clear.

3. The term limit is often found in discovery systems. This term was not found in our sample of e-commerce sites; the two primary terms there are refine and narrow. The advantage of using these terms is that one can more easily personalize this feature, as in "Narrow your results to" [Full text] [Scholarly . . .] [Date]; these are also the two words users normally see when searching e-commerce sites.

4. The facet "source types" is a common facet label. This is obscure terminology that users, especially students, tend not to know. A suitable option to personalize this category could be "What type of information do you need?" followed by the list of types. At least by asking the question, a user will be encouraged to look at the possibilities available, e.g., academic journals, trade publications, magazines, etc.

In the following list of facets, we can see that the facets themselves are inherently contradictory or do not actually represent what they purport to be. This is not an argument against facets; rather, we need to rethink exactly what we want our metadata to do. To simply take up space in the facets column does not serve any purpose. It is also clear that we need to systematically monitor the use of facets, and for this we need analytics. At this point it is difficult, if not impossible, to know whether facets have been used in searches and, if so, which facets have been used. Until we routinely gather this sort of data, we will not be able to make suitable decisions about facets and their use.
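Both points—plainer labels (items 3 and 4 above) and data on facet use—can be prototyped locally even before vendors supply them. The sketch below is purely illustrative: the facet keys, the alternative wordings (drawn from the suggestions above and in the list that follows), and the logging helper are hypothetical placeholders for local customization, not features of any discovery system.

```python
from collections import Counter

# Hypothetical mapping from vendor facet keys to plain-language display labels.
# Neither the keys nor the labels come from a particular product; they stand in
# for local customization of the facet headings discussed in this section.
FACET_DISPLAY_LABELS = {
    "SourceType": "What type of information do you need?",
    "Subject": "Narrow your topic",
    "Language": "Do you want English only?",
    "ContentProvider": "Sources",
}

facet_usage = Counter()  # which facet/value pairs users actually apply

def display_label(facet_key: str) -> str:
    """Return the personalized label, falling back to the raw key if unmapped."""
    return FACET_DISPLAY_LABELS.get(facet_key, facet_key)

def record_facet_click(facet_key: str, value: str) -> None:
    """Log one facet selection so that usage can be analyzed later."""
    facet_usage[(facet_key, value)] += 1

# Example: a searcher narrows to academic journals, then to English.
record_facet_click("SourceType", "Academic Journals")
record_facet_click("Language", "English")
print(display_label("SourceType"))  # What type of information do you need?
print(facet_usage.most_common())
```

Even a simple tally like this would answer the basic question posed above: whether facets are being used at all, and which ones.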
1. Language—This facet represents the language (both written and spoken content) of the work. While the term language is understood by users, we need to consider whether the word alone triggers a response. Since users most likely want only English, the facet label can ask that question, and the selection of language choices will then appear, making it clear that there are other choices as well. A question like "Do you want English only?" will then elicit a response to narrow the results by language. With the majority of the materials in English, this may be moot, but it does encourage the searcher to think about the language. The discovery layer adds the facet term "undetermined" when the provided metadata does not specify the language of a work. In a sense, the metadata has holes, and a user who is searching for a particular language will inadvertently exclude relevant search results if the facet is used too soon to filter out undesired languages. We recommend that filtering by language be used only as necessary and only when the results are overwhelmed by a large number of unwanted languages.

2. Publisher—This facet represents the entity that issues a published work. It applies across both serial and nonserial materials. The user most likely understands this term. But the question is, what is the value of this facet? While we do have the metadata for this, it is difficult to imagine the circumstances under which one would actually limit a search by publisher. We suggest not displaying this facet.

3. Publication—This facet represents the source title of the published work, such as a journal, trade magazine, or newspaper. It applies primarily to articles, book reviews, columns, etc., and not to publications like books, sound recordings, and videos. The user must be made aware that this facet applies to serial-type materials only. An alternative to "publication" could be "article source." This facet answers an implicit search query and could be a pop-up window: "What journal or magazine are you looking for?"

4. Content providers—This is a very problematic facet. It is not difficult to surmise that most users encountering this term would not know what it means and, more significantly, why it is important. In fact, the term itself is not accurate—another interesting issue that must be dealt with. The "content providers" may not be the actual providers of content but rather providers of the metadata content, which is something altogether different. For example, Emerald is the actual content provider for an article, yet a different provider, the metadata provider, is listed as the content provider. A suggested replacement for this term is "sources." Wording for a pop-up window could be, "To narrow your search, choose the source that most closely matches your topic. The sources are from different types of subject databases."

5. Subject—The use of the facet "subject" may seem obvious, yet, upon closer inspection, the nature of this facet is problematic. What is the cognitive connection between first doing a keyword search and then seeing the facet label "subject" on the left-hand side? Why should users assume they should now click on a link called "subject" when they have just finished doing a subject search? We need to provide the context for an action that takes into account the most common experience of the user. Using the term "topic" rather than "subject" would allow us to offer a term that is more congruent with the familiar vocabulary of a student's classroom experience, because students are generally directed to research topics.

A University of Washington Libraries usability study from the prediscovery era (2004) found that users preferred "browse subjects" to "by subject."37 Here we see the presence of a verb specifying an action. The significant finding for our purposes from this earlier study is that users found the phrase with a verb more meaningful than the phrase with a preposition.
We suggest making it clear that the user can further refine the search by the suggested subjects listed in the facets, using the phrase "narrow your topic" or "further narrow your topic." The pop-up window could say, "To narrow your search, choose from this list of possible topics that most closely match your search terms."

The conclusion reached by the University of Arizona study is even more relevant for the discovery layer interface: "We learned that if students have no idea why or when they should use an index, they will not choose a link labeled index, no matter how well designed the web page is."38 This is the situation with facet labels. If they are not intelligible, or at least provoke some response to a question posed, they will be ignored, and if ignored, their potential value goes completely unused.

CONCLUSION

E-commerce has concluded, in the face of overwhelmingly positive evidence, that facets are an essential aspect of the successful (i.e., profitable) user experience; facets have been almost universally adopted by companies that sell products, have very large product lines, and need to lead their customers to exactly the type of product they want. In our discovery layers, we likewise need to develop the kinds of features that promote the effective use of the resources we offer our academic users, and to build them in where feasible. Modifications can and should be made as libraries work with their discovery-services vendor to rationalize an interface page that should include natural language, easily understandable navigation, logical taxonomic ordering of the facets, etc. In essence, both product searches and academic information searches present the same scenario: we begin with an information need, a retrieval system, and the need to achieve recall, precision, and relevance.

Discovery services allow an information search to be carried out essentially as a Google search, while limiting the role of the facets to assisting in refining it. We can be confident that our users, many (or even most) of whom also use e-commerce faceted search sites, are able to recognize a similar search interface. Thus we are dealing with an important design issue. But to what extent do our users take advantage of faceted searches? As it stands at this writing, the link between the facets and their corresponding content "documents" (articles or video) is simply not clear. The characteristics of our discoverable objects must be tied in with what a user would be likely to understand.

We need analytics capable of supplying this sort of critical user-experience information. It may be that we are dealing with conflicting mental models about information searching.
Students and other members of the academic community may simply not be adequately cognizant of the implicit faceted nature of their query, and this becomes a new opportunity for improvements in our approach to user instruction.

It is clear that libraries and vendors need to work together to properly evaluate the facet labels if facets are to begin to achieve their potential as an essential search function. Disheartening statements to the effect that no one uses them, or that the discovery system itself is already branded a failure, demonstrate that the discovery layer, while clearly a powerful tool for integrating a range of accessible resources, is still in its infancy. Our purpose in this paper was to draw attention to both the proven value of faceted navigation and the ongoing problem of confusing or inadequately understood library terminology that is presently hindering what should be a powerful tool in our information discovery warehouse.

REFERENCES

1. Google is ranked number 1 according to Alexa, a traffic-ranking website. "The Top 500 Sites on the Web," Alexa, accessed May 9, 2014, http://www.alexa.com/topsites.

2. Irene Lopatovska, Megan R. Fenton, and Sara Campot, "Examining Preferences for Search Engines and Their Effects on Information Behavior," Proceedings of the American Society for Information Science & Technology 49, no. 1 (2012): 1–11.

3. Betsy Sparrow, Jenny Liu, and Daniel M. Wegner, "Google Effects on Memory: Cognitive Consequences of Having Information at Our Fingertips," Science 333, no. 6043 (2011): 776–78; Daniel M. Wegner and Adrian F. Ward, "How Google Is Changing Your Brain," Scientific American 309, no. 6 (2013): 58–61; Robin Marantz Henig and Samantha Henig, Twentysomething: Why Do Young Adults Seem Stuck? (New York: Hudson Street, 2012), 139–43; Matti Näsi and Leena Koivusilta, "Internet and Everyday Life: The Perceived Implications of Internet Use on Memory and Ability to Concentrate," Cyberpsychology, Behavior, and Social Networking 16, no. 2 (2013): 88–93.

4. Alison J. Head, Learning the Ropes: How Freshmen Conduct Course Research Once They Enter College (Project Information Literacy, December 5, 2013), http://projectinfolit.org/images/pdfs/pil_2013_freshmenstudy_fullreport.pdf.

5. Ibid., 2.

6. Joshua Steimle, "What Does SEO Cost? [Infographic]," Forbes, September 12, 2013, http://www.forbes.com/sites/joshsteimle/2013/09/12/what-does-seo-cost-infographic/.

7. Frank Scharnell, "Guide to eCommerce Facets, Filters and Categories," YouMoz (blog), April 30, 2013, http://moz.com/ugc/guide-to-ecommerce-facets-filters-and-categories.

8. Vanda Broughton, "Meccano, Molecules, and the Organization of Knowledge: The Continuing Contribution of S. R. Ranganathan" (presentation, International Society for Knowledge Organization UK chapter, London, November 5, 2007), 2, http://www.iskouk.org/presentations/vanda_broughton.pdf.
9. Jody Condit Fagan, "Discovery Tools and Information Literacy," Journal of Web Librarianship 5, no. 3 (2011): 171–78.

10. Sarah Ramdeen and Bradley M. Hemminger, "A Tale of Two Interfaces: How Facets Affect the Library Catalog Search," Journal of the American Society for Information Science & Technology 63 (2012): 702–15.

11. Amy F. Fyn, Vera Lux, and Robert J. Snyder, "Reflections on Teaching and Tweaking a Discovery Layer," Reference Services Review 41, no. 1 (2013): 113–24. See also the various presentations at recent LOEX conferences.

12. William Badke, "Pushing a Big Rock Up a Hill All Day: Promoting Information Literacy Skills," Online Searcher 37, no. 6 (2013): 67.

13. See the following blog entry on library jargon, which makes observations on terms such as "periodicals" and "databases": "Periodicals and Other Library Jargon," Mr. Library Dude (blog), March 18, 2011, http://mrlibrarydude.wordpress.com/tag/library-jargon/. This presentation on library jargon is a very helpful contribution to the discussion: Mark Aaron Polger, "Re-thinking Library Jargon: Maintaining Consistency and Using Plain Language" (slideshow presentation, February 5, 2011), http://www.slideshare.net/markaaronpolger/library-jargon-newestjan2010feb2010-6815908.

14. We are referring here to systems such as EBSCO EDS, ProQuest Summon, and Ex Libris Primo.

15. Beth Thomsett-Scott and Patricia E. Reese, "Academic Libraries and Discovery Tools: A Survey of the Literature," College & Undergraduate Libraries 19, no. 2–4 (2012): 123–43; Helen Dunford, review of Planning and Implementing Resource Discovery Tools in Academic Libraries, by Mary Pagliero Popp and Diane Dallis, The Australian Library Journal 62, no. 2 (2013): 175–76.

16. Sarah Ramdeen and Bradley M. Hemminger, "A Tale of Two Interfaces: How Facets Affect the Library Catalog Search," Journal of the American Society for Information Science & Technology 63 (2012): 713; Kathleen Bauer and Alice Peterson-Hart, "Does Faceted Display in a Library Catalog Increase Use of Subject Headings?," Library Hi Tech 30, no. 2 (2012): 354; Jody Condit Fagan, "Usability Studies of Faceted Browsing: A Literature Review," Information Technology & Libraries 29, no. 2 (2010): 62, http://dx.doi.org/10.6017/ital.v29i2.3144.

17. William F. Chickering and Sharon Q. Yang, "Evaluation and Comparison of Discovery Tools: An Update," Information Technology & Libraries 33, no. 2 (2014), http://dx.doi.org/10.6017/ital.v33i2.3471.

18. Vanda Broughton, "The Need for a Faceted Classification as the Basis of All Methods of Information Retrieval," Aslib Proceedings 58, no. 1/2 (2006): 49–72.

19. Leonard Will, "Rigorous Facet Analysis as the Basis for Constructing Knowledge Organization Systems (KOS) of All Kinds" (paper presented at the 2013 ISKO UK Conference, London, July 8–9, 2013): 4, http://www.iskouk.org/conf2013/papers/WillPaper.pdf.
20. Marti A. Hearst, "Design Recommendations for Hierarchical Faceted Search Interfaces," in Proceedings of the ACM SIGIR Workshop on Faceted Search (2006), http://flamenco.sims.berkeley.edu/papers/faceted-workshop06.pdf.

21. Kathryn La Barre, "Traditions of Facet Theory, or a Garden of Forking Paths?," in Facets of Knowledge Organization: Proceedings of the ISKO UK Second Biennial Conference, 4th–5th July, 2011, London (Bingley, UK: Emerald, 2012), 96.

22. La Barre, "Traditions of Facet Theory, or a Garden of Forking Paths?," 98.

23. Frank Scharnell, "Guide to eCommerce Facets, Filters and Categories."

24. Andrew D. Asher, Lynda M. Duke, and Suzanne Wilson, "Paths of Discovery: Comparing the Search Effectiveness of EBSCO Discovery Service, Summon, Google Scholar, and Conventional Library Resources," College & Research Libraries 74, no. 5 (2013): 464–88.

25. Saverio Perugini, "Personalization by Website Transformation: Theory and Practice," Information Processing & Management 46, no. 3 (2010): 284; Elizabeth F. Churchill, "Putting the Person Back into Personalization," Elizabeth F. Churchill (blog), July 24, 2013, http://elizabethchurchill.com/uncategorized/putting-the-person-back-into-personalization/.

26. Churchill, "Putting the Person Back into Personalization."

27. Celia Thompson, Kathleen Gray, and Hyejeong Kim, "How Social Are Social Media Technologies (SMTs)? A Linguistic Analysis of University Students' Experiences of Using SMTs for Learning," The Internet & Higher Education 21 (2014): 31–40, http://dx.doi.org/10.1016/j.iheduc.2013.12.001.

28. "Banner Blindness Studies," BannerBlindness.org, accessed April 7, 2014, http://bannerblindness.org/banner-blindness-studies/.

29. Melissa Gross and Don Latham, "Undergraduate Perceptions of Information Literacy: Defining, Attaining, and Self-Assessing Skills," College & Research Libraries 70, no. 4 (2009): 336–50.

30. See the section "Most internet users say they do not know how to limit the information that is collected about them by a website," Pew Report 2012, http://www.pewinternet.org/2012/03/09/main-findings-11/#most-internet-users-say-they-do-not-know-how-to-limit-the-information-that-is-collected-about-them-by-a-website.

31. Chris Jasek, "How to Design Library Websites to Maximize Usability," Library Connect, Pamphlet 5 (2007): 4, http://libraryconnectarchive.elsevier.com/lcp/0502/lcp0502.pdf. See also the results compiled in this paper of fifty-one intelligibility studies: John Kupersmith, "Library Terms that Users Understand" (University of California, 2012), http://escholarship.org/uc/item/3qq499w7.

32. Paige Alfonzo, "My Library Usability Study Stage 1," Librarian Enumerations (blog), June 19, 2013, http://librarianenumerations.wordpress.com/2013/06/19/library-usability-study/.

33. "Discussing Discovery Services: What's Working, What's Not and What's Next?" (discussion forum, ALA 2014 Annual Conference, Las Vegas, Nevada, June 29, 2014).
34. Susan Avery and Lisa Janicke Hinchliffe, "Hopes, Impressions, and Reality: Is a Discovery Layer the Answer?" (program, LOEX 2014 Annual Conference, Grand Rapids, Michigan, May 8–10, 2014), http://www.loexconference.org/2014/presentations/LOEX2014_Hopes%20Impressions%20and%20Reality-AveryHinchliffe.pdf.

35. Kupersmith, "Library Terms that Users Understand."

36. We are taking our examples from EBSCO EDS, with which we are most familiar. The issues discussed are common to all discovery systems.

37. Kupersmith, "Library Terms that Users Understand."

38. Ruth Dickstein and Vicki Mills, "Usability Testing at the University of Arizona Library: How to Let the Users in on the Design," Information Technology and Libraries 19, no. 3 (2000): 144–51.

5631 ----

Linking Libraries to the Web: Linked Data and the Future of the Bibliographic Record

Brighid M. Gonzales

ABSTRACT

The ideas behind Linked Data and the Semantic Web have recently gained ground and shown the potential to redefine the world of the web. Linked Data could conceivably create a huge database out of the Internet, linked by relationships understandable by both humans and machines. The benefits of Linked Data to libraries and their users are potentially great, but so are the many challenges to its implementation. The BIBFRAME Initiative provides a possible framework that will link library resources with the web, bringing them out of their information silos and making them accessible to all users.

INTRODUCTION

For many years now the MARC (MAchine-Readable Cataloging) format has been the focus of rampant criticism across library-related literature, and though an increasing number of diverse metadata formats for libraries, archives, and museums have been developed, no framework has shown the potential to be a viable replacement for the long-established and widely used bibliographic format. Over the past decade, web technologies have been advancing at a progressively rapid pace, outpacing MARC's ability to keep up with the potential these technologies can offer to libraries. Standing by the MARC format leaves libraries in danger of not being adequately prepared to meet the needs of modern users in the information environments they currently frequent (increasingly, search engines such as Google).

New technological developments such as the ideas behind Linked Data and the Semantic Web have the potential to bring a host of benefits to libraries and other cultural institutions by allowing libraries and their carefully cultivated resources to connect with users on the web. Though there remains a host of obstacles to its implementation, Linked Data has much to offer libraries if they can find ways to leverage this technology for their own uses.
Libraries are slowly finding ways to take advantage of the opportunities Linked Data presents, including initiatives such as the Bibliographic Framework Initiative, known as BIBFRAME, which may have the potential to be the bibliographic replacement for MARC that the information community has long needed. Such a change may help libraries not only to stay current with the modern information world and stay relevant in the minds of users, but also, reciprocally, to create a richer world of data available to information seekers on the web.

Brighid Gonzales (brighidmgonzales@gmail.com), a recent MLIS recipient from the School of Library and Information Science, San Jose State University, is winner of the 2014 LITA/Ex Libris Student Writing Award.

The Limitations of MARC

Much has been written over the years about the issues and shortcomings of the MARC format. Nonetheless, MARC formatting has been widely used by libraries around the world since the 1960s, when it was first created. This long-established and ubiquitous usage has resulted in countless legacy bibliographic records that currently exist in the MARC format. To lose this carefully crafted data, or to expend the finances, time, and manual effort required to convert all of this legacy data into a new format, may be a cause for reservation in the community.

But the fact remains that in spite of its widespread use, there are many issues with the MARC format that make it a candidate for replacement in the world of bibliographic data. Andresen describes several different versions of MARC that have largely been wrapped together in the community's mind, reminding us that "although MARC21 is often described as an international standard, it is only used in a limited number of countries."1 In actuality, what we often refer to simply as MARC could be MARC21, UKMARC, UNIMARC, or even danMARC2.2 This lack of a unified standard has long been an issue with this particular format.

Then there is MARC's notorious inflexibility. Originally created for the description of printed materials, MARC's rigidly defined standards can make it unsuited for the description of digital, visual, or multimedia resources. Andresen writes that "the lack of flexibility means that local additions might hinder exchange between local systems and union catalogue systems."3 Tennant has also expressed frustration with MARC's inflexibility, particularly its inability to express hierarchical relationships.
Tennant posits that where the MARC format is "flat," expressing relationships involving hierarchy, such as in a table of contents, "would be a breeze in XML," the format he recommends moving toward for its greater extensibility.4 MARC's rigidity may also be a reason why the format is not generally used outside of the library environment; thus information contained in MARC format cannot be exchanged with information from nonlibrary environments.5

Inconsistencies, errors, and localized practices are also issues frequently cited in detailing MARC's inherent shortcomings. With shared cataloging, inconsistencies may be less common, but there remains the fact that with any number of individual catalogers creating records, the potential for error is still great. And any localized changes can also create inconsistency in records from library to library. Tennant gives as an example recording the editor of a book, which "should be encoded in a 700 field, with a $e subfield that specifies the person is the editor. But the $e subfield is frequently not encoded, thus leaving one to guess the role of the person encoded in the 700 field."6

When it comes to issues with MARC in the modern computing environment, however, one of the biggest and seemingly insurmountable problems is its inability to express the relationships between entities. Andresen points out that it is "difficult to handle relations between data that are described in different fields,"7 while Tennant writes that "relationships among related titles are problematic."8 Alemu et al. also write of MARC's "document-centric" structure, which prevents it from recognizing relationships between entities that might be possible in a more "actionable data-centric format."9

Though Tennant advocates the embrace of XML-based formats as a way to transition from MARC, Breeding writes that even MARCXML "cannot fully make intelligible the quirky MARC coding in terms of semantic relationships."10 Alemu et al. also note that MARC may continue to be widely used mainly because alternatives, including XML, have not yet been found to be an adequate replacement.11

It is clear that if libraries and their carefully crafted bibliographic records are to remain relevant and viable in today's modern computing world, a more modern metadata format that addresses these issues will be required. Clearly needed is a more flexible and extensible format that allows for the expression of relationships between points of data and the ability to link that data to other related information outside of the presently insular library catalog.

Linked Data and the Semantic Web

Linked Data works as the framework behind the Semantic Web, an idea by World Wide Web inventor Tim Berners-Lee that would turn the Internet into something closer to one large database rather than simply a disparate collection of documents.
Since the Internet is often the first place users turn for information, libraries should take advantage of the concepts behind Linked Data both to put their resources out on the web, where they can be found by users, and in turn to bring those users back to the library through the lure of authoritative, high-quality resources.

In the world of Linked Data, the relationships between data, not just the documents in which they are contained, are made explicit and readable by both humans and machines. With the ability to "understand" and interpret these semantically explicit connections, computers will have the power to lead users to a web of related data based on a single information search. Underpinning the Semantic Web are the web-specific standards XML and RDF (Resource Description Framework). These work as universal languages for semantically labeling data in such a way that both a person and a computer can interpret their meaning and then distinguish the relationships between the various data sources.

These relationships are expressed using RDF, "a flexible standard proposed by the W3C to characterize semantically both resources and the relationships which hold between them."12 Baker notes that RDF supports "the process of connecting dots—of creating 'knowledge'—by providing a linguistic basis for expressing and linking data."13 RDF is organized into triples, expressing meaning as subject, verb, and object and detailing the relationships between them. An example is The Catcher in the Rye is written by J. D. Salinger, where The Catcher in the Rye acts as the subject, J. D. Salinger is the object, and the "verb," is written by, expresses the semantic relationship between the two, naming J. D. Salinger as the author of The Catcher in the Rye. By using this framework, computers can link to other RDF-encoded data, leading users to other works written by J. D. Salinger, other adaptations of The Catcher in the Rye, and other related data sources from around the web.

RDF gives machines the ability to "understand" the semantic meaning of things on the web and the nature of the relationships between them. In this way it can make connections for people, leading them to related information they may not have otherwise found. The use of XML allows developers to create their own tags, adding an explicit semantic structure to their documents that can then be exploited using RDF.

The Semantic Web is based on four rules explicated by web inventor Tim Berners-Lee:

1. Use URIs (uniform resource identifiers) as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs so that they can discover more things.14
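As a concrete illustration of these rules, the triple from the example above can be expressed in a few lines of code. The sketch below uses Python with the rdflib library purely as an illustration (the article does not prescribe any particular toolkit); the example.org URIs and the writtenBy property are hypothetical placeholders, not an established vocabulary.

```python
# Illustrative sketch only: expressing the triple "The Catcher in the Rye is
# written by J. D. Salinger" with Python's rdflib. The example.org URIs and the
# writtenBy property are hypothetical placeholders, not a published ontology.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/")
book = URIRef("http://example.org/catcher-in-the-rye")  # rules 1-2: HTTP URIs as names
author = URIRef("http://example.org/jd-salinger")

g = Graph()
g.bind("ex", EX)

# The RDF triple: subject (the book), predicate ("is written by"), object (the author).
g.add((book, EX.writtenBy, author))
g.add((book, RDFS.label, Literal("The Catcher in the Rye")))
g.add((author, RDFS.label, Literal("J. D. Salinger")))

# Rule 3: looking up a URI should return useful information (here, Turtle-serialized RDF).
print(g.serialize(format="turtle"))

# A SPARQL query asking which works are written by J. D. Salinger -- the kind of
# question a Linked Data-aware application could answer and then follow (rule 4).
results = g.query(
    "SELECT ?work WHERE { ?work ex:writtenBy <http://example.org/jd-salinger> . }",
    initNs={"ex": EX},
)
for row in results:
    print(row.work)
```

A library's production vocabulary would of course come from published ontologies rather than example.org, but the basic pattern of HTTP URIs joined by typed relationships is the same.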
URIs act as a permanent signpost for things, both on and off the web. Using consistent URIs allows data to be linked between and back to certain places on the web without the worry of broken or dead links. RDF triples map the relationships between each thing, which can then be linked to more things, opening up a wide world of interrelated data for users.

The concept behind Linked Data would allow for the integration of library data and data from other resources, whether from "scientific research, government data, commercial information, or even data that has been crowd-sourced."15 However, to create an open web of data facilitated by Linked Data theories, open standards such as RDF must be used, making data interoperable with resources from various communities. This interoperability is key to being able to mix library resources with those from other parts of the web.

Interoperability helps to make "data accessible and available, so that they can be processed by machines to allow their integration and their reuse in different applications."16 In this way, machines would be able to understand the relationships and connections between data contained within documents and thus lead users to related data they may not have otherwise found. Using Linked Data would bring carefully crafted and curated library data out of the information silos in which they have long been enclosed and connect them with the rest of the web, where users can more easily find them.

Benefits for Libraries

Libraries and their users have much to gain from participation in the Linked Data movement. In an age when Google is often the first place users turn when searching for information, freeing library data from their insulated databases and getting them out onto the web where the users are can help make library resources both relevant and available for users who may not make the library the first place they look for information. This can lead not only to increased use by library patrons and nonpatrons (who would now be potential library patrons) alike, but also to increased visibility for the library. Creating and using Linked Data technologies also opens the door for libraries to share metadata and other information in a way previously limited by MARC. Libraries also have the potential to add to the richness of data that is available on the web, creating a reciprocal benefit with the Semantic Web itself.

Coyle writes that "every minute an untold number of new resources is added to our digital culture, and none of these is under the bibliographic control of the library."17 Indeed, the World Wide Web is a participatory environment where anyone can create, edit or manipulate information resources. Libraries still consider themselves the province of quality, reliable information, but users don't necessarily go to libraries when searching and don't necessarily have the Internet acumen to distinguish between authoritative information and questionable resources.
Coyle also notes that "the push to move libraries in the direction of linked data is not just a desire to modernize the library catalog; it represents the necessity to transform the library catalog from a separate, closed database to an integration with the technology that people use for research."18 Using Linked Data, libraries can still create the rich, reliable, authoritative data they are known for while also making it available on the web, where potentially anyone can find it.

Much has been written about libraries' information silos, and many researchers are finding in Linked Data the possibility to free this information. For the information contained in the library catalog to be significantly more usable, it "must be integrated into the web, queryable from it, able to speak and to understand the language of the web."19 Alemu et al. write that linking library data to the web "would allow users to navigate seamlessly between disparate library databases and external information providers such as other libraries, and search engines."20 Users are likely to find the world of Linked Data immeasurably more useful than individually searching library databases one by one or relying on Google search results for the information they need.

Linked Data also allows for the possibility of serendipity in information searching, of finding information one didn't even know one was looking for, something akin to browsing the library shelves.21 Linked Data "allows for the richer contextualization of sources by making connections not only within collections but also to relevant outside sources."22 Tillett adds that Linked Data would allow for "mashups and pathways to related information that may be of interest to the Web searcher—either through showing them added facets they may wish to consider to refine their search or suggesting new directions or related resources they may also like to see."23

The use of Linked Data is not just beneficial to users, though. Libraries are also likely to see increased benefits in the sharing of metadata and other resources. Alemu et al. write that "making library metadata available for re-use would eliminate unnecessary duplication of data that is already available elsewhere, through reliable sources."24 Tillett also writes about the reduced cost to libraries for storage and data in a linked data environment where "libraries do not need to replicate the same data over and over, but instead share it mutually with each other and with others using the Web," reducing costs and expanding information accessibility.25 Byrne and Goddard also note that "having a common format for all data would be a huge boon for interoperability and the integration of all kinds of systems."26

In addition to the reduced cost of shared resources, something with which libraries are already very familiar, the linking of data from libraries to one another and to the web would also allow for an increased richness in overall data.
From metadata that may need to be changed or updated periodically to user-generated metadata that is more likely to include current, up-to-date terminology, the "mixed metadata" approach allowed by Linked Data would be "better situated to provide a richer and more complete" description of various resources, one that could more accurately provide for the variety of interpretation and terminology possible in their description.27

A New Bibliographic Framework

One of the most important ways libraries are moving toward the world of Linked Data is with the Bibliographic Framework Initiative, known as BIBFRAME, which was announced by the Library of Congress in 2011. Though BIBFRAME is still in development, rapid progress has since been made, suggesting that BIBFRAME may be the long-awaited replacement for the MARC format that could free library bibliographic information from its information silos and allow it to be integrated with the wider web of data.

The BIBFRAME model comprises four classes: Creative Work, Instance, Authority, and Annotation. In this model, Creative Work represents the "conceptual essence" of the item. Instance is the "material embodiment" of the Creative Work. Authority is a resource that defines relationships reflected by the Creative Work and Instance, such as People, Places, Topics, and Organizations. Annotation relates the Creative Work to other information resources, which could be library holdings information, cover art, or reviews.28 These are similar in a way to the FRBR (Functional Requirements for Bibliographic Records) model, which uses Work, Expression, Manifestation, and Item.29 Indeed, BIBFRAME is built with RDA (Resource Description and Access), which was in turn built around the principles in FRBR, as an important source for content. Despite this, BIBFRAME "aims to be independent of any particular set of cataloging rules."30

Recognizing the vast amounts of information still recorded in MARC format, the BIBFRAME initiative is also working on a variety of tools that will help to transform legacy MARC records into BIBFRAME resources.31 These tools will be essential, as "the conversion of MARC records to useable Linked Data is a complicated process."32 Where MARC allowed libraries to share bibliographic records without each having to constantly reinvent the wheel, BIBFRAME will allow library metadata to be "shared and reused without being transported and replicated."33

BIBFRAME would support the Linked Data model while also incorporating emerging content standards such as FRBR and RDA.34 The BIBFRAME initiative is committed to compatibility with existing MARC records but would eventually replace MARC as a bibliographic framework "agnostic to cataloging rules,"35 rather than intertwined with them as MARC was with AACR2.
Also  unlike     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  DECEMBER  2014   16   MARC,  which  is  rigidly  structured  and  not  amenable  to  incorporation  with  web  standards,   BIBFRAME  would  enable  library  metadata  to  be  found  on  the  web,  freeing  it  from  the  information   silos  that  have  contained  it  for  decades.  Whereas  MARC  is  not  very  web-­‐compatible,  “BIBFRAME  is   built  on  XML  and  RDF,  both  ‘native’  schemas  for  the  internet.  The  web-­‐friendly  nature  of  these   schemas  allows  for  the  widest  possible  indexing  and  exposure  for  the  resources  held  in   libraries.”36   Backed  by  the  Library  of  Congress,  BIBFRAME  already  has  a  great  deal  of  support  throughout  the   information  community,  though  it  is  not  yet  at  the  stage  of  implementation  for  most  libraries.   However,  half  a  dozen  libraries  and  other  institutions  are  acting  as  “Early  Experimenters”  working   to  implement  and  experiment  with  BIBFRAME  to  assist  in  the  development  process  and  get  the   framework  library  ready.  Participating  institutions  include  the  British  Library,  George  Washington   University,  Princeton  University,  Deutsche  National  Bibliothek,  National  Library  of  Medicine,  OCLC,   and  the  Library  of  Congress.37  Though  not  yet  fully  realized,  BIBFRAME  seems  to  offer  a   substantial  step  toward  the  implementation  of  Linked  Data  to  connect  library  bibliographic   materials  with  other  resources  on  the  web.   The  Challenges  Ahead   The  road  to  widespread  use  of  the  Semantic  Web,  Linked  Data,  and  even  possible  implementations   such  as  BIBFRAME  is  not  without  obstacles.  For  one,  knowledge  and  awareness  is  a  major  concern,   as  well  as  the  intimidating  thought  of  transitioning  away  from  MARC,  a  standard  that  has  been  in   widespread  use  for  as  long  as  many  of  the  professionals  using  it  have  been  alive.  There  is  also  the   challenge  and  significant  resources  required  for  converting  huge  stores  of  legacy  data  from  MARC   format  to  a  new  standard.  In  addition,  Linked  Data  has  its  own  set  of  specific  concerns,  such  as   legality  and  copyright  issues  involved  in  the  sharing  of  information  resources,  as  well  as  the   willingness  of  institutions  to  share  metadata  that  they  may  have  invested  a  great  deal  of  time  and   money  in  creating.   Many  organizations  may  be  hesitant  to  make  the  move  toward  Linked  Data  without  a  clear  sign  of   success  from  other  institutions.  Chudnov  writes  that  “a  new  era  of  information  access  where   library-­‐provided  resources  and  services  rose  swiftly  to  the  top  of  ambient  search  engines’  results   and  stayed  there”  is  what  may  be  necessary,  as  well  as  “tools  and  techniques  that  make  it  easier  to   put  content  online  and  keep  it  there.”38  Byrne  and  Goddard  also  note  that  “Linked  Data  becomes   more  powerful  the  more  of  it  there  is.  Until  there  is  enough  linking  between  collections  and   imaginative  uses  of  data  collections  there  is  a  danger  librarians  will  see  linked  data  as  simply   another  metadata  standard,  rather  than  the  powerful  discovery  tool  it  will  underpin.”39    Alemu  et  al.  
concur  that  making  Linked  Data  easy  to  create  and  put  online  is  necessary  before   potential  implementers  will  begin  to  use  it.  “It  is  imperative  that  the  said  technologies  be  made   relatively  easy  to  learn  and  use,  analogous  to  the  simplicity  of  creating  HTML  pages  during  the   early  days  of  the  web.”40  The  potential  learning  curve  involved  in  Linked  Data  may  be  a  great   barrier  to  its  potential  use.  Tennant  writes  in  an  article  about  moving  away  from  MARC  to  a  more     LINKING  LIBRARIES  TO  THE  WEB  |  GONZALES       17   modern  bibliographic  framework  that  users  “must  dramatically  expand  our  understanding  of  what   it  means  to  have  a  modern  bibliographic  infrastructure,  which  will  clearly  require  sweeping   professional  learning  and  retooling.”41   Even  without  considering  ease-­‐of-­‐use  difficulties  or  the  challenges  in  teaching  practitioners  an   entirely  new  bibliographic  system,  the  fact  remains  that  transitioning  away  from  MARC  toward  any   new  bibliographic  infrastructure  system  will  require  a  great  deal  of  resources,  time  and  effort.   “There  are  literally  billions  of  records  in  MARC  formats;  an  attempt  at  making  the  slightest  move   away  from  it  would  have  huge  implications  in  terms  of  resources.”42  Breeding  also  writes  of  the   potential  trauma  involved  in  shifting  away  from  MARC,  which  is  currently  integral  to  many  library   automation  systems.43  A  shift  to  anything  else  would  require  not  just  the  cooperation  of  libraries   but  also  of  vendors,  who  may  see  no  reason  to  create  systems  compatible  with  anything  other  than   MARC.  As  Tennant  writes,  “Anyone  who  has  ever  been  involved  with  migrating  from  one  integrated   library  system  to  another  knows,  even  moving  from  one  system  based  on  MARC/AACR2  to  another   can  be  daunting.”44  Moving  from  a  MARC/AACR2-­‐based  system  to  one  based  on  an  entirely  new   framework  may  be  more  of  a  challenge  than  many  libraries  would  like  to  take  on.   A  move  to  something  such  as  BIBFRAME  may  be  fraught  with  even  more  difficulty,  though  it  is   impossible  to  say  before  such  an  implementation  has  been  fully  realized.  Library  system  software   is  not  yet  compatible  with  BIBFRAME,  and  as  Kroeger  writes,  “Most  libraries  will  not  be  able  to   implement  BIBFRAME  because  their  systems  do  not  support  it,  and  software  vendors  have  little   incentive  to  develop  BIBFRAME  integrated  library  systems  without  reasonable  certainty  of  library   implementation  of  BIBFRAME.”45  This  catch-­‐22  situation  may  be  difficult  to  remedy  without  a   large  cooperative  effort  between  libraries,  vendors,  and  the  entire  information  community.   
Another  potential  obstacle  to  BIBFRAME  implementation  that  Kroeger  suggests  is  the  possible   difficulty  in  providing  interoperability  with  all  of  the  many  other  metadata  standards  currently  in   existence.46  This  is  an  issue  that  Tennant  also  considers  in  his  recommendations  that  a  new   bibliographic  infrastructure  compatible  with  modern  library  and  information  needs  must  be   versatile,  extensible,  and  especially  interoperable  with  other  metadata  schemes  currently  in  use.47   XML  has  proven  to  be  useful  for  a  wide  variety  of  metadata  schemas,  but  BIBFRAME  would  need  to   be  able  to  make  library  data  held  in  a  huge  variety  of  metadata  standards  available  for  use  on  the   web.   Another  issue,  cited  by  Byrne  and  Goddard,  is  that  of  privacy.  “Librarians,  with  their  long  tradition   of  protecting  the  privacy  of  patrons,  will  have  to  take  an  active  role  in  linked  data  development  to   ensure  rights  are  protected.”48  Issues  of  copyright  and  ownership,  something  libraries  already   grapple  with  in  the  licensing  of  various  library  journals,  databases,  and  other  electronic  resources,   may  be  insurmountable.  “Libraries  no  longer  own  much  of  the  content  they  provide  to  users;   rather  it  is  subscribed  to  from  a  variety  of  vendors.  Not  only  does  that  mean  that  vendors  will  have   to  make  their  data  available  in  linked  data  formats  for  improvements  to  federated  search  to   happen,  but  a  mix  of  licensed  and  free  content  in  a  linked  data  environment  would  be  extremely     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  DECEMBER  2014   18   difficult  to  manage.”49  Again,  overcoming  obstacles  such  as  these  would  require  intense   negotiation  and  cooperation  between  libraries  and  vendors.  A  sustainable  and  viable  move  to  a   Linked  Data  environment  would  need  to  be  a  cooperative  effort  between  all  involved  parties  and   would  have  to  have  the  full  support  and  commitment  of  everyone  involved  before  it  could  begin  to   move  forward.   Moving  Libraries  toward  Linked  Data   Making  the  move  toward  the  use  of  Linked  Data  and  modern  bibliographic  implementations  such   as  BIBFRAME  will  require  a  great  deal  of  cooperation,  sharing,  learning,  and  investigation,  but   libraries  are  already  starting  to  look  toward  a  linked  future  and  what  it  will  take  to  get  there.   Libraries  will  need  to  begin  incorporating  the  principles  of  Linked  Open  Data  in  their  own  catalogs   and  online  resources  as  well  as  publishing  and  sharing  as  much  data  as  possible.  Libraries  also   need  to  put  forth  a  concerted  effort  to  encourage  vendors  to  move  toward  library  systems  which   can  accommodate  a  linked  data  environment.   Alemu  et  al.  write  that  cooperation  and  collaboration  between  all  of  the  involved  stakeholders  will   be  a  crucial  piece  to  the  transfer  of  library  metadata  from  catalog  to  web.  In  the  process,  and  as   part  of  this  cooperative  effort,  libraries  will  have  to  wholeheartedly  adopt  the  RDF/XML  format,   something  Alemu  et  al.  
deem  “mandatory.”50  This  would  support  the  “conceptual  shift  from   perceiving  library  metadata  as  a  document  or  record  to  what  Coyle  (2010)  terms  as  actionable   metadata,  i.e.,  one  that  is  machine-­‐readable,  mash-­‐able  and  re-­‐combinable  metadata.”51   Chudnov  adds  that  libraries  will  need  to  follow  “steady  URL  patterns”  for  as  much  of  their   resources  as  possible,  one  of  the  key  rules  of  Linked  Data.  52  He  also  notes  that  we  will  know  we   have  made  progress  on  the  implementation  of  Linked  Data  when  “link  hubs  at  smaller  libraries   (aka  catalogs  and  discovery  systems)  cross  link  between  local  holdings,  authorities,  these  national   authority  files,  and  peer  libraries  that  hold  related  items,”  though  the  real  breakthrough  will  come   when  “the  big  national  hubs  add  reciprocal  links  back  out  to  smaller  hub  sites.”53  Before  this  can   happen,  however,  libraries  must  make  sure  that  all  of  their  own  holdings  link  to  each  other,  from   the  catalog  to  items  in  online  exhibits.  Chudnov  also  advocates  adding  user-­‐generated  knowledge   into  the  mix  by  allowing  users  to  make  new  connections  between  resources  when  and  where  they   can.54   Borst,  Fingerle,  and  Neubert,  in  their  conference  report  from  2009,  write  that  libraries  and   projects  using  linked  data  need  to  regard  the  catalog  as  a  network,  publish  their  data  as  Linked   Data  using  the  Semantic  Web  standards  laid  out  by  Tim  Berners-­‐Lee,  and  link  to  external  URIs.55   They  also  suggest  libraries  use  and  help  to  further  develop  open  standards  that  are  already   available  rather  than  rely  on  in-­‐house  developments.56  In  their  final  recommendation,  they  write   that  while  libraries  need  to  publish  their  data  as  open  Linked  Data  on  the  web,  they  should  also  try   to  do  so  with  the  “least  possible  restrictions  imposed  by  licences  in  order  to  ensure  widest  re-­‐ usability.”57     LINKING  LIBRARIES  TO  THE  WEB  |  GONZALES       19   CONCLUSION   The  theories  behind  Linked  Data  and  the  Semantic  Web  are  still  in  the  process  of  being  drawn  out,   but  it  is  clear  that  at  this  point  they  are  more  than  hypotheticals.  Linked  Data  is  the  possible  future   of  the  web  and  how  information  will  be  organized,  searched  for,  discovered,  and  retrieved.  As   search  algorithms  continue  to  improve  and  users  continue  to  turn  to  them  first  (and  sometimes   entirely)  for  their  information  needs,  libraries  will  need  to  make  major  changes  to  ensure  the  data   they  have  painstaking  created  and  curated  over  the  decades  remains  relevant  and  reachable  to   users  on  the  web.  Linked  Data  provides  the  opportunity  for  libraries  to  integrate  their   authoritative  data  with  user-­‐generated  data  from  the  web,  creating  a  rich  network  of  reliable,   current,  far-­‐reaching  resources  that  will  meet  users’  needs  wherever  they  are.   Libraries  have  always  been  known  to  embrace  technology  to  stay  at  the  forefront  of  user  needs   and  provide  unique  and  irreplaceable  user  services.  
To stay current with shifts in modern technology and user behavior, libraries need to be a driving force in the implementation of Linked Data, embrace Semantic Web standards, and take full advantage of the benefits and opportunities they present. Ultimately, libraries can leverage the advantages created by Linked Data to construct a better information experience for users, keeping libraries a relevant and more highly valued part of information retrieval in the twenty-first century.

REFERENCES

1. Leif Andresen, "After MARC—What Then?" Library Hi Tech 22, no. 1 (2004): 41.
2. Ibid., 40-51.
3. Ibid., 43.
4. Roy Tennant, "MARC Must Die," Library Journal 127, no. 17 (2002): 26-28, http://lj.libraryjournal.com/2002/10/ljarchives/marc-must-die/#_.
5. Andresen, "After MARC—What Then?"
6. Tennant, "MARC Must Die."
7. Andresen, "After MARC—What Then?", 43.
8. Tennant, "MARC Must Die."
9. Getaneh Alemu et al., "Linked Data for Libraries: Benefits of a Conceptual Shift from Library-Specific Record Structures to RDF-based Data Models," New Library World 113, no. 11/12 (2012): 549-570, http://dx.doi.org/10.1108/03074801211282920.
10. Marshall Breeding, "Linked Data: The Next Big Wave or Another Tech Fad?," Computers in Libraries 33, no. 3 (2013): 20-22, http://www.infotoday.com/cilmag/.
11. Alemu et al., "Linked Data for Libraries."
12. Mauro Guerrini and Tiziana Possemato, "Linked Data: A New Alphabet for the Semantic Web," Italian Journal of Library & Information Science 4, no. 1 (2013): 79-80, http://dx.doi.org/10.4403/jlis.it-6305.
13. Tom Baker, "Designing Data for the Open World of the Web," Italian Journal of Library & Information Science 4, no. 1 (2013): 64, http://dx.doi.org/10.4403/jlis.it-6308.
14. Tim Berners-Lee, "Linked Data," W3.org, last modified June 18, 2009, http://www.w3.org/DesignIssues/LinkedData.html.
15. Karen Coyle, "Library Linked Data: An Evolution," Italian Journal of Library & Information Science 4, no. 1 (2013): 58, http://dx.doi.org/10.4403/jlis.it-5443.
16. Gianfranco Crupi, "Beyond the Pillars of Hercules: Linked Data and Cultural Heritage," Italian Journal of Library & Information Science 4, no. 1 (2013): 36, http://dx.doi.org/10.4403/jlis.it-8587.
17. Coyle, "Library Linked Data: An Evolution," 56.
18. Ibid., 56-57.
19. Crupi, "Beyond the Pillars of Hercules," 35.
20. Alemu et al., "Linked Data for Libraries," 562.
21. Ibid.
22. Thea Lindquist et al., "Using Linked Open Data to Enhance Subject Access in Online Primary Sources," Cataloging & Classification Quarterly 51 (2013): 913-928, http://dx.doi.org/10.1080/01639374.2013.823583.
23. Barbara Tillett, "RDA and the Semantic Web, Linked Data Environment," Italian Journal of Library & Information Science 4, no. 1 (2013): 140, http://dx.doi.org/10.4403/jlis.it-6303.
24. Alemu et al., "Linked Data for Libraries."
25. Tillett, "RDA and the Semantic Web, Linked Data Environment," 140.
26. Gillian Byrne and Lisa Goddard, "The Strongest Link: Libraries and Linked Data," D-Lib Magazine 16, no. 11/12 (2010), http://dx.doi.org/10.1045/november2010-byrne.
27. Alemu et al., "Linked Data for Libraries," 560.
28. Library of Congress, Bibliographic Framework as a Web of Data: Linked Data Model and Supporting Services (Washington, DC: Library of Congress, November 21, 2012), http://www.loc.gov/bibframe/pdf/marcld-report-11-21-2012.pdf.
29. Barbara Tillett, "What is FRBR? A Conceptual Model for the Bibliographic Universe," Library of Congress, 2003, http://www.loc.gov/cds/downloads/FRBR.PDF.
30. "BIBFRAME Frequently Asked Questions," Library of Congress, http://www.loc.gov/bibframe/faqs/#q04.
31. Ibid.
32. Lindquist et al., "Using Linked Open Data to Enhance Subject Access in Online Primary Sources," 923.
33. Alan Danskin, "Linked and Open Data: RDA and Bibliographic Control," Italian Journal of Library & Information Science 4, no. 1 (2013): 157, http://dx.doi.org/10.4403/jlis.it-5463.
34. Erik T. Mitchell, "Three Case Studies in Linked Open Data," Library Technology Reports 49, no. 5 (2013): 26-43, http://www.alatechsource.org/taxonomy/term/106.
35. Angela Kroeger, "The Road to BIBFRAME: The Evolution of the Idea of Bibliographic Transition into a Post-MARC Future," Cataloging & Classification Quarterly 51 (2013): 881, http://dx.doi.org/10.1080/01639374.2013.823584.
36. Jason W. Dean, "Charles A. Cutter and Edward Tufte: Coming to a Library near You, via BIBFRAME," In the Library with the Lead Pipe, December 4, 2013, http://www.inthelibrarywiththeleadpipe.org/2013/charles-a-cutter-and-edward-tufte-coming-to-a-library-near-you-via-bibframe/.
37. "BIBFRAME Frequently Asked Questions," Library of Congress, http://www.loc.gov/bibframe/faqs/#q04.
38. Daniel Chudnov, "What Linked Data Is Missing," Computers in Libraries 31, no. 8 (2011): 35-36, http://www.infotoday.com/cilmag.
39. Byrne and Goddard, "The Strongest Link: Libraries and Linked Data."
40. Alemu et al., "Linked Data for Libraries," 557.
41. Roy Tennant, "A Bibliographic Metadata Infrastructure for the Twenty-First Century," Library Hi Tech 22, no. 2 (2004): 175-181, http://dx.doi.org/10.1108/07378830410524602.
42. Alemu et al., "Linked Data for Libraries," 556.
43. Breeding, "Linked Data."
44. Tennant, "A Bibliographic Metadata Infrastructure for the Twenty-First Century."
45. Kroeger, "The Road to BIBFRAME," 884-885.
46. Ibid.
47. Tennant, "A Bibliographic Metadata Infrastructure for the Twenty-First Century."
48. Byrne and Goddard, "The Strongest Link: Libraries and Linked Data."
49. Ibid.
50. Alemu et al., "Linked Data for Libraries."
51. Ibid., 563.
52. Chudnov, "What Linked Data Is Missing."
53. Ibid.
54. Ibid.
55. Timo Borst, Birgit Fingerle, and Joachim Neubert, "How Do Libraries Find Their Way onto the Semantic Web?" Liber Quarterly 19, no. 3/4 (2010): 336-43, http://liber.library.uu.nl/index.php/lq/article/view/7970/8271.
56. Ibid.
57. Ibid., 342-343.

A Library in the Palm of Your Hand: Mobile Services in Top 100 University Libraries

Yan Quan Liu and Sarah Briggs

Yan Quan Liu (liuy1@southernct.edu) is Professor in Information and Library Science at Southern Connecticut State University, New Haven, CT, and Special Hired Professor at Tianjin University of Technology, Tianjin, China. Sarah Briggs (sjg.librarian@gmail.com) is Library/Media Specialist at Jonathan Law High School, Milford, CT.

ABSTRACT

What is the current state of mobile services among academic libraries of the country's top 100 universities, and what are the best practices for librarians implementing mobile services at the university level? Through in-depth website visits and survey questionnaires, the authors studied each of the top 100 universities' libraries' experiences with mobile services. Results showed that all of these libraries offered at least one mobile service, and the majority offered multiple services. The most common mobile services offered were mobile sites, text messaging services, e-books, and mobile access to databases and the catalog. In addition, chat/IM services, social media accounts, and apps were very popular. Survey responses also indicated a trend towards responsive design for websites so that patrons can access the library's full site on any mobile device. Respondents recommend that libraries considering offering mobile services begin as soon as possible, as patron demand for these services is expected to increase.

INTRODUCTION

Mobile devices such as smart phones, tablets, e-book readers, handheld gaming tools, and portable music players are practically omnipresent in today's society. According to Walsh (2012), "Mobile data traffic in 2011 was eight times the size of the global internet in 2000 and, according to forecasts, mobile devices will soon outnumber human beings."1 Studies have revealed that use of mobile devices is widespread and continues to increase. As of 2013, 56% of Americans owned a smart phone (Smith 2013). This number is even higher among people ages 18 to 29.2 However, Peters (2011) points out that mobile phones, at least, can be found among people of all ages, nationalities, and socioeconomic classes. He writes, "We truly are in the midst of a global mobile revolution."3 In 2012, the ACRL Research Planning and Review Committee found that 55% of undergraduates have smart phones, 62% have iPods, and 21% have some kind of tablet. Over 67% of these students use their devices academically.4 Elmore and Stephens (2012) write, "Academic libraries cannot afford to ignore this growing trend. For many students a mobile phone is no longer just a telephonic device but a handheld information retrieval tool."5
It is clear from these studies that academic libraries can expect their patrons to be accessing their services via mobile devices in growing numbers and need to adapt to this reality. However, the sheer number of mobile devices on the market and the myriad ways libraries could offer mobile services can be daunting. Additionally, offering mobile services requires investing time, money, and personnel. To give libraries a starting point, this paper examines the current status of mobile services in the United States' top 100 universities' libraries as a model: specifically, what services are being offered, what they are being used for, and what challenges libraries have encountered in offering mobile services. In doing so, this paper attempts to answer two questions: What is the state of mobile services among academic libraries of the country's top-ranked universities, and what can the experiences of these libraries teach us about best practices for mobile services at the university level?

LITERATURE REVIEW

Current Status of Mobile Services in Academic Libraries

There is not a lot of data regarding the prevalence of mobile services in academic libraries. A 2010 study found that 35% of the English-speaking members of the Association of Research Libraries had a mobile website for the university, the library, or both (Canuel and Crichton 2010).6 A study of Chinese academic libraries revealed that only 12.8% of those surveyed had a section of their web pages devoted to mobile library service (Li 2013).7 In 2010, Canuel and Crichton found that 13.7% of Association of Universities and Colleges of Canada members had some mobile services, including websites and apps.8 In the United States, a 2010 survey found that 44% of academic libraries offered some type of mobile service: 39% had a mobile website, and 36% had a mobile version of the library's catalog. Half of the libraries that did not offer mobile services were in the planning process for creating a mobile website, catalog, and text notifications. Additionally, 40% planned on implementing SMS reference services, and 54% wanted the ability to access library databases on mobile devices (Thomas 2010).9 However, it is widely assumed that mobile services will expand rapidly in the future (Canuel and Crichton 2010).10 More recently, a 2012 survey of academic libraries in the Pacific Northwest found that 50% had a mobile version of the library's website and/or catalog, 40% used QR codes, 38% had a text messaging service, and 18% replied "other," with mobile interfaces for databases being a popular offering. However, 31% of survey respondents still did not have any mobile services (Ashford and Zeigen 2012).11 Osika and Kaufman (2012) surveyed community and junior colleges nationwide to determine what mobile services were being offered: 73% offered mobile catalog access, 62% offered vendor database apps, two were creating a mobile app for the library, and 14.7% had a mobile library website.12

Definition and Types of Mobile Services

Although there are dozens of different mobile devices on the market, La Counte (2013) aptly and succinctly defines them as follows: "The reality is that mobile devices can refer to essentially any device that someone uses on the go" (vi).13 Smart phones, netbooks, tablet computers, e-readers, gaming devices, and iPods are examples of mobile devices that are now commonplace on college campuses. Barnhart and Pierce (2012) define these devices as "…networked, portable, and handheld…"14 Additionally, these devices may be used to read, listen to music, and watch videos (West, Hafner, and Faust 2006).15 According to Lippincott (2008), libraries should consider all their patron groups as potential mobile library users, including faculty, distance education students, on-campus students, students placed in internships or doing other kinds of fieldwork, and students using mobile devices to work on collaborative projects outside of school.16

The most common mobile services discussed in the literature are mobile-friendly websites or apps, mobile-friendly access to the library's catalog and databases, text messaging services, QR codes, augmented reality, e-books, and information literacy instruction facilitated by mobile devices. These services fall into one of two categories: traditional library services amended to be available on mobile devices, and services created specifically for mobile devices.

Common library services that have been updated to be mobile-friendly include a mobile website (either as a mobile version of the library's regular site, an app, or both), mobile-friendly interfaces for the library's catalog and databases, access to books in electronic format, and information literacy instruction that makes use of mobile devices. Regarding mobile websites and apps, Walsh (2012) writes,
"If a well-designed app is like a top-end sports car, a mobile website is more like a family run-around. It may not be as good looking, but it is likely to be cheaper, easier to run and accessible to more people."17

It is not feasible to replicate the entire website in a mobile version, so libraries must know what patrons find most important and address that information through the mobile site (Walsh 2012).18 According to a 2012 survey of academic libraries in the Pacific Northwest, the most popular types of information found on mobile websites are links to the catalog, a way to contact a librarian, links to databases, and hours of operation (Ashford and Zeigen 2012).19 Many libraries are also providing mobile access to their catalogs and databases. This is sometimes difficult because third-party vendors are often responsible for the catalogs and/or databases, and libraries must rely on these vendors to provide mobile access (Iglesias and Meesangnil 2011).20 However, many vendors already offer mobile-friendly interfaces; libraries must be aware when this is the case and provide links to these interfaces. When a vendor does not provide a mobile-friendly interface, the library should encourage the vendor to do so (Bishoff 2013, p. 118).21

There is a growing expectation that libraries will provide e-books to patrons as e-books become increasingly popular. Walsh (2012) states that the proportion of adults in the United States who own an e-book reader doubled between November 2010 and May 2011.22 According to Bischoff, Ruth, and Rawlins (2013), 29% of Americans owned a tablet or e-reader as of January 2012.23 This has presented challenges for libraries, mainly in two areas: format and licensing. There is risk involved in choosing a format that will only work with one product (i.e., a Nook or a Kindle), because not every patron will own the same device, and ultimately one device might become the most popular, rendering books purchased for other devices obsolete. On the other hand, formats that work with multiple devices tend to have only basic functionality and do not provide an ideal user experience (Walsh 2012).24 Walsh (2012) recommends EPUB, which works well with many different devices, is free, and supports the addition of a digital rights management layer.25 Licensing is also an issue as libraries and publishers strive to find a method of loaning e-books amenable to both; no one model has emerged that is mutually satisfactory (Walsh 2012).26

Libraries are increasingly integrating mobile technologies into information literacy instruction and other forms of instruction. For example, services such as Skype and FaceTime, which Walsh (2012) describes as "a window to another world" (p. 105), can be used for distance learning, including reference and instruction.27 When interactions do not need to take place live, many mobile devices have the capability to take pictures, record video, and record audio (Walsh 2012, p. 97).28 This allows class events, including lectures and discussions, to be broadcast to people and spaces beyond the physical classroom. Walsh (2012) notes that, when constructing podcasts or vodcasts, it is important to make mobile-friendly versions available, bearing in mind the different platforms and screen sizes people might use to access the content.29

Text messaging, QR codes, and augmented reality are examples of library services that were created expressly for mobile devices. Text messaging in particular has become a very popular mobile service offering; as Thomas and Murphy (2009) write, "Interacting with patrons through text messaging now ranks among core competencies for librarians because SMS increasingly comprises a central channel for communicating library information."30 A common use of text messaging is a "text a librarian" service. Walsh (2012) recommends launching such a service even if the library currently offers no other mobile services, noting, "It can be quick, easy and cheap to introduce such a service and it is an ideal entry into the world of providing services via mobile devices" (p. 45).31 Peters (2011) points out that the shorter the turnaround time (he recommends less than ten minutes), the better. He notes that many questions arise as the result of a situation the questioner is currently in: "If you do not respond in a matter of minutes, not hours, the context will be lost and the need will be diminished or satisfied in other ways."32
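To illustrate how little machinery a basic "text a librarian" entry point needs, the sketch below acknowledges an incoming SMS and hands the question off to staff. It assumes a Twilio phone number configured to POST incoming messages to a Flask webhook; the /sms path and the notify_reference_desk helper are hypothetical placeholders rather than any particular library's setup.

```python
# Minimal sketch of a "text a librarian" webhook (assumes a Twilio number
# configured to POST incoming SMS to /sms on this Flask app).
from flask import Flask, request
from twilio.twiml.messaging_response import MessagingResponse

app = Flask(__name__)

def notify_reference_desk(sender: str, question: str) -> None:
    # Hypothetical placeholder: forward the question to whatever staff queue
    # the library uses (email, ticketing system, chat channel, etc.).
    print(f"Question from {sender}: {question}")

@app.route("/sms", methods=["POST"])
def incoming_sms():
    sender = request.form.get("From", "")
    question = request.form.get("Body", "")
    notify_reference_desk(sender, question)

    # Acknowledge the patron right away so the interaction feels responsive.
    reply = MessagingResponse()
    reply.message("Thanks for texting the library! A librarian will reply within 10 minutes.")
    return str(reply)
```

The immediate acknowledgment matters because, as Peters notes above, the value of the answer drops sharply if the reply arrives after the patron's situation has passed.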
QR codes have become popular in libraries offering mobile services. QR codes encode information in two dimensions (vertically and horizontally), and thus can provide more information than a barcode. The applications necessary for using QR codes are usually free, and they can be read by most mobile devices with cameras (Little 2011).33 The most common uses of QR codes in academic libraries, according to Elmore and Stephens (2012), are linking to the library's mobile website and social media pages, searching the library catalog, viewing a video or accessing a music file, reserving a study room, and taking a virtual tour of the library facilities.34
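As an illustration of how simple these codes are to produce, the sketch below generates a QR code that points to a library's mobile catalog page using the third-party qrcode package (installable with "pip install qrcode[pil]"); the URL is a hypothetical placeholder.

```python
# Generate a QR code image for a (hypothetical) mobile catalog URL.
import qrcode

catalog_url = "https://library.example.edu/m/catalog"  # placeholder address

img = qrcode.make(catalog_url)  # encode the URL as a 2-D barcode image
img.save("catalog_qr.png")      # suitable for signage, shelf labels, or handouts
```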
Augmented reality may not currently be used as often in libraries as other services such as mobile sites and text messaging, but many libraries are finding unique and compelling ways to use AR. AR applications link the physical with the digital, are interactive in real time, and are registered in 3-D. Hahn (2012) defines AR as follows: "In order to be considered a truly augmented reality application, an app must interactively attach graphics or data to objects in real time, to achieve the real and virtual combination of graphics into the physical environment."35 He notes that such applications are excellent additions to libraries' mobile services because they connect physical and digital worlds, much like libraries.36 One example of augmented reality is North Carolina State University's WolfWalk, which is advertised as "…a historical walking tour of the NC State campus using the location-aware campus map" (NCSU Libraries).37 To create the tour, the NCSU Libraries Special Collections Research Center provided over one thousand photographs of the campus from the 19th century to the present (NCSU Libraries).38

RESEARCH DESIGN

To make sure the information gathered was current and valid, this study employed two approaches, website visits and a survey investigation, to determine the state of mobile services at the top 100 universities' libraries. The website visits explored what mobile services are being offered and how they are being offered at these university libraries. The survey, sent via email, inquired how the libraries are providing mobile services and what their results have been regarding challenges, successes, and best practices. The survey data was analyzed and compared to the data obtained via website exploration to form a more comprehensive picture of mobile services at these universities.

PARTICIPANTS

University libraries' patrons are frequent users of mobile technology. According to Osika and Kaufman (2012), studies have found that 45% of 18- to 29-year-olds who have internet-capable cell phones do most of their browsing on their devices.39 Kosturski and Skornia (2011) note that people of this age group are "…leaders in mobile communication…the traditional college-age student."40 Because these universities are the nation's leaders in undergraduate and graduate programs and academic research, an examination of the status of the top 100 university libraries' mobile services can provide useful service patterns and a benchmark for service improvements that would benefit academic programs. Based on the U.S. News & World Report's national university rankings, this study selected the top 100 universities in the 2014 rankings.41

PROCEDURE

Website visits, the first step, were conducted from March 2 to March 16, 2014. Each library's home page was carefully examined for the most common mobile services named in the literature, using these categorized items: (1) a mobile website or app, (2) mobile access to the library's catalog and databases, (3) text messaging services, (4) QR codes, (5) augmented reality, and (6) e-books. To assess each site, we first visited the site via a Nexus 7 to see if it had a mobile version. Next, we viewed each library's full site on a laptop computer. We browsed through each page of the site looking for mention or use of each category. We also searched for these items via the library's site map or site search functions whenever available. The results were tabulated in Microsoft Excel using a codebook based on the established categories.

Although the website visits place great value on gathering quantitative data about what mobile services are offered at these libraries, this method has its limitations. First, it locates only those mobile services that appear on a library's website; services the library provides but does not mention on the website can be overlooked. Also, the use of mobile devices or services in library instruction, a very commonly mentioned mobile service in the literature, cannot generally be determined via a website visit. In addition, the website visit provides only a snapshot of the current state of mobile services; university libraries may be planning to implement or even be in the process of implementing mobile services. Lastly, website visits evaluate what is publicly available; it is not possible to assess mobile content behind password protection that is meant only for a university's students and faculty. To address these shortcomings, we created a survey using SurveyMonkey to complement the data supplied by the website visits. We sent out the survey via email to each of the top 100 universities' libraries. The survey was conducted from April 10 to April 24, 2014.

RESULTS AND ANALYSIS

Study results presented compelling evidence that mobile services are already ubiquitous among the country's top universities. The most recognized ones are mobile sites, mobile apps, mobile OPACs, mobile access to databases, text messaging services, QR codes, augmented reality, and e-books. These service forms confirm those commonly named in the literature as library mobile services.

What basic types of mobile services do the libraries provide?

The results showed all of the libraries offered one or more of the specific mobile services in Chart 1, with multiple entries allowed, presenting the modernized new service patterns the university libraries provide to meet the needs and demands of university communities in this digital era.

Chart 1. Percentage of Libraries Offering Specific Mobile Services (Multiple Entries Allowed): e-books, 92.6%; mobile OPAC, 88.0%; mobile databases, 81.7%; mobile website, 81.6%; text messaging, 77.2%; QR codes, 58.7%; mobile app for site, 29.2%; augmented reality, 5.0%.

It is clear from both the survey results and the website visits that almost all libraries at the top 100 universities are offering multiple mobile services, with mobile websites, mobile access to the library's catalog, mobile access to the library's databases, e-books, and text messaging services being the most common. QR codes and especially augmented reality are not as common.
Of the eight main mobile services we looked for via the website visits and survey (mobile site, mobile app for the site, mobile OPAC, mobile access to databases, text messaging, QR codes, augmented reality, and e-books), all libraries surveyed offer between one and seven of these services. No universities have none of these services, and no universities have all of these services. Only one university has one service, none have two, seven have three, thirteen have four, twenty-four have five, forty-six have six, and eight have seven. To make this information easy to read, we summarized it in Table 1 below.

Number of mobile services offered    Number of libraries    Percentage of libraries
No mobile services                   0                       0%
1 mobile service                     1                       1%
2 mobile services                    0                       0%
3 mobile services                    7                       7%
4 mobile services                    13                      13%
5 mobile services                    24                      24%
6 mobile services                    46                      46%
7 mobile services                    8                       8%
8 mobile services                    0                       0%

Table 1. Number of mobile services offered.

Such a data pattern demonstrates not only that mobile services are very widespread at these universities' libraries, but also that the vast majority of these libraries offer multiple mobile services. In other words, libraries do not appear to be offering mobile services in isolation; they have taken several of their most popular services (such as websites, reference, and search functions) and mobilized all of them. In fact, the average number of mobile services offered among the eight services we examined is 5.31.
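The 5.31 figure can be checked directly against the Table 1 distribution; the short calculation below recomputes the weighted average (note that the table's library counts total 99).

```python
# Recompute the average number of mobile services per library from Table 1.
distribution = {1: 1, 3: 7, 4: 13, 5: 24, 6: 46, 7: 8}  # services offered -> libraries

total_services = sum(n * count for n, count in distribution.items())  # 526
total_libraries = sum(distribution.values())                          # 99

print(round(total_services / total_libraries, 2))  # 5.31
```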
Although results collected from the two research methods (website visits and survey) are almost identical for mobile websites and mobile OPACs, and very comparable for text messaging, QR codes, and augmented reality, there is a bit of a gap between the website visits and the survey regarding mobile databases (92.9% vs. 70.59%); perhaps the libraries that responded to the survey simply happened to offer mobile access to databases less often than the libraries in general.

It is interesting that we located e-books on 100% of the websites we visited, but only 85.29% of respondents mention offering them. Perhaps this discrepancy can be explained by a clarification in terms. We looked for the presence of books in electronic format that could be accessed online; survey respondents may have considered only e-books specifically formatted for smart phones or tablets to be a mobile service. Also, later in the survey several respondents mention communication issues as an ongoing challenge in offering mobile services, specifically not always knowing what other library departments are offering in terms of mobile services. It is possible that some survey respondents are not responsible for the e-book collection and thus did not mention it as a mobile service.

Another discrepancy exists between the results for mobile apps for the library's site (20.2% for the website visits versus 38.24% for the survey). These results indicate that mobile apps for libraries' sites are more common than we had previously thought. Perhaps these apps are being advertised in places other than the library's website, and therefore a website visit is not the best way to discover them.

The website visits did not look for mobile library instruction, mobile book renewal, or mobile interlibrary loan, but we saw these services mentioned several times during the visits and thus included them in the survey. They turned out to be somewhat common among libraries surveyed: 41.18% of respondents offer mobile book renewal, 20.59% offer mobile interlibrary loan, and 32.35% offer mobile-friendly library instruction.

Table 2 below compares the data collected from both the website visits and the survey among these 100 universities, ranked from high to low percentages. In most cases, they are very similar.

Mobile Services        Percentage offering service (Website Visits)    Percentage offering service (Survey)
E-books                100%                                             85.29%
Mobile databases       92.90%                                           70.59%
Mobile OPAC            87.80%                                           88.24%
Mobile website         80.80%                                           82.35%
Text messaging         80.80%                                           73.53%
QR codes               61.60%                                           55.88%
Mobile app for site    20.20%                                           38.24%
Augmented reality      7.00%                                            2.94%

Table 2. Data Comparison of Specific Mobile Services between Website Visits & Survey.

What content do the mobile sites offer?

In addition to assessing whether libraries had a mobile site, the survey asked libraries that already have a mobile site what is included on the site. 100% of libraries with mobile sites include library hours on their site, making this the most common feature. The next two most common features are library contact information and a search function for the catalog, which both received 96.67%. Searching within mobile-friendly databases, such as EBSCOhost Mobile, JSTOR, and PubMed, is the next most popular feature, although it trailed a little behind library hours, contact information, and catalog searching at 70%. Book renewal received 56.67%, and access to patron accounts received 53.33%. Interlibrary loan is the least common feature by far, offered by only 26.67% of respondents. This information is summarized in Chart 2 below.

Chart 2. Components of Libraries' Mobile Sites: search the catalog, 96.67%; library contact information, 96.67%; search the databases, 70.00%; book renewal, 56.67%; access to patron accounts, 53.33%; interlibrary loan, 26.67%.

These results are interesting as, overall, they reflect higher percentages for specific mobile services than question 1 on the survey, which asked which mobile services libraries offer.
For example, in question 1, 88.24% of respondents offer mobile access to the library's catalog, whereas for libraries with mobile sites, 96.67% offer access to the catalog on the mobile site. The ability to search mobile-friendly versions of databases the library subscribes to was almost the same for both groups, with 70.59% of respondents to question 1 offering this and 70% of respondents having this as a component of their mobile sites. Mobile book renewal is much more common among libraries with mobile sites: 56.67% of respondents with mobile sites compared to 41.18% of total respondents. A slightly higher percentage of respondents with mobile sites offer mobile interlibrary loan (26.67%) compared to all respondents (20.59%). This data suggests that, on the whole, libraries with mobile sites are more likely to offer other mobile services as well, specifically mobile access to the catalog, mobile book renewal, and mobile interlibrary loan.

What mobile reference services do libraries provide?

The survey also looked for information on virtual and/or mobile reference services. 81.25% of survey respondents offer text/SMS messaging, 100% offer chat/IM, and 21.88% offer reference services via a social media account. These results, showing the popular reference services at these top universities, are summarized in Chart 3 below.

Chart 3. Popular Mobile Reference Services: chat/IM, 100%; text/SMS, 81%; social media, 22%.

Chat/IM is clearly the most popular method of providing virtual/mobile reference services; all survey respondents offer this service. Text/SMS is also very popular, indicating that the majority of libraries see value in providing both despite their similar functions. The fact that social media does not compare favorably to either texting or chat/IM services is curious, because most social media platforms have a mobile version available that libraries can take advantage of for free. However, this may not be the best medium for reference. One respondent commented on this question, "Our 'Ask a Librarian' service is available from desktop Facebook, but not on mobile Facebook."

What apps do libraries use or provide for patrons?

Although the website visits and survey results indicated that apps for a library's site are not very common, both tools revealed that the use of apps for various purposes is widespread. The most commonly mentioned app is BrowZine, which is used for accessing e-journals. Several respondents mentioned apps developed in-house for using library services, such as an app for reserving a study room, accessing university archives, and sending catalog records to a mobile device. Another respondent stated that the university's app has a library function. Several respondents mentioned vendor-provided or third-party apps, such as apps for accessing PubMed, ScienceDirect, Naxos Music Library, AccessMyLibrary (for Gale resources), a mobile medical dictionary, and the American Chemical Society. One respondent noted that the library loans iPads preloaded with popular apps to support student research, such as EndNote, Notability, GoodReader, Pages, Numbers, and Keynote, among others. Finally, these apps were each named at least once as an app libraries either use or provide access to: iResearch (for storing articles locally), Boopsie (for building a library mobile app), ebrary (for accessing e-books), and Safari (for accessing books and videos online). These results indicate that the use of apps is fairly robust and diverse among these libraries. Additionally, it appears more common for libraries to use and/or provide apps created by third parties than to develop an in-house app, perhaps due to the expertise and expense involved in creating and maintaining an app.

What mobile services will be added in the future?

The final question of the survey asked libraries if there are any plans to offer a mobile service not currently provided. Responses are summarized in Chart 4 below.

Chart 4. Percentage of the Libraries Seeking to Add Specific Mobile Services: mobile library instruction, 62%; mobile website, 46%; mobile interlibrary loan, 38%; mobile book renewal, 15%; mobile databases, 15%; mobile OPAC, 15%; augmented reality, 8%; e-books, 8%; mobile app(s), 8%; QR codes, 0%; text messaging services, 0%.

The most common selection is mobile-friendly library instruction, with 61.54%. The next most common is a mobile website (46.15%). Mobile interlibrary loan was chosen by 38.46% of respondents. Less common planned additions include mobile access to the library's OPAC, mobile access to the library's databases, and mobile book renewal, each of which was chosen by 15.38% of respondents. Mobile apps, e-books, and augmented reality were each chosen by 7.69% of respondents. No one indicated plans to add text messaging services or QR codes. These results indicate that libraries expect demand for traditional library services in a mobile-friendly format to continue to expand; mobile-friendly library instruction was offered by only 32.35% of respondents, yet 61.54% have plans to offer this service in the future. Mobile interlibrary loan is currently offered by 20.59% of respondents, so the fact that 38.46% would like to add it represents a significant change.

Not surprisingly, mobile websites are likely to remain a very popular mobile service.
The  fact  that   82.35%  of  respondents  already  have  a  mobile  website  and  46.15%  who  do  not  have  one  wish  to   add  one  in  the  near  future  means  that  mobile-­‐friendly  sites  are  well  on  their  way  to  becoming   ubiquitous,  at  least  among  libraries  at  the  top  100  universities,  and  may  reasonably  be  expected  to   take  their  place  among  websites  in  general  as  a  necessity  to  maintain  institutional  viability.   Additionally,  several  respondents  mentioned  moving  towards  responsive  design,  in  which  their   websites  are  fully  functional  regardless  of  whether  they  are  accessed  on  mobile  devices  or   desktops.   What  are  challenges  and  strategies  for  offering  mobile  services?   In  addition  to  looking  for  the  presence  or  absence  of  mobile  services  being  offered  at  top  100   university  libraries,  the  survey  also  examined  libraries’  experiences  in  implementing  mobile   services,  including  challenges,  successes,  and  best  practices.  Several  themes  emerged  in  response   to  these  questions.  The  most  common  challenge  among  respondents  was  having  the  time,   expertise,  staffing  and  money  to  support  mobile  services,  especially  apps  and  mobile  sites.  To   solve  this  problem,  respondents  mention  relying  on  vendors  and  third-­‐party  providers  supplying   apps  to  access  their  resources,  but  this  does  not  give  libraries  the  flexibility  and  specificity  of  an   in-­‐house  app.     Another  common  challenge  mentioned  by  several  respondents  involved  technical  issues,  such  as   difficulties  with  off  campus  access  to  resources  via  a  proxy  server  and  compatibility  issues  among   different  browsers  and  especially  different  devices.  A  lack  of  communication  and/or  support  is   another  issue  for  libraries.  One  respondent  reported  a  lack  of  support  from  the  campus  computing   center  for  mobile  services.  One  respondent  discussed  the  difficulty  of  having  a  coordinated  mobile   effort  when  the  library  has  a  large  number  of  departments,  and  each  department  may  or  may  not   be  aware  of  what  the  others  are  doing  in  regards  to  mobile  services.  Survey  results  revealed  that   few  libraries  have  policies  in  place  to  support  mobile  services.     Coming  up  with  a  specific  plan  for  implementing  such  services  can  help  libraries  work  towards   promoting  effective  communication  and  garnering  support.  One  respondent  wrote,  “The  biggest   challenges  have  been:  (1)  developing  a  strategy  (2)  developing  a  service  model  (3)  having  a   systematic  model  for  managing  content  for  both  mobile-­‐  and  non-­‐mobile  applications.  We've  had     INFORMATION  TECHNOLOGIES  AND  LIBRARIES  |  JUNE  2015   145   success  with  the  first  two  and  are  making  great  progress  on  the  third.”  Interestingly,  several   respondents  noted  that  underuse  is  an  issue  for  some  services.  One  respondent  mentioned  that   QR  codes  are  not  used  often,  and  another  mentioned  that  the  library’s  text-­‐a-­‐librarian  service  is   much  underutilized.  Several  respondents  cited  the  need  to  market  mobile  services  as  an  antidote   to  this  problem.  Seeking  regular  feedback  from  the  user  community  regarding  mobile  services   wants  and  needs  is  another  recommended  solution.   
Other  issues  include  the  fact  that  not  all  library  services  are  mobilized.  However,  libraries  are   actively  looking  for  solutions  for  this.  There  is  a  trend  among  respondents  towards  developing  a   site  that  is  responsive  to  all  devices,  including  desktops,  laptops,  tablets,  and  phones.  This  will  take   the  place  of  a  separate  mobile  site.  As  one  respondent  states,  “At  the  moment,  our  library  mobile   website  only  has  a  fraction  of  the  services  available  via  our  desktop  website.  We  are  in  the  process   of  moving  everything  to  responsive  design,  with  the  expectation  that  all  services  will  be  equally   available  in  mobile  and  desktop.”  In  reading  through  these  responses,  one  message  is  clear:  mobile   services  are  a  must.  Several  respondents  noted  that  demand  for  mobile  services  is  growing,  with   one  writing,  “Get  started  as  soon  as  possible.  Our  analytics  show  that  mobile  use  is  continuing  to   increase.”   CONCLUSION   This  study  confirms  that  as  of  spring  2014  mobile  services  are  already  ubiquitous  among  the   country’s  top  100  universities’  libraries  and  are  likely  to  continue  to  grow.  Where  the  most   common  services  offered  are  e-­‐books,  chat/IM,  mobile  access  to  databases,  mobile  access  to  the   library  catalog,  mobile  sites,  and  text  messaging  services,  there  is  a  trend  towards  responsive   design  for  websites  so  that  patrons  can  access  the  library’s  full  site  on  any  mobile  device.       The  experiences  of  these  libraries  demonstrate  the  value  of  creating  a  plan  for  providing  mobile   services,  allotting  the  appropriate  amount  of  staffing,  time,  and  funding,  communicating  among   departments  and  stakeholders  to  coordinate  mobile  efforts,  marketing  services,  and  regularly   seeking  patron  feedback.  However,  there  is  no  one  approach  to  offering  mobile  services,  and  each   library  must  do  what  works  best  for  its  patrons.   REFERENCES     1.     Andrew  Walsh,  Using  Mobile  Technology  to  Deliver  Library  Services  (Maryland:  Scarecrow   Press,  2012),  xiv.   2.     “Smartphone  Ownership  2013,”  last  modified  June  5,  2013,   http://www.pewinternet.org/2013/06/05/smartphone-­‐ownership-­‐2013/.   3.     Thomas  A.  Peters,  “Left  to  Their  Own  Devices:  The  Future  of  Reference  Services  on  Personal,   Portable  Information,  Communication,  and  Entertainment  Devices,”  Reference  Librarian  52   (2011):  88-­‐97,  doi:10.1080/02763877.2011.520110.     A  LIBRARY  IN  THE  PALM  OF  YOUR  HAND:  MOBILE  SERVICES  IN  THE  TOP  100  UNIVERSITY  LIBRARIES  |     LIU  AND  BRIGGS  |  doi:  10.6017/ital.v34i2.5650   146     4.     ACRL  Research  Planning  and  Review  Committee,  “Top  Ten  Trends  in  Academic  Libraries,  “   College  &  Research  Libraries  News  73  (2012):  311-­‐320.   5.     Lauren  Elmore  and  Derek  Stephens,  “The  Application  of  QR  Codes  in  UK  Academic  Libraries,”   New  Review  of  Academic  Librarianship  18  (2012):26-­‐42,  doi:10.1080/13614533.2012.654679.   6.     Robin  Canuel  and  Chad  Crichton,  “Canadian  Academic  Libraries  and  the  Mobile  Web,”  New   Library  World  112  (2011):  107-­‐120,  doi:10.1108/03074801111117014.   7.     
Do You Believe in Magic? Exploring the Conceptualization of Augmented Reality and its Implications for the User in the Field of Library and Information Science

Elizabeth Zak

ABSTRACT

Augmented reality (AR) technology has implications for the ways that the field of library and information science (LIS) serves users and organizes information. Through content analysis, the author examined how AR is conceptualized within a sample of LIS literature from the Library, Information Science and Technology Abstracts (LISTA) database and Google Blogs postings. The author also examined whether Radical Change Theory (RCT) and the digital-age principles of interactivity, connectivity, and access are present in the discussion of this technology. The analysis of data led to the identification of 14 categories comprising 132 total codes across sources within the data set. The analysis indicates that the conceptualization of AR, while inconsistent, suggests expectations that the technology will enhance the user experience. This can lead to future examinations of user behavior, user response, and observation of technologies like AR.

INTRODUCTION

It seems an understatement to say digital technology is changing quickly. Cell phones are like small computers in our pockets; we have access to far greater computing resources in "the cloud" than we did just five years ago, and computers are processing at speeds that were once only a fantasy. This digital revolution includes the continued development of augmented reality (AR) applications. At its simplest, AR is a blending of the physical environment with digital elements.

As with many of the latest technologies, the development of AR is interdisciplinary. Professionals in the fields of computer science, psychology, and philosophy seem to direct the discussion on the development and application of AR technology, as evidenced by the volume of literature retrieved when searching those subject databases for articles pertaining to AR. The field of library and information science (LIS) seems largely absent from the conversation.
While elements of AR, such as global positioning systems (GPS), quick response (QR) codes, and virtual reality, are not uncommon in LIS literature, rarely are these topics defined as AR. Information theory, information behavior, knowledge management, information architecture, and digital literacy (to name only a few) are key areas of study within LIS, which can be central in developing and exploring AR.

Elizabeth Zak (ezak@dom.edu) is adjunct instructor at Dominican University, River Forest, Illinois.

The focus on and the definitions of the user within LIS provide a much different perspective on the human aspect of engaging with digital information than is found within computer science. Literature on human-computer interaction focuses more on the user as a piece of the system, with a shift only recently toward acknowledging this misdirected focus.1,2,3,4,5,6,7,8 Libraries and information agencies have the tools and skills, with regard to user interaction with and use of information, to help answer questions relating to how the conceptualization of a technology like AR influences the use of the technology.

Augmented Reality (AR) Defined

What is AR, really? Ronald Azuma, a pioneer and innovator in the research and creation of AR applications, describes AR as a supplement to reality.9 It combines the real and the virtual, aligning the virtual with the real environment.10 AR is part of a mixed-reality continuum, and "the technology depends on our ability to make coherent space from sensory information."11 This coherent space is dependent on several variables, one of which is AR's real-time interactivity. AR applications cannot tolerate delays in response time; if the real and virtual are misaligned, the sense of reality is impeded. AR needs to happen at the same speed as real life, with virtual actions coinciding with human actions that vary across users.

Some also view AR as a new media experience, adding to the growing list of digital literacies. "While the pure technology provides a set of features that can be exploited, the features of the new technology will develop into one or more particular forms within a particular historical and cultural setting."12 This includes "remediating" existing media, such as film or stage productions, with AR components.

As a result, AR is contextual and reliant on each personal experience, but it also borrows from earlier forms of media within those contexts and experiences. For this reason, it is important to examine just how those within LIS are constructing AR as a concept. The rapid pace at which AR is evolving and gaining in popularity suggests that those within the field of LIS will need to be aware of new applications for the technology, as more and more users may come to expect access to and knowledge of this technology.

Much of the literature also references AR as being an enhancement. But what, exactly, is AR enhancing?
The answer includes our senses and perceptions,13 software,14 our emotions and feelings,15 and question-answering programs.16 Authors use the term enhancement with no explanation of how it is defined in each piece of literature. Some assume more is always better. Missing is the voice of the user or consumer, whom authors refer to as a subject or wearer. The irony is that many of these users are the ones creating, building, and populating these AR worlds. I propose that the field of LIS is in a prime position to address this missing piece of the research and discourse of AR.

Many of the technological enhancements authors describe in the AR literature connect to the development of knowledge, one of the main goals of computing.17 There are a variety of settings in which some expect AR to create new routes to knowledge. Within a library setting, AR can improve library instruction,18 support information retrieval about shelved books through recognition applications,19 reconstruct and restore artifacts,20 and deliver services at point of need through QR codes.21 Others view AR as a technological breakthrough with extreme potential in medical fields; non-invasive surgical visualization, for example, can display the organs through sensors placed on the body, helping medical practitioners and students to understand internal body functions.22

AR is also said to perform some of its "enhancements" in the classroom.23 Augmented books are growing in number and are expected to enhance collaboration and interactivity between students. Some allow more than one person to explore the same content at the same time or outside of the classroom through mobile technology; others give students the ability to rotate, tilt, and manipulate viewing angles of various objects.24

AR technology within art education is said to have a positive impact on student motivation.25 In a 2010 study, reference librarians at the University of Manitoba were given smartphones and asked to create innovative projects. The result was a public art project using social networks, GPS software, and AR technology that allowed users to interact with the art pieces through QR codes.26 Again and again, the interactivity, connectivity, and mobility of AR applications are highlighted as efficient and motivating factors in education and learning.27,28,29,30

Organizations are also testing AR applications for general public use. Museums and galleries contain AR virtual displays of artifacts and historic scenes, personalizing interactive experiences and providing multiple perspectives of events and artifacts.31 The Natural History Museum in London, for example, created the Attenborough Studio in 2009 for live events and the viewing of AR-enhanced films.32 AR tours of historic places and spaces are also offered.
Augmented reality is already in use as a tour guide application,33 to supplement paper maps,34 and even to reconstruct damaged historic sites.35 AR is essentially a mutable form of displaying what are typically static objects and ideas.

The use of AR is integrative. It can saturate real landscapes, places, and spaces with virtual characteristics; it can add to or hide objects in the environment.36 Manipulation of the environment promotes "immersive experiences"37 that are multifaceted for the different people interacting with AR.38 Virtual experiences are expected to enhance the real-life experiences of the user. Spaces can become layered, or scaffolded, not only through depth perception or AR software but also through the user's contextual life experiences.39

With growing widespread use, it is imperative for those in the field of LIS to understand this technology and how it is used. How will these applications be archived? Will institutions and organizations collaborate with users in creating these applications? Will users expect librarians and other knowledge workers to help them understand and use the technology? Will libraries and other institutions use the technology, and if so, in what capacity? These are just some of the questions LIS professionals should be asking to ensure they are meeting user needs in relation to technology and the access of information through technology. This study provides a base for considering these questions, and the effects of AR, by situating the concept and expectations of AR within the field and aligning the conceptualization with Radical Change Theory.

The User Within AR and LIS

The use of AR can bring with it an altering of emotional and psychological experiences. Pederson argues that AR applications should be "human-centric."40 Because human beings "instrumentalize" technology such as AR, the technology itself should accommodate the human needs for AR tools.41 For AR applications to be successful, they must display the human characteristics encompassing reactions to one another and the environment.42 Applications blurring the lines between the real and the virtual, and engaging the person in ways that play on perception, are capable of changing the real-life perceptions and expectations of the user. AR can be either a form of escapism for brief moments or an escape from what we know of as real life for good.43

What much of the research assumes is the desire for AR applications. Lacking are surveys of user desire for AR, or examinations of potential negative consequences. A user focus is central to LIS and the values of librarianship. Within all these potential uses, what is not being discussed is how creators of AR applications organize, categorize, and choose the information contained within these applications. Much of the discussion surrounding the implications of AR use is left to those in the fields of philosophy and psychology, dealing in abstraction.
The research and marketing of AR applications promote passive acceptance of AR technologies.44 There comes a point when AR researchers must ask: to what extent are potential users aware of and desirous of AR applications?

The notion of people as users is quite similar to the contextualization of the human as subject in this scientific literature, and different from the notion of the user in LIS. In computer-science literature, users are those who, as evaluators of the technology, react to AR and provide AR researchers with content for AR modification. Within computer-science literature there is much talk of the user's satisfaction with tested AR applications;45 there is also much talk of the user's position within the AR frame/environment.46 Similar is the discussion of the user and the physical space of the AR application, and how the user responds to the AR functions.47 User and subject are terms used interchangeably in the scientific literature. These terms are largely undefined, indicating little regard for the role of the human outside of objectification as a tool working in the AR environment.

Within the field of LIS, the notion of the user takes on different characteristics: users are clearly patrons, human beings, for whom LIS provides a service. Kuhlthau focuses on user-centric treatment of information services with her Information Search Process model.48 Chatman's theories of Information Poverty and Life in the Round both center on the information-seeking behaviors of ordinary people with everyday needs.49

Library anxiety, as developed by Mellon, pivots on the emotions and psychological responses of information seekers, or users.50 There are many more theories and models demonstrating this user-centric focus in LIS. Case points to Dervin's 1976 article "Strategies for Dealing with Human Information Needs: Information or Communication?" as an exemplar of the shift in thinking about the role and needs of the user in LIS.51 Dervin pointed to assumptions in research on information seeking, such as the assumption that more information is always better and that relevant information exists for every need. It is important to note whether there are assumptions now being made in LIS literature with regard to user interaction with AR and how best to understand user-centered design of AR applications.

Bowler et al. define user-centered design as "[reflecting] the user, typically from a cognitive, affective or behavioral point of view, as well as the social, organizational, and cultural contexts in which users function."52 These points of view, the cognitive, social, behavioral, and the like, all synchronize within AR applications. Bowler et al. state, "With the increased use of digital, networked information tools in daily practice and the emergence of the digital library and archive, it is impossible to separate the service from the system.
In  this  context,  understanding  the  user   becomes  more  critical  than  ever.”53  With  the  advent  of  AR  and  its  dependence  on  user  interaction,   it  is  imperative  to  continue  to  address  the  role  of  the  user.   O’Brien  and  Toms  further  the  discussion  by  trying  to  define  user  engagement  with  technology.  The   authors  define  engagement  as  a  process,  composed  of  three  states:  a  point  of  engagement,  a  period   of  engagement  and  then  finally  disengagement.54  Attention  to  user  needs  and  behavior  as  an   individualized  process  is  evident  across  LIS  literature.  Shu  suggests  user  engagement  in  website   design  as  a  means  to  strengthening  user  relationships  with  organizations  through  a  study  of  Web   2.0  [interactive  internet  applications].55  Idoughi,  Seffah,  and  Kolski  recommend  integrating   “personae,”  or  perceived  personality  types  and  characteristics,  into  user-­‐design  to  address   challenges  in  creating  software  offering  highly  personalized  services.56  Pang  and  Schauder  take  a   community-­‐based  approach  to  systems  design,  particularly  in  libraries  and  museums  and   encourage  system  designers  to  draw  on  the  study  of  relationships  and  interaction  within  different   communities  as  a  means  to  gain  insight  into  more  user-­‐centric  design  methods.57   A  user-­‐centric  focus  has  extended  itself  to  catalog  design58  and  information-­‐retrieval  systems   design.59  It  has  been  applied  as  a  learning  approach  to  organizational  culture  within  libraries.60   Scholarly  research  within  the  library  is  said  also  to  have  benefitted  because  a  user-­‐centric  focus   informs  how  different  types  of  users  interact  with  information  within  the  library.61  Through  an   analysis  of  user  language,  user-­‐centricity  is  also  applied  as  a  means  to  identify  strategies  for   creating  language  tools  for  web  searching.62  Such  studies  are  representative  of  the  ways  a  user-­‐ centric  paradigm  proliferates  within  LIS.   These  models  and  studies  highlight  the  importance  of  the  user  perspective  and  how  the  user   engages  with  information  at  various  levels.  Radical  Change  Theory  (RCT),  developed  by  Eliza   Dresang  in  1999,  goes  one  step  further  and  adds  another  element  to  the  user-­‐information   interaction:  the  user’s  interaction  with  other  users  and  the  response  to  the  interactive  applications     DO  YOU  BELIEVE  IN  MAGIC?  |  ZAK       28   and  digital  technologies  in  which  information  is  becoming  embedded.  This  element  is  mirrored  in   such  applications  as  Twitter  and  Foursquare,  which  allow  digital  overlays  of  information  to  see  if   there  are  individuals  “near  you”  also  logged  on  to  these  networking  applications.   The  highly  complex  AR  applications  call  for  a  highly  complex  view  of  human  interaction  with  those   applications.  Radical  Change  Theory  helps  to  explicate  the  characteristics  of  human  expectations   and  interactions  with  information  in  digital  formats.  Coupled  with  an  examination  of  AR  discourse   within  LIS  literature,  RCT  helps  clarify  how  people,  or  users,  are  approaching  the  changing   information  landscape.   
RESEARCH APPROACH

This study focuses on the user-centric paradigm prevalent in LIS, seeking to understand the relationship between the conceptualization of an emerging technology and the role of the user. The research questions guiding this study are the following:

• How is augmented reality (AR) conceptualized in LIS?
• What is the role of the user in relation to AR as it is conceptualized in LIS?

The aim is to understand how a specifically LIS user-centric focus can apply to the conceptualization of AR and its use within libraries and cultural heritage institutions.

The model for this study is a study by Clement and Levine, which examined how pre-1978 dissertations were published and what the concept of copyright was for those dissertations. The unit of sampling in their study was the written message, "defined as a complete statement or series of statements with a distinct start and end." Each message under investigation required author-specified semantic concepts. The researchers then selected recording units, what they describe as "explicit assertions" pertaining to the publication of dissertations. The authors delineated explicit assertions as taking several forms, from a phrase within a sentence to a multipage argument.63

For the purpose of this study, I investigated written messages contained in the journals and blog posts I chose, as well as links to other webpages or blog posts contained in the initial data set. The semantic concept is the term augmented reality. The recording unit is any explicit assertion made regarding augmented reality, using the same range of forms as Clement and Levine.64

In this study, I allied content analysis with Radical Change Theory (RCT). RCT focuses on the characteristics of interactivity, connectivity, and access, which are all dependent on measures of human connections through various forms of media. The characteristics of these principles are as follows:

• Interactivity refers to dynamic, user-initiated, nonlinear, nonsequential, complex information behavior and representation.
• Connectivity refers to the sense of community or construction of social worlds that emerge from changing perspectives and expanded associations.
• Access refers to the breaking of long-standing information barriers, bringing entrée to a wide diversity of opinion and opportunity.65

Content analysis of LIS literature related to AR through the RCT framework allowed me to find connections between the conceptualization of a technology and how the conceptualization functions within a given academic community. During the data analysis, I assessed whether the characteristics of interactivity, connectivity, and access are present in the descriptors of AR. Understanding how AR is brought into the discourse of LIS literature helps describe the ways an emerging technology is developed as a concept and as a tool, through an examination of how researchers and practitioners perceive the need for and use of AR.
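To make the notion of a recording unit more concrete, the sketch below shows one hypothetical way candidate units could be pulled from a source text programmatically. The naive sentence splitting, the sample source_text, and the function name are illustrative assumptions only; in the study itself, explicit assertions (ranging from a phrase to a multipage argument) were identified interpretively by the researcher, not by software.

```python
import re

def candidate_recording_units(text, term="augmented reality"):
    """Return sentences that mention the semantic concept.

    A rough, hypothetical filter: a naive sentence split followed by a
    case-insensitive match on the search term. Actual recording units in
    the study were identified by the researcher, not extracted by code.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s.strip() for s in sentences if term.lower() in s.lower()]

# Illustrative input, not a quotation from the data set.
source_text = (
    "Augmented reality overlays digital objects on the physical world. "
    "The library plans to pilot new signage next year. "
    "Students said the augmented reality exhibit made the archive feel alive."
)

for unit in candidate_recording_units(source_text):
    print(unit)
```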
Data collected from searches of Google Blogs and the Library, Information Science and Technology Abstracts with Full Text (LISTA) database aided in understanding the conceptualization of AR and the role of the user in relation to the conceptualization. All the searches took place over three months, December 2012 to February 2013, and all searches centered on the search term "augmented reality" (the term was enclosed in quotation marks in the search box).

Through purposeful criterion sampling, all search results including the term "augmented reality" are included in the initial list of data for analysis. I chose the Google Blogs search engine because of its popularity, familiarity, ease of use, and variety of viewpoints. Blogs are an important source of data because they continue to increase in popularity across disciplines, including LIS, and serve as a way for people to communicate with one another and exchange information.66 The LISTA database is easy to access; provided free to libraries; familiar to students, faculty, and professionals in the field of LIS; and covers a broad spectrum of general and specialized journals. Together, the search results comprise both academic and popular, or mainstream, sources.

Google Blogs

I first gathered data from searches of Google Blogs. I conducted two separate searches of the search term "augmented reality." The first was in December 2012 and the second was in February 2013. The first search yielded 373 results and the second yielded 376. I used the advanced search function to limit the first search to blog postings between June 2012 and December 2012; the second search was limited to June 2012 to February 2013. Blog postings excluded from the final body of data for analysis include foreign-language entries, duplicate items, video-only postings, and advertisements, resulting in a final data set of 300 postings.

Library, Information Science and Technology Abstracts with Full Text (LISTA)

After completing my search of Google Blogs, I gathered data from LISTA database searches. I searched the database once a month for three months, from December 2012 to February 2013. I divided each monthly search into three search types: first by author-supplied keyword, second by subject terms, and third by all-text, resulting in nine searches. I did this to determine whether the results would differ across each search specification. I then cross-referenced these lists and removed duplicate and foreign-language results to compile one complete data set of 160 articles. Recording units from the articles, blogs, and social media postings I collected from these searches include explicit assertions concerning AR, which I then coded.
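As a rough illustration of what the cross-referencing and deduplication step amounts to, the sketch below merges several lists of search hits, drops duplicates, and applies exclusion criteria of the kind described above. The record structure, field names (url, language, video_only), and sample items are assumptions made for the example, not the actual records collected for this study.

```python
def build_data_set(result_lists, excluded_languages=("de", "fr")):
    """Merge search-result lists, dropping duplicates and excluded items.

    Each result is a dict with illustrative keys. Items are deduplicated
    on their URL, and foreign-language or video-only items are excluded,
    mirroring the kinds of criteria described in the text.
    """
    seen = set()
    final = []
    for results in result_lists:
        for item in results:
            key = item["url"]
            if key in seen:
                continue  # duplicate across searches
            seen.add(key)
            if item.get("language", "en") in excluded_languages:
                continue  # foreign-language entry
            if item.get("video_only", False):
                continue  # video-only posting
            final.append(item)
    return final

december = [{"url": "http://example.com/a", "language": "en"}]
february = [
    {"url": "http://example.com/a", "language": "en"},    # duplicate
    {"url": "http://example.com/b", "video_only": True},  # excluded
    {"url": "http://example.com/c", "language": "en"},
]
print(len(build_data_set([december, february])))  # 2 unique, eligible items
```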
For the purposes of this exploratory study, I developed codes inductively rather than approaching the data with a predetermined set of codes, as inductive codes arise from the interpretation of the coded data.67 An example of an assertion in the data set includes the following, taken from an article in my pilot study:

AR is a very efficient technology for both higher education such as universities and colleges. Students in both schools can improve their knowledge and skills, especially on complex theories or mechanisms of systems or machinery.68

I categorized these assertions according to the themes or codes in which they are embedded. For instance, in the assertion above, the notion that AR is efficient and capable of improving or adding to knowledge and skills could be categorized with similar assertions about the value or purpose of AR. After I organized the codes and categories, I determined which, if any, of these codes coincide with the digital-age principles of RCT: interactivity, connectivity, and access.

I developed 132 codes. A further reduction of codes is not feasible given the myriad ways AR is defined across sources in the data set, which underscores the lack of consensus on just how AR is truly understood. I went through the codes and grouped them according to similarities and overarching themes labeled as categories. The 132 codes make up 14 categories, listed in table 1.

How is AR conceptualized in LIS?

The LISTA database includes more than 560 journals from LIS and related information-science fields such as communication and museum studies. Of the 77 LISTA sources included in the data set for this study, 46 sources are peer-reviewed; 31 are not. Of the journals that are not peer-reviewed, all focus specifically on issues in the media, technology, education, or libraries. Based on my analysis, the categories or overarching themes most prominent in the LISTA data set are "AR as a new direction," "AR as informational," and "AR as an enhancement."

Taken together, these categories suggest AR can deliver information and the user can interact with information through an enhanced experience, which is a new direction in technology. Individually, these categories are loaded with implications based on the codes each category encompasses. The category of AR as a new direction itself includes twenty-five codes. The codes include assertions of "AR as a new normal, providing opportunities" for those willing to implement the technology because of its "potential, versatile" (yet debatable) range of uses from the business sector to education.
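A minimal sketch of what this grouping step amounts to computationally appears below: coded assertions are tallied under the category to which each code belongs. The category-to-code mapping is a small illustrative subset of Table 1 (which follows), and the coded assertions are invented for the example; the actual assignment of all 132 codes was done interpretively by the author.

```python
from collections import Counter

# Illustrative subset of the category-to-code grouping reported in Table 1.
CATEGORY_OF_CODE = {
    "new normal": "A New Direction",
    "has potential": "A New Direction",
    "delivering information": "Informational",
    "superimposes information": "Informational",
    "enhance user experience": "An Enhancement",
    "interactivity": "An Enhancement",
}

# Hypothetical (code, source) pairs standing in for coded assertions.
coded_assertions = [
    ("has potential", "LISTA article"),
    ("delivering information", "LISTA article"),
    ("enhance user experience", "blog post"),
    ("interactivity", "blog post"),
    ("new normal", "LISTA article"),
]

# Tally assertions by the category each code was grouped under.
category_counts = Counter(
    CATEGORY_OF_CODE[code] for code, _source in coded_assertions
)
for category, count in category_counts.most_common():
    print(f"{category}: {count}")
```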
A New Direction: A New World; New Direction; New Normal; Relevant; Unstoppable; Important; Raising Expectations; Popular; Trend; Versatile; Has Potential; Needs Further Development; Under-Utilized; Unfamiliar; Promising; Searching; Opportunity for Leadership; Provides Opportunity; Background; Debatable; Familiar; Skepticism & Understanding; Entertaining Over Practical; Ill-Defined; Redefined; Valuable; Obtainable

Access*: Questionable Access; Democratization; Relative Ease of Use; Simplicity; Requiring Knowledge

Characterized by Empty Descriptors: Amazing; Cool; Exciting; Fun; Unique; Awkward; Unpredictable; Entrancing; Phenomenon; Magic

Evoking Legal Questions: Beginning Litigation; Marketing Disputes; Privacy Concerns

Acting Upon Reality: Blurs Line Between Real & Virtual; Distinction Between Real & Virtual; Layered Reality; Overlay Worlds; Integrated; Embellishment; Improves Reality; Bringing to Life; Used to Simulate; Unifying; Crossing Boundaries; Intelligent; Transferring Intelligence; Making Meaning; Multimedia Display; Generates Media; Visual as Better than Textual

An Experience

A Modifier: Catalyst for Change; Changing Network Structure; Eliminates Objects; Potential for Eliminating People; Transformative; Revolutionary; Problem Solver; Restorative; Capable of Injury; Safety; Disruptive; Innovative; Influential; Powerful; Confrontational; Challenging Perceptions; Challenging; Therapeutic; Utilizes Mobile Devices; Wearable

Informational: Changing Definitions of Personal Information; Delivering Information; Defining Relevant Information; Helps Gather Information; Presenting Location-Based Information; Speed of Information Delivery; Superimposes Information; Providing Services to User

Conditioning the Environment: Control Environment; Creating Environment; Increases View of the Environment; Contextual; Omnipotent Presence

Used as a Tool: Discovery Tool; Educational Tool; Marketing Tool; Utility for Library Operations; Utility for Library Operations; Library Instruction Aid; Mobile Learning; Reading Aid; Increases Motivation; Promote Libraries; Prompts Action

Impacting Economics: Economic Barriers; Economic Growth; Low Cost; Reducing Costs; Business Model; Retail; Measurable; Niche Market; Gimmick

Progressive: Eighth Mass Medium; Occurs in Phases; Part of a Continuum

Involving Imagination: Envisioning; Science Fiction; The Future

An Enhancement: Enhance Communication; Enhance User Experience; Enhance Reality; Enhancing Learning and Training; Enhancing the Library; Enriching; Engaging; Interactivity*; Building Relationships; Collaboration; Connectivity*

Table 1. Codes Grouped by Category.
*  Denotes  principles  of  Radical  Change  Theory     DO  YOU  BELIEVE  IN  MAGIC?  |  ZAK       32     While  the  technology  still  needs  development  in  terms  of  user  awareness  of  the  technology,  a  clear   definition  of  the  technology,  and  an  understanding  of  its  full  range  of  uses,  some  view  it  as  a  trend   that  raises  the  bar  of  technological  expectations.  When  viewed  as  a  growing  trend,  AR  is  creating  a   new  world  or  platform  for  information  delivery.   Furthermore,  in  the  LISTA  sources  AR  is  also  viewed  as  informational.  This  category  comprises   eight  codes,  all  of  which  refer  to  the  capability  of  AR  to  deliver,  gather,  define,  present,  and   superimpose  information  rapidly.  In  this  context  for  LIS,  AR  is  another  format  for  providing  the   user  with  information  tailored  to  specific  user  needs.  The  ways  AR  can  provide  users  with   information  are  nestled  in  the  view  of  AR  as  an  enhancement.   Within  this  category,  AR  is  described  as  an  enhancement  of  reality,  communication,  experiences,   and  learning.  Under  this  category  there  is  little  AR  will  not  enhance.  The  enhancement  of  the  user   experience  is  directly  tied  to  the  informational  quality  of  AR.  This  enhancement  rests  on  the   engaging  interactive  qualities  of  AR  fostering  relationships  and  collaboration  through  the  property   of  connectivity.  Under  this  definition,  the  digital  information  AR  displays  or  “creates”  is  the   enhancement  of  the  experience.   Sources  in  the  data  set  also  suggest  AR  enhances  the  learning  experience  through  the  digital   images  or  objects  presented  to  the  user  in  conjunction  with  the  “original  materials.”  The   connection  of  AR  to  the  Internet  and  sources  therein  also  gives  users  the  ability  to  connect  and   collaborate  with  one  another.  The  enhanced  experiences  provided  by  way  of  AR  foster   connections  between  users  as  well  as  librarians  and  the  creators  of  AR  applications  themselves.   Authors  within  the  data  set  expect  connecting  with  others  and  building  relationships  will  make   user  experiences  much  richer  in  terms  of  how  the  user  interacts  with  information  and  in  what  way   information  is  presented  to  the  user.   The  Google  Blogs  search  was  not  limited  to  LIS-­‐specific  blogs,  as  the  search  function  is  limited  in   definable  search  parameters.  Of  the  300  blog  posts  in  the  data  set,  only  four  actually  include  the   term  library  or  refer  to  AR  applications  within  libraries.  One  blog  post  alludes  to  archiving  but   does  not  explicitly  mention  a  library  setting.  These  blog  postings  code  for  versatility,  utility,   interactivity,  discovery  tool,  an  experience,  a  library  instruction  aid,  promoting  and  enhancing  the   library,  access,  and  providing  services  to  users.  While  not  all  of  these  codes  are  included  in  the  three   dominant  categories  coding  for  the  LISTA  sources,  they  do  reflect  the  utilitarian  quality  of  AR  as  a   provider  of  information  at  its  most  basic.  
For  example,  AR  is  in  one  source  a  versatile  tool   enhancing  aspects  of  the  user’s  library  experience,  a  view  shared  with  the  aforementioned  LISTA   excerpts,  from  the  ways  services  are  provided  for  the  user  to  the  level  of  interactivity  the  patron   has  with  relevant  information  within  the  library,  such  as  finding  specific  book  locations  or   accessing  information  about  the  services  the  library  offers.     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  DECEMBER  2014   33   Archiving  information  displayed  through  AR  applications  poses  a  challenge  and  is  a  necessary   consideration  for  those  interested  in  archiving  information.  Truly,  AR  has  the  potential  to  change   archiving  platforms  and  access  to  those  platforms.  The  archiving  of  information  presented  via  AR   is  a  concern  for  those  implementing  AR  for  a  variety  of  purposes.  Presenting  even  library  hours  or   wayfinding  information  for  a  user  through  AR  also  raises  the  question  of  how  information  will  be   accessed,  for  how  long,  and  in  what  form  it  will  exist  once  it  is  no  longer  needed,  updated,  or   changed.   The  coded  data  suggests  AR  is  conceptualized  as  a  new  development  in  digital  technology  worth   paying  attention  to,  at  least  for  now,  but  should  also  be  approached  with  some  caution;  users  or   those  looking  to  implement  AR  should  be  careful  to  understand  the  functionality  and  implications   of  using  AR  prior  to  adapting  the  technology.  As  reflected  in  the  sources,  books  and  physical   spaces  are  potential  areas  for  AR  application  and  in  some  cases  are  already  overlaid  with  AR  and   are  enhancing  user  experiences.  As  a  whole,  the  conceptualization  of  AR  is  a  technology  with  great   potential  to  change  the  way  users  interact  with  information  because  of  its  versatility,  mobility,  and   direct  interaction  with  the  user’s  immediate  environment.   What  Is  the  Role  of  the  User  in  Relation  to  AR  as  it  is  Conceptualized  in  LIS?   The  user  is  the  foundation  of  AR,  giving  the  technology  its  functionality  or  prompting  action.   Whether  conceptualized  as  a  new  direction,  an  information  source  or  provider,  or  an   enhancement,  AR  is  essentially  static  if  there  is  no  user  prompting  the  AR  application  to  “act.”   Without  action  on  the  user’s  part,  the  information  stored  within  AR  applications  is  inert.  The  goal   of  AR  is  to  present  information  in  digital  form  within  the  context  of  the  user’s  surroundings,   environment,  or  reality.   Codes  describing  AR  as  a  new  direction,  new  world,  or  new  normal  solidify  the  idea  that  AR  is  a  new   technological  development  poised  to  redefine  not  only  the  information  landscape  but  also  the   ways  users  interact  with  technology  and  information.  Furthermore,  references  to  AR  as  a   seemingly  unstoppable  popular  trend  point  to  the  perceived  usefulness  and  importance  of  AR  in   the  life  of  the  user,  as  it  is  seen  to  raise  expectations  in  terms  of  how  users  access  information.  One   blog  source  writes,  “You  may  have  heard  about  augmented  reality  before.  
If  you  haven’t,  you’ll  be   hearing  a  lot  about  it  from  now  on,  with  the  smartphone  and  tablet  revolution  now  in  full-­‐ swing.”69  The  “revolution”  surrounding  smartphones  and  tablets  alludes  to  the  increase  in  their   use  and  sales,  making  these  devices  staples  in  everyday  life.  This  idea  is  parallel  to  user   expectations  in  terms  of  online  resources,  as  many  users  rely  on  information  accessed  through  the   Internet.70,71,72,73  The  implication  is  once  AR  applications  gain  more  widespread  use,  the  user  will   come  to  expect  access  to  a  wide  range  of  information  through  those  AR  applications.   Because  AR  is  still  met  with  skepticism,  and  for  some  is  still  in  need  of  further  development,  the   technology  provides  opportunities  for  librarians  and  their  staff  as  those  implementing  and     DO  YOU  BELIEVE  IN  MAGIC?  |  ZAK       34   providing  AR-­‐based  services  to  forge  new  paths  in  AR  application,  becoming  users  of  the   technology  “behind-­‐the-­‐scenes.”  The  following  excerpt  from  a  LISTA  source  highlights  this  view:   Technological  advances  are  beginning  to  fundamentally  change  the  way  that  library  users   interact  with  digital  information,  and  it  is  therefore  essential  that  librarians  become  engaged   with  the  relevant  technology  and  leverage  their  role  as  teachers  in  order  to  help  ensure  their   continued  relevance  in  the  lives  of  clients  in  the  twenty-­‐first  century.74   Within  this  statement,  librarians  are  learning  and  teaching  about  new  technology.  Maintaining   “relevance  in  the  lives  of  clients”  suggests  as  the  technology  grows,  implementers  of  the   technology  need  to  understand  the  technology.  As  reflected  in  this  excerpt,  on  the  other  end  is  the   user  (or  client)  of  the  AR  application  after  it  is  created  and  implemented.  The  user  has  the   opportunity  not  only  to  access  and  experience  information  in  new  ways  but  also  to  help  build  and   contribute  to  the  creation  and  streamlining  of  the  AR  application  through  his  or  her  response  to   the  application’s  functionality.   The  user  of  AR  is  also  central  to  the  view  of  AR  as  informational.  In  a  simplistic  way,  AR  is  dormant   and  incapable  of  providing,  delivering,  searching  for,  superimposing,  or  really  doing  much  of   anything  with  information  without  user  actions.  In  most  cases  with  AR,  users  must  have  some  type   of  mobile  device  to  prompt  what  is  often  described  as  an  AR  experience.  Information  provided  via   AR  applications  is  unlike  the  physical  format  of  a  book,  newspaper,  magazine,  or  other  object.   These  physical  objects  exist  in  libraries  and  the  like,  residing  on  shelves  and  taking  up  physical   space  regardless  of  the  location  or  presence  of  a  user.  By  contrast,  the  information  embedded   within  an  AR  application  is  only  viewable  and  unlocked  when  the  user  prompts  action  through  the   viewfinder  of  a  smartphone,  tablet,  or  other  mobile  device.  The  technology  is  described  as  a   catchall  or  endless  repository  for  interactive  information  for  the  user,  at  the  user’s  fingertips.   
Likewise,  the  conceptualization  of  AR  as  any  kind  of  enhancement  is  also  dependent  on  the  user,   though  this  assertion  is  implicit  in  many  of  the  coded  passages.  Without  user  interaction   prompting  the  AR  application  to  overlay  information  onto  the  real  environment,  AR  is  incapable  of   enhancing  experiences,  libraries,  communication,  or  reality  itself.  Enhancement,  or  improvement,   suggests  something  or  someone  is  acted  on  in  a  way  that  is  beneficial,  intensified,  or  embedded   with  a  stronger  sense  of  value.   Much  like  the  informational  quality  of  AR,  without  the  use  of  a  mobile  device  handled  by  the  user,   AR  is  dormant  and  nonexistent,  unable  to  enhance  the  environment  or  other  aspect  of  physical  life.   Without  a  user  within  a  specific  context,  there  is  not  much  for  AR  to  enhance  because  it  exists  as   merely  a  “marker,”  tucking  away  the  desired  information,  awaiting  a  user  to  come  along  and  point   a  mobile  device  in  its  direction.  As  blogger  David  Meyer  put  it,  AR  “requires  the  active   participation  of  the  consumer—you  do  not  by  default  wander  around  with  your  phone  held  out  in   front  of  you.”75       INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  DECEMBER  2014   35   This  implicit  role  of  the  user  is  buried  under  language  suggesting  AR  is  actually  capable  of  acting   on  the  user  and  the  user’s  reality,  giving  AR  a  sense  of  agency.  For  example,  in  terms  of   enhancement,  while  the  user  is  the  “activator”  of  the  AR  experience,  AR  is  often  described  as   making,  creating,  challenging,  improving,  producing,  delivering,  solving  problems,  and  even   bringing  some  object  to  life,  thus  prompting  this  “enhancement.”  Such  verbs  give  AR  an  active   quality,  as  if  it  is  itself  alive  and  present  in  creating  or  prompting  change.  This  is  best  exemplified   by  the  categories  of  AR  as  acting  upon  reality,  AR  as  a  modifier  and  AR  as  conditioning  the   environment.   The  perceived  quality  of  AR  as  acting  on  reality  is  what  often  drives  the  idea  of  AR  is  a  catalyst  for   change,  conditioning  the  environment  by  either  controlling  the  present  environment  or  creating  a   new  one.  The  transformative  characteristic  of  AR  reshapes  and  redefines  the  user  experience   precisely  because  of  the  ways  AR  inserts  digital  objects  into  the  real-­‐world  environment.  For  this   reason,  AR  is  often  described  through  invoking  imagination  and  with  words  like  cool,  amazing,  fun,   and  exciting,  or  other  empty  descriptors;  AR  truly  enhances  user  interaction  with  the  surrounding   world  with  what  is  perceived  as  a  “wow”  factor,  so  much  so  that  codes  in  this  study  even  reflect   legal  and  economic  implications  of  the  technology.  But,  while  AR  is  a  catalyst  for  change  and  acts   on  the  environment  and  reality,  those  changes  and  actions  are  only  seen  through  the  viewfinder  of   a  mobile  device  ultimately  in  the  hands  of  the  user.   
RADICAL  CHANGE  THEORY   Dresang  founded  RCT  on  what  she  identified  as  the  three  digital-­‐age  principles  of  interactivity,   connectivity,  and  access,  which  describe  how  “the  digital  environment  has  influenced  some   nondigitized  media  to  take  on  digital  principles.”  Essentially,  Dresang’s  theory  is  an  attempt  to   explain  information  resources  and  behaviors.  Within  the  digital-­‐age  principles,  the  user  takes  on   the  role  of  initiator.  While  flexible  in  allowing  user  initiation,  the  digital  applications  are  inert   without  the  user;  a  range  of  information  is  unavailable  without  user  action  to  put  the  digital   environment  into  motion.  The  digital  environment  within  AR,  as  described  by  sources  in  the  data   set,  is  an  overlay  of  the  digital  onto  the  real  world.  RCT  suggests  the  “digital  environment  extends   far  beyond  the  digital  resources  themselves.”  The  extension  beyond  the  digital  resources  is  evident   in  AR  as  it  combines  the  real  and  physical  with  the  virtual.76   Based  on  these  principles  and  the  idea  that  the  digital  extends  beyond  the  resources  themselves,   the  conceptualization  of  AR  reflects  the  characteristics  of  RCT  in  explaining  information  behavior   and  representations  in  the  AR  “environment.”  Interactivity,  connectivity,  and  access  are  essential   parts  of  AR.  When  each  principle  is  examined  in  conjunction  with  the  coded  data,  a  picture   appears  of  AR  as  an  exemplar  of  RCT.   Interactivity  and  connectivity  emerged  as  coded  assertions  within  the  data.  These  codes  fall  under   the  category  of  AR  as  an  enhancement.  As  stated,  AR  is  expected  by  many  of  the  voices  within  my     DO  YOU  BELIEVE  IN  MAGIC?  |  ZAK       36   data  set  to  influence  or  enhance  almost  every  aspect  of  our  lives,  and  part  of  this  enhancement  is   due  to  the  properties  of  interactivity  and  connectivity  exhibited  by  AR.     As  defined  by  Dresang,  interactivity  refers  to  dynamic,  user-­‐controlled,  nonlinear,  nonsequential   information  behavior  and  representation.77  Speaking  directly  to  the  idea  of  user-­‐controlled,   nonlinear,  and  nonsequential  information  behavior  and  representation  is  the  sense  of  agency  that   AR  is  expected  to  give  users  in  terms  of  how  they  view  and  interact  with  the  digital  overlays  of   information.   The  control  the  user  has  on  the  AR  experience  and  the  information  with  which  he  or  she  comes  in   contact  stems  from  the  functionality  of  AR  often  relying  on  tracking  a  user’s  location.  Presenting  a   user  with  information  based  on  location  is  not  only  user-­‐controlled  but  nonlinear  and   nonsequential,  since  the  application  has  no  predetermined  set  of  boundaries  the  way  a  physical   map  might  have.  People  do  not  typically  travel  through  the  day  in  a  linear  fashion  with  a   predetermined  sequence  of  action.  As  one  blogger  notes,  AR  is  “helping  to  erase  that  line  between   your  real  life  and  how  you  interact  with  the  web.”  78   AR  is  further  tied  to  the  principle  of  interactivity  as  defined  within  RCT  precisely  because  of  its   mobility  and  formatting,  namely,  because  of  mobile  devices.  
A  source  within  the  LISTA  data  set   includes  the  following  assertion:     The  AR  paradigm  opens  innovative  interaction  facilities  to  users:  human  natural     familiarity  with  the  physical  environment  and  physical  objects  defines  the  basic     principles  for  exchanging  data  between  the  virtual  and  the  real  world,  thus  allowing     gestures,  body  language,  movement,  gaze  and  physical  awareness  to  trigger  events  in  the     AR  space.79     Gestures,  body  language,  movement,  gaze,  and  physical  awareness  are  all  unpredictable  actions.   For  these  actions  to  “trigger  events  in  the  AR  space,”  there  must  certainly  be  a  high  degree  of   nonlinear  and  nonsequential  information  behavior  and  representations  taking  place;  it  is  highly   unlikely  user  gestures  and  the  like  could  exist  on  a  linear  continuum  of  action.   Further,  within  RCT,  connectivity  refers  to  the  sense  of  community  or  construction  of  social   worlds  emerging  from  changing  perspectives  and  expanded  associations  in  the  world  and  in   resources.  In  terms  of  the  coded  data  in  this  study,  the  code  of  connectivity  reflects  this  idea  of   creating  community  and  social  worlds  through  the  capability  of  AR  to  connect  users  to  various   forms  of  social  media,  to  one  another,  and  to  various  resources.  Users  make  connections  through   games,  location  awareness,  and  applications  allowing  for  the  sharing  of  information  from  user  to   user.  Social  networking  figures  prominently  in  AR  technology.  Users  can  upload  overlays  of  digital   information  captured  on  a  smartphone  to  various  social  networking  sites.  Additionally,  the  ability   of  AR  to  overlay  digital  information  onto  the  real  world,  and  the  customizable  experiences  this   creates,  aids  in  connectivity.  AR  creates  a  virtual  world  wherein  users  can  engage  with  one   another  across  applications;  while  physically  in  different  places,  they  can  share  experiences  or     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  DECEMBER  2014   37   even  simple  conversations.  Connectivity  through  AR  is  possible  in  all  aspects  of  life  as  many   sources  in  the  data  set  allude  to  working,  playing,  and  learning.   To  interact  and  connect  with  one  another,  users  must  have  access  to  the  AR  space.  The  principle  of   access  within  RCT  is  defined  as  the  breaking  of  long-­‐standing  information  barriers,  allowing   exposure  and  access  to  a  wide  range  of  differing  perspectives  and  opportunities.  The  concept  of   access  (both  in  terms  of  what  is  and  is  not  accessible)  does  appear  in  the  coded  data.  Note  access   and  accessibility  within  this  study  refer  to  the  opportunity  or  right  to  use  a  system  or  service,  and   is  not  referring  to  access  and  accessibility  as  it  is  used  within  discourse  pertaining  to  disabilities.   The  use  of  AR  in  conjunction  with  smartphones  is  a  basic  way  of  interpreting  access  as  it  relates  to   RCT  and  AR.  The  mobility  of  smartphones  and  tablets  allows  users  to  access  AR  across  a  variety  of   locations,  and  each  user’s  smartphone  or  tablet  is  uniquely  tailored  to  the  user’s  interests  and   desires  through  differing  collections  of  applications  and  software.  
Smartphones  and  tablets  are   comparatively  low-­‐cost  options  for  accessing  and  sharing  access  to  the  Internet  and  AR   applications.  Their  use  is  on  the  rise  because  they  are  often  much  cheaper  to  obtain  than   computers.80,81,82  The  fusion  of  AR  with  mobile  devices  suggests  an  opportunity  for  accessing   information  in  real  time  in  any  place  through  these  technologies  working  in  tandem  with  one   another.  By  its  very  nature,  AR  offers  access  to  “differing  perspectives  and  opportunities”  as  it   presents  the  user  with  information  in  atypical  formats  in  places  and  spaces  once  static,  or  lacking   in  digital  overlays.   AR  is  not  bounded  by  physical  location;  rather,  it  depends  on  and  varies  with  your  physical   location.  The  idea  that  users  can  access  AR  wherever  there  is  an  Internet  connection  means  the   only  real  barrier  to  accessing  AR  is  the  same  barrier  existing  regarding  a  web  connection  for  the   user,  something  at  least  one  source  within  the  LISTA  data  set  alludes  to  in  terms  of  the  digital   divide,  cautioning  that  AR  should  be  used  in  conjunction  with  traditional  formats  of  information   instead  of  in  lieu  of  them.  For  example,  libraries  should  not  replace  traditional  signage  with   information  only  embedded  within  AR  applications.  However,  as  more  and  more  organizations,   institutions,  and  businesses  use  AR  and  provide  users  with  access  rather  than  relying  on  the  user   to  conjure  his  or  her  own  Internet  connection,  more  barriers  to  AR  will  fall.     However,  it  is  important  to  note  that  not  all  AR  applications  actually  do  require  Internet  access.   Mobile  device  applications  can  work  off  of  markers  and  triggers  in  the  physical  environment,  not   web-­‐based  anchors.  For  example,  a  cookbook  can  include  a  marker  next  to  a  recipe  that  when   scanned  displays  an  image  of  the  finished  dish.  Access  to  AR  applications  without  a  web   connection  opens  a  wealth  of  information  to  individuals  who  do  not  have  access  to  the  web.  Not  all   AR  applications  link  to  web-­‐based  information,  and  this  widens  the  pool  of  users  engaging  with   information  through  those  AR  applications.  Moreover,  relative  ease  of  use  and  low  initial  cost  to   create  AR  applications  also  allow  users  to  become  content  creators,  as  exemplified  by  an  AR   application  allowing  an  artist  to  create  virtual  graffiti  in  public  spaces.  Users  are  able  to  display   and  access  information  otherwise  invisible.     DO  YOU  BELIEVE  IN  MAGIC?  |  ZAK       38   It  is  important  to  note  the  differing  views  suggesting  AR  does  and  does  not  require  Internet  access   to  function.  Such  a  discrepancy  points  further  to  AR  as  a  loosely  defined  concept.  From  a  technical   standpoint,  AR  does  not  require  an  Internet  connection,  but  the  Internet  does  serve  as  a   repository  for  information,  and  many  see  AR  as  providing  a  bridge  to  that  information,  be  it  a   company  wanting  to  give  consumers  access  to  product  lines  or  a  library  creating  an  AR  application   linking  to  web-­‐based  databases  and  services.  
In  terms  of  RCT,  the  principle  of  access  does  take   into  account  digital  information  as  able  to  both  provide  and  inhibit  access,  as  some  users  may  not   have  access  to  the  often  costly  hardware  allowing  access  to  digitally  formatted  information.  What   is  important  to  highlight  about  the  principle  of  access  is  the  focus  on  the  range  of  voices  and  the   increased  array  of  digital  information  available,  which,  in  the  case  of  this  study,  AR  provides.  Any   object  associated  in  some  way  with  a  monetary  cost  or  technical  savvy  always  has  the  potential  to   leave  some  users  in  the  dark.   Interactivity,  connectivity,  and  access  are  present  in  the  conceptualization  of  AR  within  the  LIS   literature  as  well  as  the  popular  media  blogs.  The  user  is  central  to  both  RCT  and  this   conceptualization  of  AR  as  a  new  technology  with  the  potential  to  change  the  way  users  interact   with  information  and  with  one  another.  The  goal  of  AR  is  to  present  information  in  digital  form   within  the  context  of  the  user’s  surroundings,  environment,  or  reality.  RCT  is  a  theory  seeking  to   understand  changes  in  information  behavior  and  representations,  and  AR  is  an  exemplar  of  the   myriad  changes,  or  the  evolution,  of  the  digital-­‐information  environment.   IMPLICATIONS  FOR  THEORY   This  study  has  several  implications  for  theory  within  LIS.  These  include  the  extension  of  RCT  or   creation  of  new  theories  born  out  of  the  digital  age,  the  understanding  that  a  user-­‐centric  focus  is   essential  to  theory  within  the  digital  age,  and  the  realization  that  AR  opens  new  areas  of  research   in  what  is  considered  an  enhancement  of  information.  While  this  study  sought  to  test  RCT  in   relation  to  the  conceptualization  of  AR,  it  also  provides  a  framework  for  future  studies.  The  results   of  this  study  also  suggest  AR  opens  more  areas  to  explore  within  the  field  of  LIS  to  create  new   theories  or  to  add  to  RCT  as  a  theoretical  framework  to  better  understand  information  behavior   and  representations  in  the  digital-­‐information  environment.   When  Dresang  initially  formulated  RCT,  she  focused  on  youth  information  seeking  behaviors.83   Few  scholars  have  used  the  theory,  but  those  that  have  explore  education,84  literacy,85     communication  and  writing86  as  related  to  changing  technologies.  These  previous  studies,  as  well   as  this  study,  highlight  the  importance  of  this  theory  in  examining  the  effect  of  the  digital-­‐ information  landscape  on  information-­‐seeking  and  user  understanding  of  and  reaction  to  digital   information.  RCT  is  viable  beyond  a  focus  on  youth  information  seeking  and  is  highly  relevant  to   today’s  world.     RCT  developed  to  understand  how  the  digital  age  influences  traditional  and  new  media.  AR  is  itself   often  described  as  an  environment,  a  digital  environment,  which  is  precisely  the  focus  of  RCT.  
If     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  DECEMBER  2014   39   AR  is  in  fact  a  “new  normal,”  as  some  describe,  our  information  landscape  is  moving  in  a  direction   where  interactivity,  connectivity,  access,  and  the  role  of  the  user  are  central  to  any  discussion  of   how  information  is  organized,  distributed,  formatted,  and  presented.   Researchers  can  begin  by  adapting  traditional  LIS  theories  to  the  digital  age.  For  example,  Wilson   revised  his  oft-­‐cited  original  general  model  of  information-­‐seeking  behavior  in  an  attempt  to   understand  the  totality  of  information  behavior  by  linking  theory  to  action  and  understanding   what  prompts  and  hinders  the  need  to  search  for  information.87  Researchers  can  begin  to   reevaluate  the  model  to  determine  whether  it  helps  to  explain  information  behavior  within  the  AR   environment,or  whether  aspects  of  the  model  can  be  further  developed  or  revised.  Wilson’s   model’s  focus  on  human  behavior  parallels  the  focus  on  human  behavior  within  the  AR   environment.   Other  theories  within  LIS  can  also  be  adapted  to  the  AR  environment,  such  as  Erdelez’s   Information  Encountering  (IE).  Erdelez’s  theory  focuses  on  a  “memorable  experience  of   unexpected  discovery  of  useful  or  interesting  information”  situated  within  three  elements:   characteristics  of  the  information  user,  characteristics  of  the  information  environment,  and   characteristics  of  the  encountered  information.  Erdelez  further  describes  categories  of   information  users:  superencounterers,  encounterers,  occasional  encounterers,  and   nonencounterers.  While  Erdelez  has  since  taken  the  web  and  the  Internet  into  account  as   information  environments,  this  theory  could  further  be  remodeled  to  include  the  AR   environment.88   Wilson’s  model  and  Erdelez’s  theory  are  just  two  examples  of  theories  within  LIS  lending   themselves  to  further  exploration  of  the  user  of  AR  within  LIS.  Bates’  model  of  information  search   and  retrieval,  known  as  berrypicking,  which  centers  on  the  changing  nature  of  the  search  query   through  the  search  process,  can  also  be  amended  to  include  information  search  and  retrieval   within  the  AR  environment.89  Bates’  model  suggests  as  users  seek  and  find  information,  the   information  search  shifts  from  source  to  source.  The  berrypicking  model  can  also  be  updated  or   expounded  in  response  to  AR  because  of  AR’s  multidimensional  display  of  information—a   relatively  new  phenomenon  for  the  average  user—to  understand  whether  the  same  shift  in   information  queries  occurs  and  what  new  paths  to  information  users  are  taking  within  the  AR   environment.   This  study  suggests  a  user-­‐centric  focus  is  essential  to  any  theories  in  LIS  developed  within  or   extended  to  the  digital  age.  The  user  is  vital  to  making  AR  technology  functional,  as  demonstrated   in  the  conceptualization  of  AR  in  this  study.  The  personalization,  individualization,  and  mobility  of   digital  technology  like  AR  suggest  theories  related  to  information  behavior  within  this   environment  must  account  for  user  interaction.  Information  is  no  longer  contained  within  static   formats.  
Geotagging  or  geospatial  awareness  and  social  networking  are  prime  examples  of  the   reliance  of  digital  technology  on  user  interaction.  Without  addressing  the  role  of  the  user  in  the     DO  YOU  BELIEVE  IN  MAGIC?  |  ZAK       40   functionality  of  digital  technology  in  any  context,  theoretical  frameworks  attempting  to  address   information  seeking  and  behavior  in  the  digital  age  will  be  limited.   Additionally,  should  AR  prove  to  be  a  new  direction  in  accessing,  organizing,  delivering,  and   obtaining  information,  it  further  opens  new  areas  of  theoretical  research.  Since  AR  is  considered   an  enhancement  of  the  information  experience,  it  is  incumbent  on  researchers  to  determine  just   how  “enhancement”  is  defined  in  the  context  of  AR  and  how  that  translates  to  the  user  experience.   Researchers  can  strive  to  apply  and  understand  the  concept  of  an  enhancement  in  relation  not   only  to  the  enhancement  of  information  but  also  to  the  experience  of  accessing,  organizing,   delivering,  and  obtaining  information.  The  digital-­‐age  principles  outlined  by  Dresang  in  RCT  are   just  one  example  of  how  to  understand  the  impact  of  the  digital  age  on  the  user’s  interaction  with   information  and  in  what  ways  the  digital  age  creates  enhancements.   By  itself,  the  study  of  AR  technology  raises  more  questions  than  it  answers.  The  study  of  AR   technology  could  lead  to  more  diverse  theoretical  frameworks  seeking  to  answer  not  only   practical  questions  but  also  those  more  philosophical  in  nature,  working  toward  an  understanding   of  how  the  digital-­‐information  environment  influences  everyday  life  as  it  evolves  and  changes  at  a   rapid  pace.   IMPLICATIONS  FOR  PRACTICE   This  study  can  inform  several  aspects  of  practice.  I  expound  on  three  possibilities:  the  clear   definition  of  technologies  like  AR  to  create  an  awareness  and  understanding  of  those  technologies,   development  of  best  practices,  and  the  need  for  a  focus  on  user  collaboration  in  the  design  and   functionality  of  AR  and  similar  technologies.  The  implications  for  practice  concern  both  the  user   and  the  provider  of  information  services.   This  study  provides  perspective,  or  a  starting  point,  from  which  the  field  of  LIS  can  begin  to   analyze  the  use  and  implementation  of  AR  technology.  By  taking  a  step  back  to  understand  the   current  conceptualization  of  AR,  practitioners  within  LIS  can  begin  to  seek  consensus,  identify   best  practices,  maintain  an  awareness  of  how  the  technology  is  used  and  think  realistically  about   what  factors  contribute  to  successful  implementation  of  the  technology  in  a  given  institution.  As   identified  in  the  study,  AR  is  seen  as  a  new  direction.  It  is  important  for  those  within  the  field  to   understand  this  perspective  and  to  go  on  to  identify  what  a  new  direction  in  information  gathering,   organization,  and  seeking  implies  for  the  field  as  a  whole  and  for  users.   As  a  field,  LIS  can  begin  to  have  a  broader  discussion  on  what  exactly  AR  can  provide  and  how  it   can  benefit  user  services.  
Such  a  discussion  can  help  practitioners  make  sense  of  how  this   technology  can  work  with  traditional  sources  of  information.  AR  can  be  integrated  with  the   traditional  rather  than  act  as  a  replacement  for  the  traditional.  This  broader  discussion  can  lead  to   a  consensus  on  how  best  to  define  AR  as  a  tool  and  concept.  Within  this  study,  it  is  evident  AR  is   described  in  myriad  ways,  so  it  is  important  to  reflect  on  those  descriptions,  understand  what  the   issues  are  surrounding  the  technology,  and  collaboratively  seek  and  identify  best  practices.     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  DECEMBER  2014   41   Furthermore,  by  identifying  best  practices,  practitioners  can  begin  to  pinpoint  what  applications   of  AR  are  successful  within  an  institution  and  for  users,  and  why  those  applications  are  successful   for  specific  purposes.  In  doing  so,  practitioners  can  build  AR  applications  around  the  needs  and   mission  of  the  institution  rather  than  simply  flock  to  use  a  new  technology.  It  is  therefore  critical   for  practitioners  to  think  realistically  about  AR  implementation.  Adopting  such  a  technology  will   only  be  beneficial  once  practitioners  in  the  library  understand  its  full  impact.  Experience  with   technology  and  programming,  knowledge  of  AR  functionality,  versatility,  and  cost  are  important   factors  to  consider  when  contemplating  how  an  institution  can  benefit  from  AR,  if  at  all.   Similarly,  publishers  are  using  AR  to  supplement  traditional  printed  books.  Educators  are  using   AR  books  in  the  classroom  and  supplementing  traditional  course  instruction  with  these  books.   Such  books  allow  for  3D  rendering  of  models  for  study,  such  as  planets,  molecular  structures,  and   various  other  objects.  Those  within  LIS  have  a  strong  connection  to  the  field  of  education,  and  AR   books  may  become  a  part  of  the  library  collection  as  they  become  more  popular  among  educators.   Practitioners  in  LIS  through  collaboration  with  educators  will  then  need  to  be  aware  of  these   books,  their  functionality,  and  how  to  help  users  access  the  content  lying  dormant  until  “activated”   by  a  smartphone  or  tablet.  This  raises  the  question  as  to  whether  smartphones,  tablets  or  other   devices  that  can  scan  the  environment  will  become  commonplace  in  the  library  to  provide  full   access  to  users.     User  collaboration  also  becomes  central  to  understanding  the  implications  of  AR  on  practice.  User   collaboration  in  design  is  important  because  AR  technology  is  largely  dependent  on  user  context.   As  the  data  suggests,  AR  is  considered  an  enhancement—of  the  environment,  of  information,  and   of  the  user  experience.  Prior  to  implementation,  it  is  critical  to  understand  how  AR  enhances  the   user  experience  and  what  the  perception  is  among  users.  User  surveys  can  lead  to  tailored  AR   applications  for  a  given  library  or  cultural  institution  community  should  there  be  a  need  or  desire   for  AR  applications  identified  among  users.  Coupled  with  the  idea  of  user  collaboration  in  design  is   also  the  need  to  reevaluate  the  physical  spaces  of  libraries  and  similar  institutions.  
Because  AR   creates  an  overlay  of  digital  information  on  the  physical  environment,  it  will  be  necessary  for   practitioners  to  identify  what  areas  of  the  library  or  institution  lend  themselves  to  digital  overlays,   what  types  of  information  users  are  accessing  through  AR  applications,  and  whether  the  library   space  is  configured  to  allow  for  navigating  space  via  AR.   Practitioners  can  also  begin  to  survey  the  role  of  RCT  in  understanding  user  information-­‐seeking   behavior.  By  acknowledging  this  theory  as  an  outline  of  our  digital-­‐information  environment,   practitioners  can  be  mindful  of  user  expectations  and  behaviors  as  they  differ  from  traditional   information  representations  and  methods  of  information  retrieval.  As  AR  creates  an  environment   or  experience  for  the  user,  it  is  important  for  practitioners  within  the  field  to  understand  how  this   technology  is  moving  forward  and  what  effect  it  has  on  the  sea  change  occurring  in  user   acquisition  of  information.  RCT  is  a  framework  providing  practitioners  the  lens  through  which  to   make  sense  of  the  sea  change  and  predict  what  might  be  on  the  horizon.     DO  YOU  BELIEVE  IN  MAGIC?  |  ZAK       42   LIMITATIONS  AND  FUTURE  APPLICATION   This  study  is  just  one  step  in  the  process  of  understanding  how  new  technology  is  conceptualized   and  what  effect  that  conceptualization  has  on  implementation.  Should  AR  continue  to  grow  in   popularity,  this  study  can  serve  as  a  model  for  future  research  seeking  to  understand  concepts   misinterpreted,  misunderstood,  or  undergoing  concrete  development.  Utilizing  the  explicit   assertion  as  a  unit  of  analysis  coupled  with  RCT  can  aid  in  investigations  of  other  digital   technologies,  both  in  terms  of  implementation  and  end  use.     The  data  set  and  the  time  period  during  which  the  searches  of  the  data  set  took  place  in  this  study   highlight  two  of  the  study’s  limitations.  Further  studies  can  focus  on  a  wider  range  of  sources  not   limited  to  one  database  or  blog  search  type  and  extend  over  a  longer  period  of  time.  These   limitations  have  potentially  excluded  other  voices,  perspectives,  and  definitions  of  AR,  and  the   time  element  may  exclude  new  applications  or  uses  of  AR  currently  being  implemented.     Limitations  in  data  analysis  also  exist.  Content  analysis  is  one  of  many  research  methods   researchers  can  employ  to  explore  this  topic.  Ethnographic  research  and  user  interviews  can  lead   to  a  deeper  understanding  of  how  users  perceive  AR  and  information-­‐seeking  or  behavior  within   the  AR  environment.  Such  qualitative  studies  can  provide  insight  to  the  role  of  the  user  lacking  in   this  study.  Moreover,  this  researcher’s  own  admitted  bias  against  the  steadfast  use  of  digital   technologies  prior  to  in-­‐depth  understanding  is  what  prompted  the  qualitative  inquiry  guiding  the   study.  Quantitative  methods  can  also  be  used  to  track  the  popularity  or  perceptions  of  AR  through   close-­‐ended  questionnaires  or  surveys  of  both  users  and  practitioners  in  the  field  of  LIS.  
Citation tracking could further reveal in what subfields of LIS the conversation surrounding AR is taking place, and may also uncover whether any one researcher or group of researchers is leading the conversation.

Future studies can examine and expand on the results of this study. Rather than focusing on conceptualization only, researchers can study which professional fields dominate the conversation surrounding AR and what areas of popular culture dominate the conversation or influence understanding of AR. Similarly, other studies can address the specificity of each source making explicit assertions about this kind of technology. While qualitative in nature, the study is limited because it does not examine quantitative changes in the number of articles or blog posts alluding to AR over an extended period of time. Such studies might unravel why AR is progressing as it is, and may identify potential problems or differences in the influence of these perspectives on the use of AR.

The study of AR also widens the spectrum of user studies. Augmented reality opens a whole new area of user interaction with information extending beyond the screen. With the advent of products like Google Glass and applications overlaying digital information at the click of a button in an endless array of contexts and environments, AR brings information-seeking further into a world of instability and unpredictability. The complex nature of individual people is now being coupled with a highly individualized, complex technology.

The functionality of AR prompts the need for archival studies related to this technology. The mobile aspect of AR, the highly personalized content, and the intangible quality of the information stored within AR applications highlight the need for an examination of how such information can actually exist within an archive and be made accessible, or whether such information even should exist within an archive. Such a question for those within LIS also suggests the need for a realistic perspective on technology like AR: the next step, or reaction to such technology, is often unrecognizable and unidentifiable until the concept itself is dissected and each part is interpreted and understood.

CONCLUSION

In this study, I used content analysis to explore the conceptualization of AR technology within the field of LIS. The model for this study is the work of Clement and Levine and their use of the explicit assertion as a unit of analysis.90 I coded and examined explicit assertions pertaining to AR in LIS literature and Google Blogs to determine how the concept of AR is understood. Analysis shows AR is most prominently conceptualized as a new direction in technology and media consumption acting on reality and as enhancing reality and interaction with information.

AR is basically a technology allowing for digital information to be superimposed on the real world.
But  beyond  that,  it  is  a  technology  changing  the  way  users  interact  with  information,  and  it  has  the   potential  to  continue  changing  how  we  literally  see  information.  The  data  set  suggests  those   within  LIS  conceptualize  AR  as  a  new  development  in  digital  technology  worth  paying  attention  to,   at  least  for  now,  but  should  also  be  approached  with  some  caution  to  be  fully  understood  prior  to   implementation  in  case  its  popularity  and  growth  is  fleeting.  As  reflected  in  the  data  set,  books  and   physical  spaces  are  potential  areas  for  AR  application,  and  in  some  cases  are  already  overlaid  with   AR  and  enhancing  user  experiences.  As  a  whole,  AR  is  conceptualized  as  a  technology  with  great   potential  to  change  the  way  users  interact  with  information  because  of  its  versatility,  mobility,  and   direct  interaction  with  the  user’s  immediate  environment.   Within  this  conceptualization,  the  user  is  central  to  igniting  the  functionality  of  AR.  Whether   conceptualized  as  a  new  direction,  an  information  source  or  provider,  or  an  enhancement,  AR  is   essentially  static  if  there  is  no  user  prompting  the  AR  application  to  “act.”  The  goal  of  AR  is  to   present  information  in  digital  form  within  the  context  of  the  user’s  surroundings,  environment,  or   reality.  As  a  field  dedicated  to  user  services  in  regard  to  information-­‐seeking,  it  is  imperative  to   understand  the  potential  impact  this  technology  has  had  or  will  have  on  everyday  life.   RCT  is  a  theoretical  framework  aiding  in  the  exploration  of  the  potential  impact  of  AR.  Born  out  of   a  desire  to  understand  the  influence  of  the  digital  age  on  the  traditional  or  “analog”  media  with   which  we  engage,  RCT  is  one  of  few  theories  resting  entirely  on  the  characteristics  driving  our   digital-­‐information  environment,  outlined  specifically  as  interactivity,  connectivity,  and  access.   Utilizing  this  theory  as  a  lens  for  future  research  regarding  digital  information  is  a  natural  next   step  in  theory  exploration  and  development.     DO  YOU  BELIEVE  IN  MAGIC?  |  ZAK       44   Together,  AR  and  RCT  accentuate  the  evolution  of  how  we  consume  and  display  information.  From   storytelling  to  printed  pages  to  electronic  devices,  our  engagement  with  information  will  never  be   the  same  again.  As  we  move  forward,  it  is  important  to  continue  to  ask  new  questions,  seek  new   explanations,  and  try  to  formulate  the  most  appropriate  answers  for  the  contexts  in  which  we  all   deal  with  information,  be  it  gathering,  organizing,  seeking,  or  understanding.  This  study  is  one   piece  in  a  puzzle,  and  it  prompts  more  questions  than  it  provides  answers.  AR  can  and  should  be   studied  from  every  aspect  of  the  field  of  LIS,  if  it  is  in  fact  a  new  direction  toward  our  new  normal.   REFERENCES     1.     Nathan  Crilly,  “The  Design  Stance  in  User-­‐System  Interaction,”  Design  Issues  27,  no.  4  (2011):   16–29.   2.     Pelle  Ehn,  “The  End  of  the  User—The  Computer  as  a  Thing,”  in  End-­‐User  Development,  ed.  Y.   Dittrich,  M.  Burnett,  A.  Morch  and  D.  Redmiles  (Berlin,  Germany:  Springer,  2009),  8–8.   3.     
Daniel  Fallman,  “The  New  Good:  Exploring  the  Potential  of  Philosophy  of  Technology  to   Contribute  to  Human-­‐Computer  Interaction.”  Paper  presented  at  the  SIGCHI  Conference  on   Human  Factors  in  Computing  Systems,  Vancouver,  British  Columbia,  May  2011).   4.     Bruce  M.  Hanington,  “Relevant  and  Rigorous:  Human-­‐Centered  Research  and  Design     Education,”  Design  Issues  26,  no.  3  (2010):  18–26.   5.     Manuel  Imaz  and  David  Benyon,  Designing  with  Blends:  Conceptual  Foundations  of  Human-­‐ Computer  Interaction  and  Software  Engineering  (Cambridge,  MA:  MIT  Press,  2007).   6.     Laura  Manzari  and  Jeremiah  Trinidad-­‐Christensen,  “User-­‐Centered  Design  of  a  Web  Site  for   Library  and  Information  Science  Students:  Heuristic  Evaluation  and  Usability  Testing,”   Information  Technology  &  Libraries  25,  no.  3  (2006):  163–69.   7.     Yoram  Moses  and  Marcia  K.  Shamo,  “A  Knowledge-­‐Based  Treatment  of  Human-­‐Automation   Systems,”  (2013),  http://arxiv.org/abs/1307.2191.   8.     Marc  Steen,  “Human-­‐Centered  Design  as  a  Fragile  Encounter,”  Design  Issues  28,  no.  1  (2012):   72–80.   9.     Ronald  T.  Azuma,  “A  Survey  of  Augmented  Reality,”  Presence:  Teleoperators  and  Virtual   Environments  6,  no.  4  (1997):  355.   10.    Antti  Aaltonen  and  Juha  Lehikoinen,  “Exploring  Augmented  Reality  Visualizations,”   Proceedings  of  the  AVI  ’06:  Proceedings  of  the  Working  Conference  on  Advanced  Visual   Interfaces  (2006):  453–56,  http://portal.acm.org/citation.cfm?id=1133357.     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  DECEMBER  2014   45     11.    Peter  Anders,  “Designing  Mixed  Reality:  Perception,  Projects  and  Practice,”  Technoetic  Arts:  A   Journal  of  Speculative  Research  6,  no.  1  (2008):  19–29,     http://dx.doi.org/10.1386/tear.6.1.19_1.   12.    Blair  MacIntyre,  Jay  David  Bolter,  Emmanuel  Moreno,  and  Brendan  Hanigan,  “Augmented   Reality  as  a  New  Media  Experience,”  Proceedings  of  the  IEEE  and  ACM  International   Symposium  on  Augmented  Reality  (New  York,  New  York:  2001),  197-­‐206,   http://dx.doi.org/10.1109/ISAR.2001.970538.   13.     Aaltonen  and  Lehikoinen,  “Exploring  Augmented  Reality  Visualizations.”   14.    Benjamin  Avery  et  al.,  “Evaluation  of  User  Satisfaction  and  Learnability  For  Outdoor   Augmented  Reality  Gaming”  User  Interfaces  2006:  Proceedings  of  the  Seventh  Australasian  User   Interface  Conference—Volume  50  (Darlinghurst,  Australia:  Australian  Computer  Society,   2006),  17–24.   15.    Oliver  Bimber,  L.  Miguel  Encarnacao,  and  Dieter  Schmalstieg,  “The  Virtual  Showcase  as  a  New   Platform  for  Augmented  Reality  Digital  Storytelling”  Proceedings  of  the  EGVE  ’03:  Proceedings   of  the  Workshop  on  Virtual  Environments  (New  York:  ACM,  2003),  87–95,   http://portal.acm.org/citation.cfm?id=769964.   16.    Push  Singh,  Barbara  Barry  and  H.  Liu,  “Teaching  Machines  About  Everyday  Life,”  BT   Technology  Journal  22,  no.  4  (2004):  211–26.   17.    Alan  M.  Turing,  “Computing  Machinery  and  Intelligence,”  Creative  Computing  6,  no.  1  (1950):   44–53.   18.    Chih-­‐Ming  Chen  and  Yen-­‐Nung  Tsai,  “Interactive  Augmented  Reality  System  for  Enhancing   Library  Instruction  in  Elementary  Schools,”  Computers  and  Education  59,  no.  2  (2012):  638– 52.   19.    
David  Chen  et  al.,  “Mobile  Augmented  Reality  for  Books  on  a  Shelf.”  Paper  presented  at  2011   IEEE  International    Conference  on  Multimedia  and  Expo  (ICME),  Barcelona,  Spain,  July  2011,   http://dx.doi.org/10.1109/ICME.2011.6012171.   20.    Giovanni  Saggio  and  Davide  Borra,  “Augmented  Reality  for  Restoration/Reconstruction  of   Artefacts  with  Artistic  or  Historical  Value”  (informally  published  manuscript,  University  of   Rome,  Italy,  2012),  http://tainguyenso.vnu.edu.vn/jspui/handle/123456789/29953.   21.    Andre  Walsh,  “QR  Codes—Using  Mobile  Phones  to  Deliver  Library  Instruction  and  Help  at  the   Point  of  Need,”  Journal  of  Information  Literacy  4,  no.  1  (2010):  55–63.   22.    Azuma,  “A  Survey  of  Augmented  Reality.”     DO  YOU  BELIEVE  IN  MAGIC?  |  ZAK       46     23.    Telmo  Zarraonandia  et  al.,  “Augmented  Lectures  around  the  Corner?”  British  Journal  of   Educational  Technology  42,  no.  4  (2011):  E76–E78.   24.    Mark  Billinghurst  and  Andreas  Duenser,  “Augmented  Reality  in  the  Classroom,”  Computer  45,   no.  7  (2012):  56–63.   25.    Angela  Di  Serio,  Maria  Blanca  Ibanez,  and  Carlos  Delgado  Kloos,  “Impact  of  an  Augmented   Reality  System  on  Students’  Motivation  for  a  Visual  Art  Course,”  Computers  and  Education  68   (2013):  586–96.   26.    Liv  Valmestad,  “Q(a)r(t)  Code  Public  Art  Project:  A  Convergence  of  Media  and  Mobile   Technology,”  Art  Documentation:  Journal  of  the  Art  Libraries  Society  of  North  America  30,  no.  2   (2011):  70–73.   27.    Claudio  Kirner  et  al.,  “Design  of  a  Cognitive  Artifact  Based  on  Augmented  Reality  to  Support   Multiple  Learning  Approaches,”  Proceedings  of  World  Conference  on  Educational  Multimedia,   Hypermedia  and  Telecommunications  (Denver,  CO:  June  2006).   28.    Deborah  Lee,  “The  2011  Horizon  Report:  Emerging  Technologies,”  Mississippi  Libraries  75,  no.   1  (2012):  7–8.   29.    George  Margetis  et  al.,  “Augmented  Interaction  with  Physical  Books  in  an  Ambient  Intelligence   Learning  Environment,”  Multimedia  Tools  and  Applications  67,  no.  2  (2013):  473–95,   http://dx.doi.org/10.1007/s11042-­‐011-­‐0976-­‐x.   30.    Stefaan  Ternier  and  Fred  De  Vries,  “Mobile  Augmented  Reality  in  Higher  Education.”  Paper   presented  at  the  Learning  in  Context  ’12  Workshop,  Brussels,  Belgium,  March  2012,     http://hdl.handle.net/1820/4219.   31.    Bimber,  Encarnacao,  and  Schmalstieg,  “The  Virtual  Showcase  as  a  New  Platform  for   Augmented  Reality  Digital  Storytelling.”   32.    Alisa  Barry  et  al.,  “Augmented  Reality  in  a  Public  Space:  The  Natural  History  Museum,   London,”  Computer,  45,  no.  7  (2012):  42–47,     http://doi.ieeecomputersociety.org/10.1109/MC.2012.106.   33.    David  Marimon  et  al.,  “Mobiar:  Tourist  Experiences  Through  Mobile  Augmented  Reality.”   Paper  presented  at  the  Networked  and  Electronic  Media  Summit,  Barcelona,  Spain,  2012.   34.    Ann  Morrison  et  al.,  “Collaborative  Use  of  Mobile  Augmented  Reality  with  Paper  Maps,”   Computers  and  Graphics  35,  no.  4  (2011):  789–99.   35.    
Yetao  Huang  et  al.,  “Iterative  Design  of  Augmented  Reality  Device  in  Yuanmingyuan  for  Public   Use,”  Vrcai  ’11:  Proceedings  of  the  10th  International  Conference  on  Virtual  Reality  Continuum     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  DECEMBER  2014   47     and  its  Applications  in  Industry,  Hong  Kong,  2011  (New  York:  ACM,  2011),   http://dx.doi.org/10.1145/2087756.2087847.   36.    Azuma,  “A  Survey  of  Augmented  Reality.”   37.    Selim  Balcisoy  and  Daniel  Thalmann,  “Interaction  between  Real  and  Virtual  Humans  in   Augmented  Reality.”  Paper  presented  at  Computer  Animation,  Geneva,  Switzerland,  1997,   http://portal.acm.org/citation.cfm?id=791510.   38.    Enylton  Machado  Coelho,  Blair  MacIntyre,  and  Simon  J.  Julier,  “Supporting  Interaction  in   Augmented  Reality  in  the  Presence  of  Uncertain  Spatial  Knowledge.”  Paper  presented  at  the   Eighteenth  Annual  ACM  Symposium  on  User  Interface  Software  and  Technology,  Seattle,  WA,   October  23–27,  2005,  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.91.4041.   39.    Brian  X.  Chen,  “If  You’re  Not  Seeing  Data,  You’re  Not  Seeing,”  Wired  (blog),  August  25,  2009,   http://www.wired.com/gadgetlab/2009/08/augmented-­‐reality/  .   40.    Isabel  Pedersen,  “A  Semiotics  of  Human  Actions  for  Wearable  Augmented  Reality  Interfaces,”   Semiotica  155,  no.  1–4  (2005):  183–200.   41.    Ibid.   42.    Balcisoy  and  Thalmann,  “Interaction  Between  Real  and  Virtual  Humans  in  Augmented   Reality.”   43.    Ray  Kurzweil,  “Robots  ’R’  Us,”  Popular  Science  269,  no.  3  (2006):  54–57.   44.    Chen,  “If  You’re  Not  Seeing  Data,  You’re  Not  Seeing.”     45.    Avery  et  al.,  “Evaluation  of  User  Satisfaction  and  Learnability  For  Outdoor  Augmented  Reality   Gaming.”   46.    Gerhard  Reitmayr  and  Dieter  Schmalstieg,  “Location  Based  Applications  for  Mobile   Augmented  Reality”  Paper  presented  at  the  Fourth  Australian  User  Interface  Conference  on   User  Interfaces,  Adelaide,  Austrialia,  2003.   47.    Coelho,  MacIntyre  and  Julier,  “Supporting  Interaction  in  Augmented  Reality  in  the  Presence  of   Uncertain  Spatial  Knowledge.”   48.    Carol  C.  Kuhlthau,  Seeking  Meaning:  A  Process  Approach  to  Library  and  Information  Services   (Westport,  CT:  Libraries  Unlimited,  2004).   49.    Crystal  Fulton,  “Chatman’s  Life  in  the  Round,”  in  Theories  of  Information  Behavior,  ed.  Karen  E.   Fisher,  Sanda  Erdelez,  and  Lynne  McKechnie  (Medford,  NJ:  Information  Today,  2009),  79–82.   50.    Constance  A.  Mellon,  “Library  Anxiety:  A  Grounded  Theory  and  Its  Development,”  College  &   Research  Libraries  47  (1986),  160–65.     DO  YOU  BELIEVE  IN  MAGIC?  |  ZAK       48     51.    Donald  O.  Case,  Looking  for  Information:  A  Survey  of  Research  on  Information  Seeking,  Needs   and  Behavior  (New  York:  Academic  Press,  2008).   52.    Leanne  Bowler  et  al.,  “Issues  in  User-­‐Centered  Design  in  US,”  Library  Trends  59,  no.  4  (2011):   721–52.   53.    Ibid.   54.    Heather  L.  O’Brien  and  Elaine  G.  Toms,  “What  is  User  Engagement?  A  Conceptual  Framework   for  Defining  User  Engagement  with  Technology,”  Journal  of  the  American  Society  For   Information  Science  &  Technology  59,  no.  6  (2008):  938–55.   55.    
Liu  Shu,  “Engaging  Users:  The  Future  of  Academic  Library  Web  Sites,”  College  &  Research   Libraries  69,  no.  1  (2008):  6–27.   56.    Djilali  Idoughi,  Ahmed  Seffah,  and  Christophe  Kolski,  “Adding  User  Experience  into  the   Interactive  Service  Design  Loop:  A  Persona-­‐Based  Approach,”  Behaviour  &  Information   Technology  31,  no.  3  (2012):  287–303.   57.    Natalie  Pang  and  Don  Schauder,  “The  Culture  of  Information  Systems  in  Knowledge-­‐Creating   Contexts:  The  Role  of  User-­‐Centred  Design,”  Informing  Science  10  (January  2007):  203–35.     58.    Michelle  L.  Young,  Annette  Bailey,  and  Leslie  O’Brien,  “Designing  a  User-­‐Centric  Catalog   Interface:  A  Discussion  of  the  Implementation  Process  for  Virginia  Tech’s  Integrated  Library   System,”  Virginia  Libraries  53,  no.  4  (2007):  11–15.   59.    Shawn  R.  Wolfe  and  Yi  Zhang,  “User-­‐Centric  Multi-­‐Criteria  Information  Retrieval”  (paper   presented  at  the  32nd  International  ACM  SIGIR  Conference  on  Research  and  Development  in   Information  Retrieval,  Boston,  July  19–23,  2009).   60.    Mary  M.  Somerville  and  Mary  Nino,  “Collaborative  Co-­‐Design:  A  User-­‐Centric  Approach  for   Advancement  of  Organizational  Learning,”  Performance  Measurement  &  Metrics  8,  no.  3   (2007):  180–88.   61.    Tamar  Sadeh,  “User-­‐Centric  Solutions  for  Scholarly  Research  in  the  Library,”  Liber  Quarterly:   The  Journal  of  European  Research  Libraries  17,  no.  1–4  (2007):  253–68.   62.    Bettina  Berendt  and  Anett  Kralisch,  “A  User-­‐Centric  Approach  to  Identifying  Best  Deployment   Strategies  for  Language  Tools:  The  Impact  of  Content  and  Access  Language  on  Web  User   Behaviour  and  Attitudes,”  Information  Retrieval  12,  no.  3  (2009):  380–99.   63.    Gail  Clement  and  Melissa  Levine,  “Copyright  and  Publication  Status  of  Pre-­‐1978  Dissertations:   A  Content  Analysis  Approach,”  portal:  Libraries  and  the  Academy  11,  no.  3  (2011):  813–29,   http://dx.doi.org/10.1353/pla.2011.0032.     64.    Ibid.     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  DECEMBER  2014   49     65.    Eliza  T.  Dresang,  “Radical  Change,”  in  Theories  of  Information  Behavior,  ed.  Karen  E.  Fisher,   Sanda  Erdelez,  and  Lynne  McKechnie  (Medford,  NJ:  Information  Today,  2009),  298–302.   66.    Grace  M.  Jackson-­‐Brown,  “Content  Analysis  Study  of  Librarian  Blogs:  Professional   Development  and  Other  Uses,”  First  Monday  18,  no.  2  (2013):  2.   67     Dahlia  K.  Remler  and  Gregg  G.  Van  Ryzin,  Research  Methods  in  Practice:  Strategies  for   Description  and  Causation  (Los  Angeles:  Sage,  2011).   68.    Kangdon  Lee,  “Augmented  Reality  in  Education  and  Training,”  Techtrends:  Linking  Research   and  Practice  To  Improve  Learning  56,  no.  2  (2012):  13–21.   69.    YFS  Magazine,  “Interview:  Gravity  Jack  CEO,  Luke  Richey  Talks  Industry  Leadership,   Augmented  Reality  and  Why  Cash  Isn’t  King,”  Yfsmagazine.com  (blog),  December  19,  2012,   http://yfsentrepreneur.com/2012/12/19/interview-­‐gravity-­‐jack-­‐ceo-­‐luke-­‐richey-­‐talks-­‐ industry-­‐leadership-­‐augmented-­‐reality-­‐and-­‐why-­‐cash-­‐isnt-­‐king/  .   70.    Carol  Pitts  Diedrichs,  “Discovery  and  Delivery:  Making  it  Work  for  Users,”  Serials  Librarian  56,   no.  1–4  (2009):  79–93.   71.    
Baker  Evans,  “The  Ubiquity  of  Mobile  Devices  in  Universities—Usage  and  Expectations,”   Serials  24  (2011):  S11–S16.   72.    Andrew  J.  Flanagin  and  Miriam  J.  Metzger,  “Perceptions  of  Internet  Information  Credibility,”   Journalism  and  Mass  Communication  Quarterly  77,  no.  3  (2000):  515–40.   73.    Ronald  M.  Solorzano,  “Adding  Value  at  the  Desk:  How  Technology  and  User  Expectations  are   Changing  Reference  Work,”  Reference  Librarian  54,  no.  2  (2013):  89–102,   http://dx.doi.org/10.1080/02763877.2013.755398.   74.    Robin  Canuel,  Chad  Crichton,  and  Maria  Savova,  “Tablets  as  Powerful  Tools  for  University   Research,”  Library  Technology  Reports  48,  no.  8  (2012):  35–41.   75.    David  Meyer,  “Telefonica  Bets  on  Augmented  Reality  with  Aurasma  Tie-­‐In,”  Gigaom  (blog),   September  17,  2012,  http://gigaom.com/2012/09/17/telefonica-­‐bets-­‐on-­‐augmented-­‐reality-­‐ with-­‐aurasma-­‐tie-­‐in.   76.    Eliza  T.  Dresang,  “The  Information-­‐Seeking  Behavior  of  Youth  in  the  Digital  Environment,”   Library  Trends  54,  no.  2  (2005):  178–96.   77.    Ibid.   78.    Dave  Rodgerson,  “Experiments  in  Augmented  Reality  Hint  at  its  Potential  for  Retailers,”  Future   of  Retail  Alliance  (blog),  October  5,  2012,  http://www.joinfora.com/experiments-­‐in-­‐ augmented-­‐reality-­‐hint-­‐at-­‐its-­‐potential-­‐for-­‐retailers/.       DO  YOU  BELIEVE  IN  MAGIC?  |  ZAK       50     79.    Wolfgang  Narzt  et  al.,  “Augmented  Reality  Navigation  Systems,”  Universal  Access  in  the   Information  Society  4,  no.  3  (2005):  177–87.   80.    Mito  Akiyoshi  and  Hiroshi  Ono,  “The  Diffusion  of  Mobile  Internet  in  Japan,”  Information   Society  24,  no.  5  (2008):  292–303,  http://dx.doi.org/10.1080/01972240802356067.   81.    Jeffrey  James,  “Institutional  and  Societal  Innovations  in  Information  Technology  for   Developing  Countries,”  Information  Development  28,  no.  3  (2012):  183–88.     82.    Andromeda  Yelton,  “Dispatches  from  the  Field.  Bridging  the  Digital  Gap,”  American  Libraries   43,  no.  1/2  (2012):  30,  http://www.americanlibrariesmagazine.org/article/bridging-­‐digital-­‐ divide-­‐mobile-­‐services.     83.    Dresang,  “The  Information-­‐Seeking  Behavior  of  Youth  in  the  Digital  Environment.”   84.    Marta  J.  Abele,  “Responses  to  Radical  Change:  Children’s  Books  by  Preservice  Teachers”   (doctoral  dissertation,  Capella  University,  Minneapolis,  Minnesota,  2003).     85.    Jacqueline  N.  Glasgow,  “Radical  Change  in  Young  Adult  Literature  Informs  the  Multigenre   Paper,”  The  English  Journal  92,  no.  2  (2002):  41-­‐51,  http://www.jstor.org/stable/822225.     86.    Sylvia  Pantaleo,  “Readers  and  Writers  as  Intertexts:  Exploring  the  Intertextualities  in  Student   Writing,”  Australian  Journal  Of  Language  and  Literacy  29,  no.  2  (2006):  163–81,   http://search.informit.com.au/documentSummary;dn=157093987891049;res=IELHSS.     87.    Tom  D.  Wilson,  “Information  Behavior:  An  Interdisciplinary  Perspective,”  Information   Processing  &  Management  33,  no.  4  (1997):  551–72.   88.    Sanda  Erdelez,  “Information  Encountering:  It’s  More  than  Just  Bumping  into  Information,”   Bulletin  of  the  American  Society  for  Information  Science  &  Technology  25,  no.  3  (1999):  25–29,   http://www.asis.org/Bulletin/Feb-­‐99/erdelez.html.   89.    Marcia  J.  
Bates, “The Design of Browsing and Berrypicking Techniques for the Online Search Interface,” Online Review 13, no. 5 (1989): 407–24.

90. Clement and Levine, “Copyright and Publication Status of Pre-1978 Dissertations.”

Linked Data in Libraries: A Case Study of Harvesting and Sharing Bibliographic Metadata with BIBFRAME

Karim Tharani

Karim Tharani (karim.tharani@usask.ca) is Information Technology Librarian at the University of Saskatchewan in Saskatoon, Canada.

ABSTRACT

By way of a case study, this paper illustrates and evaluates the Bibliographic Framework (or BIBFRAME) as a means for harvesting and sharing bibliographic metadata over the web for libraries. BIBFRAME is an emerging framework developed by the Library of Congress for bibliographic description based on Linked Data. Much like the Semantic Web, the goal of Linked Data is to make the web “data aware” and transform the existing web of documents into a web of data. Linked Data leverages the existing web infrastructure and allows linking and sharing of structured data for human and machine consumption.

The BIBFRAME model attempts to contextualize the Linked Data technology for libraries. Library applications and systems contain high-quality structured metadata, but this data is generally static in its presentation and seldom integrated with other internal metadata sources or linked to external web resources. With BIBFRAME, existing disparate library metadata sources such as catalogs and digital collections can be harvested and integrated over the web. In addition, bibliographic data enriched with Linked Data could offer richer navigational control and access points for users. With Linked Data principles, metadata from libraries could also become harvestable by search engines, transforming dormant catalogs and digital collections into active knowledge repositories. Thus, experimenting with Linked Data using existing bibliographic metadata holds the potential to empower libraries to harness the reach of commercial search engines to continuously discover, navigate, and obtain new domain-specific knowledge resources on the basis of their verified metadata.

The initial part of the paper introduces BIBFRAME and discusses Linked Data in the context of libraries. The final part of this paper outlines and illustrates a step-by-step process for implementing BIBFRAME with existing library metadata.

INTRODUCTION

Library applications and systems contain high-quality structured metadata, but this data is seldom integrated or linked with other web resources. This is adequately illustrated by the nominal presence of library metadata on the web.1 Libraries have much to offer to the web and its evolving future. Making library metadata harvestable over the web may not only refine precision
and recall but has the potential to empower libraries to harness the reach of commercial search engines to continuously discover, navigate, and obtain new domain-specific knowledge resources on the basis of their verified metadata. This is a novel and feasible idea, but its implementation requires libraries both to step out of their comfort zones and to step up to the challenge of finding collaborative solutions to bridge the islands of information that we have created on the web for our users and ourselves.

By way of a case study, this paper illustrates and evaluates the Bibliographic Framework (or BIBFRAME) as a means for harvesting and sharing bibliographic metadata over the web for libraries. BIBFRAME is an emerging framework developed under the auspices of the Library of Congress to exert bibliographic control over traditional and web resources in an increasingly digital world. BIBFRAME has been introduced as a potential replacement for MARC (Machine-Readable Cataloging) in libraries;2 however, the goal of this paper is to highlight the merits of BIBFRAME as a mechanism for libraries to share metadata over the web.

BIBFRAME and Linked Data

While the impetus behind BIBFRAME may have been the replacement of MARC, “it seems likely that libraries will continue using MARC for years to come because that is what works with available library systems.”3 Despite its uncertain future in the cataloging world, BIBFRAME in its current form provides a fresh and insightful mechanism for libraries to repackage and share bibliographic metadata over the web. BIBFRAME utilizes the Linked Data paradigm for publishing and sharing data over the web.4 Much like the Semantic Web, the goal of Linked Data is to make the web “data aware” and transform the existing web of documents into a web of data. Linked Data utilizes existing web infrastructure and allows linking and sharing of structured data for human and machine consumption. In a recent study to understand and reconcile various perspectives on the effectiveness of Linked Data, the authors raise intriguing questions about the possibilities of leveraging Linked Data for sharing library metadata over the web:

Although library metadata made the transition from card catalogs to online catalogs over 40 years ago, and although a primary source of information in today’s world is the Web, metadata in our OPACs are no more free to interact on the Web today than when they were confined on 3" × 5" catalog cards in wooden drawers. What if we could set free the bound elements? That is, what if we could let serial titles, subjects, creators, dates, places, and other elements, interact independently with data on the Web to which they are related? What might be the possibilities of a statement-based, Linked Data environment?5

Figure 1.
The BIBFRAME Model6

BIBFRAME provides the means for libraries to experiment with Linked Data to find answers to these questions for themselves. This makes BIBFRAME simultaneously daunting and delightful. It is daunting because it imposes a paradigm shift in how libraries have historically managed, exchanged, and shared metadata. But embracing Linked Data also leads to a promised land where metadata within and among libraries can be exchanged seamlessly and economically over the web. BIBFRAME (http://bibframe.org) consists of a model and a vocabulary set specifically designed for bibliographic control.7 The model identifies four main classes, namely, Work, Instance, Authority, and Annotation (see figure 1). For each of these classes, there are many hierarchical attributes that help in describing and linking instantiations of these classes. These properties are collectively called the BIBFRAME vocabulary.

Philosophically, Linked Data is based on the premise that more links among resources will lead to better contextualization and credibility of resources, which in turn will help in filtering irrelevant resources and discovering new and meaningful resources. At a more practical level, Linked Data provides a simple mechanism to make connections among pieces of information or resources over the web. More specifically, it not only allows humans to make use of these links but also machines to do so without human intervention. This may sound eerie, but one has to understand the history behind the origin of Linked Data not to think of this as yet another conspiracy for machines to take over the World (Wide Web).

In 1994 Tim Berners-Lee, the inventor of the web, put forth his vision of the Semantic Web as a “Web of actionable information—information derived from data through a Semantic theory for interpreting the symbols. The Semantic theory provides an account of ‘meaning’ in which the logical connection of terms establishes interoperability between systems.”8 While the idea of the Semantic Web has not been fully realized for a variety of functional and technical reasons, the notion of Linked Data introduced subsequently has made the concept much more accessible and feasible for a wider application.9 Once again, it was Tim Berners-Lee who put forth the ground rules for publishing data on the web that are now known as the Linked Data Principles.10 These principles advocate using standard mechanisms for naming each resource and their relationships with unique Universal Resource Identifiers (URIs); making use of the existing web infrastructure for connecting resources; and using Resource Description Framework (RDF) for documenting and sharing resources and their relationships.

A URI serves as a persistent name or handle for a resource and is ideally independent of the underlying location and technology of the resource.
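To ground the model, the following minimal Python sketch (using the rdflib library) suggests how a Work, an Instance, and an Authority might be expressed as RDF statements built from URIs. The example.org identifiers are illustrative placeholders, and the class and property names taken from the BIBFRAME namespace (Work, Instance, instanceOf, creator, title) are assumptions about the published vocabulary that should be verified against bibframe.org:

from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import RDF

# BIBFRAME vocabulary namespace; the specific terms used below are assumptions.
BF = Namespace("http://bibframe.org/vocab/")

# Illustrative URIs (placeholders, not identifiers from any real catalog).
work = URIRef("http://example.org/works/sample-work")
instance = URIRef("http://example.org/instances/sample-work-print")
author = URIRef("http://example.org/authorities/sample-author")

g = Graph()
g.add((work, RDF.type, BF.Work))            # the conceptual work
g.add((instance, RDF.type, BF.Instance))    # a material embodiment of the work
g.add((instance, BF.instanceOf, work))      # link the Instance to its Work
g.add((work, BF.creator, author))           # link the Work to an Authority
g.add((work, BF.title, Literal("Sample Work")))

print(g.serialize(format="turtle"))         # emit the description as RDF (Turtle)

The point of the sketch is only that each member of each class is named by a URI to which further statements, made inside or outside the library, can link.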
Although often used interchangeably, a URI is different from a URL (or Universal Resource Locator), which is a more commonly used term for web resources. A URL is a special type of URI, which points to the actual location (or the web address) of a resource, including the file name and extension (such as .html or .php) of a web resource. Being more generic, the use of URIs (as opposed to URLs) in Linked Data provides the persistency and flexibility of not having to change the names and references every time resources are relocated or there is a change in server technology. For example, if an organization switches its underlying web-scripting technology from Active Server Pages (ASP) to Java Server Pages (JSP), all the files on a web server will bear a different extension (e.g., .jsp), causing all previous URLs with the old extension (e.g., .asp) to become invalid. This technology change, however, may have no impact if URIs are used instead of URLs because the underlying implementation and location details for a resource are masked from the public. Thus the URI naming scheme within an organization must be developed independent of the underlying technology. There are diverse best practices on how to name URIs to promote usability, longevity, and persistence.11 The most important factors, however, remain the purpose and the context for which the resources are being harvested and shared.

Use of RDF is also a requirement of using Linked Data for sharing data over the web. Much like how HTML (Hypertext Markup Language) is used to create and publish documents over the web, RDF is used to create and publish Linked Data over the web. The format of RDF is very simple and makes use of three fundamental elements, namely, subject, predicate, and object. Similar to the structure of a basic sentence, the three elements make up the unit of description of a resource, known as a triple in RDF terminology. Unsurprisingly, RDF requires all three elements to be denoted by URIs, with the exception of the object, which may also be represented by constant values such as dates, strings, or numbers.12 As an example, consider the work Divine Comedy. The fact that this work, also known as Divina Commedia, was created by Dante Alighieri can be represented by two triples of the following form (shown in N-Triples format; the URIs here are illustrative, since the actual identifiers depend on the naming scheme chosen):

<http://example.org/works/divine-comedy> <http://example.org/vocab/creator> <http://example.org/agents/dante-alighieri> .
<http://example.org/works/divine-comedy> <http://example.org/vocab/sameAs> "Divina Commedia" .

In the first triple of this example, the work Divine Comedy (subject) is being attributed to a person called Dante Alighieri (object) as the creator (predicate). In the second triple, the use of the sameAs predicate asserts that both Divine Comedy and Divina Commedia refer to the same resource. Thus, using URIs makes the resources and relationships persistent, whereas the use of RDF makes the format discernible by humans and machines.
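Because such statements follow a fixed grammar, they can be produced and consumed by software as readily as by people. A minimal Python sketch, using the rdflib library and the same illustrative example.org URIs as above (placeholders, not identifiers from any real data set), parses the two statements and then asks the graph who created the work:

from rdflib import Graph

# The two illustrative N-Triples statements from the example above.
NT = """
<http://example.org/works/divine-comedy> <http://example.org/vocab/creator> <http://example.org/agents/dante-alighieri> .
<http://example.org/works/divine-comedy> <http://example.org/vocab/sameAs> "Divina Commedia" .
"""

g = Graph()
g.parse(data=NT, format="nt")   # load the statements into an RDF graph

# A machine can now answer a question about the data without human intervention.
query = """
SELECT ?creator
WHERE { <http://example.org/works/divine-comedy> <http://example.org/vocab/creator> ?creator . }
"""
for row in g.query(query):
    print(row.creator)          # -> http://example.org/agents/dante-alighieri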
This seemingly simple idea allows data to be captured, formatted, shared, transmitted, received, and decoded over the web. Use of the existing web protocol (HTTP, or Hypertext Transfer Protocol) for exchanging and integrating data saves the overhead of putting additional agreements and infrastructure in place among parties willing or wishing to exchange data. This ease and freedom to define relationships among resources over the web also makes it possible for disparate data sources to interact and integrate with each other openly and free of cost.

Why is this seemingly simple idea so significant for the future of the web? From a functional perspective, it means that Linked Data facilitates "using the Web to create typed links between data from different sources. These may be as diverse as databases maintained by two organisations in different geographical locations, or simply heterogeneous systems within one organisation that, historically, have not easily interoperated at the data level."13 The notion of typed linking refers to the facility and freedom of being able to have and name multiple relationships among resources. From a technical point of view, "Linked Data refers to data published on the Web in such a way that it is machine-readable, its meaning is explicitly defined, it is linked to other external data sets, and can in turn be linked to from external data sets."14 In a traditional database, relationships between entities or resources are predefined by virtue of tables and column names. Moreover, data in such databases becomes part of the Deep Web and is not readily accessed or indexed by search engines.15

The use of URIs to name relationships allows data sources to establish, use, and reuse vocabularies to define relationships between existing resources. These names or vocabularies, much like the resources they describe, have their own dedicated URIs, making it possible for resources to form long-term and reliable relationships with each other. If resources and relationships have and retain their identities by virtue of their URIs, then links between resources add to the awareness of these resources for both humans and machines. This is a key concept in realizing the overall mission of Linked Data: to imbue the web with data awareness and transform the existing web of documents into a web of data. Consequently, various institutions and industries have established standard vocabularies and made them available for others to use with their data. For example, the Library of Congress has published its subject headings as Linked Data. The impetus behind this gesture is that if data from multiple organizations is typed-linked using LCSH (Library of Congress Subject Headings), then libraries and others gain the ability to categorize, collocate, and integrate data from disparate systems over the web by virtue of using a common vocabulary.
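As an illustration of such typed linking, the two triples below connect hypothetical resources from two different organizations to the same LCSH concept through a subject predicate. The local URIs, the placeholder sh number, and the choice of dcterms:subject are assumptions for illustration rather than anything prescribed here; LCSH identifiers published at id.loc.gov follow this general pattern.

  # Two resources held by different organizations point at the same LCSH concept,
  # so their data can be collocated by subject. The sh number is a placeholder.
  <http://library-a.example.org/item/42>  <http://purl.org/dc/terms/subject> <http://id.loc.gov/authorities/subjects/sh00000000> .
  <http://library-b.example.org/record/7> <http://purl.org/dc/terms/subject> <http://id.loc.gov/authorities/subjects/sh00000000> .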
The more resources link to each other through established and reusable vocabularies, the more data-aware the web becomes. Recognizing this opportunity, the Library of Congress has also developed and shared its vocabulary for bibliographic control as part of the BIBFRAME framework.16

Implementing BIBFRAME to Harvest and Share Bibliographic Metadata

Nowadays, systems like catalogs and digital collection repositories are commonplace in libraries, but these source systems often operate as islands of data both within and across libraries. The goal of this case study is to explore and evaluate BIBFRAME as a viable approach for libraries to integrate and share disparate metadata over the web. As discussed above, the BIBFRAME model attempts to contextualize the use of Linked Data for libraries and provides a conceptual model and underlying vocabulary to do so. To this end, a unique collection of the Ismaili Muslim community was identified for the case study. The collection is physically housed at the Harvard University Library (HUL), and the metadata for the collection is dispersed across multiple systems within the library. An additional objective of this case study has been to define concrete and replicable steps for libraries to implement BIBFRAME. The discussion below is therefore presented in a step-by-step format for harvesting and sharing bibliographic metadata over the web.

1. Establishing a Purpose for Harvesting Metadata

The Harvard Collection of Ismaili Literature is the first of its kind in North America. "The most important genre represented in the collection is that of the ginans, or the approximately one thousand hymn-like poems written in an assortment of Indian languages and dialects."17 The feasibility of BIBFRAME was explored in this case study by creating a thematic research collection of ginans from existing bibliographic metadata harvested at HUL. The purpose of this thematic research collection is to make ginans accessible to researchers and scholars for textual criticism. Historically, libraries have played a vital role in making extant manuscripts and other primary sources accessible to scholars for textual criticism. The need for having such a collection in place for ginans was identified by Dr. Ali Asani, professor of Indo-Muslim and Islamic Religion and Cultures at Harvard University:

Perhaps the greatest obstacle for further studies on the ginan literature is the almost total absence of any kind of textual criticism on the literature. Thus far merely two out of the nearly one thousand compositions have been critically edited. Naturally, the availability of reliably edited texts is fundamental to any substantial scholarship in this field. . . .
For the scholar of post-classical Ismaili literature, recourse to this kind of material has become especially critical with the growing awareness that there exist significant discrepancies between modern printed versions of several ginans and their original manuscript form. Fortunately, the Harvard collection is particularly strong in its holdings of a large number of first editions of printed ginan texts—a strength that should greatly facilitate comparisons between recensions of ginans and the preparation of critical editions.18

2. Modeling the Data to Fulfill Functional Requirements

Historically, the physicality of resources such as a book or a compact disc has dictated what is described in library catalogs and to what extent. The issue of cataloging serials and other works embedded within larger works has always been challenging for catalogers. For this case study as well, one of the major implementation decisions revolved around the granularity of defining a work. Designating each ginan as a work (rather than a manuscript or lithograph) was perhaps an unconventional decision, but one that was highly appropriate for the purpose of the collection. Thus there was a conscious and genuine effort to liberate a work from the confines of its carriers. Fortuitously, BIBFRAME does not shy away from this challenge and accommodates embedded and hierarchical works in its logical model. But BIBFRAME, like any other conceptual model, only provides a starting point, which needs to be adapted and implemented for individual project needs.

Figure 2. Excerpt of Project Data Model

The data model for this case study (see figure 2) was designed to balance the need to accommodate bibliographic metadata with the demands of the Linked Data paradigm. Central to the project data model is the resources table, where information on all resources, along with their URIs and categories (work, instance, etc.), is stored. Resources relate to each other through the predicates table, which captures the relevant and applicable vocabularies. The namespace table keeps track of all the vocabulary sets being used for the project. In the triples table, resources are typed-linked using appropriate predicates. Once the data model for the project was finalized, a database was created using MySQL to house the project data (a minimal schema sketch appears below).

3. Planning the URI Scheme

In general, the URI scheme for this case study conformed to an intuitive nomenclature of the form <base domain>/<resource category>/<numeric identifier>. This URI naming scheme ensures that a URI assigned to a resource depends on its class and category (see table 1). While it may be customary to use textual identifiers in URIs, the project used numeric identifiers to account for the fact that most of the ginans (works) are untitled and transliterated into English from various Indic languages.
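To make the data model from step 2 concrete, the following is a minimal MySQL sketch of how the four tables might be laid out; the resources table is where URIs following the scheme above would be stored. All column names and types are assumptions for illustration, since the actual schema is not given here.

  -- Illustrative schema only; column names and types are assumed.
  CREATE TABLE namespace (
    ns_id   INT PRIMARY KEY AUTO_INCREMENT,
    prefix  VARCHAR(50),            -- e.g., 'bf', 'dcterms', 'owl'
    ns_uri  VARCHAR(255)            -- base URI of the vocabulary
  );

  CREATE TABLE resources (
    resource_id INT PRIMARY KEY AUTO_INCREMENT,
    uri         VARCHAR(255),       -- e.g., http://domain.com/ginan/1
    category    VARCHAR(50),        -- work, instance, authority, or annotation
    label       VARCHAR(255)
  );

  CREATE TABLE predicates (
    predicate_id INT PRIMARY KEY AUTO_INCREMENT,
    ns_id        INT,               -- vocabulary the predicate belongs to
    term         VARCHAR(100),      -- e.g., hasAuthority
    FOREIGN KEY (ns_id) REFERENCES namespace (ns_id)
  );

  CREATE TABLE triples (
    triple_id    INT PRIMARY KEY AUTO_INCREMENT,
    subject_id   INT,               -- resource being described
    predicate_id INT,               -- the typed link
    object_id    INT NULL,          -- target resource when the object is a URI
    object_value VARCHAR(255) NULL, -- literal value when the object is a constant
    FOREIGN KEY (subject_id)   REFERENCES resources (resource_id),
    FOREIGN KEY (predicate_id) REFERENCES predicates (predicate_id),
    FOREIGN KEY (object_id)    REFERENCES resources (resource_id)
  );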
Generally, support for using URIs is either already built in or added on, depending on the server technology being used. This case study utilized the LAMP (Linux, Apache, MySQL, and PHP) technology stack, and the URI handler for the project was added to the Apache web server using its URL-rewriting (mod_rewrite) facility.19

Resource Types   BIBFRAME Category   URI Example
Organizations    Annotation          http://domain.com/organization/1
Collections      Annotation          http://domain.com/collection/1
Items            Instance            http://domain.com/item/1
Ginans           Work                http://domain.com/ginan/1
Subjects         Authority           http://domain.com/subject/1

Table 1. URI Naming Scheme and Examples

4. Using Standard Vocabularies

BIBFRAME provides the relevant vocabulary and the underlying URIs to implement Linked Data with bibliographic data in libraries. While not all attributes may be applicable or used in a project, the ones that are identified as relevant must be referenced with their rightful URI. For example, the predicate hasAuthority from BIBFRAME has a persistent URI (http://bibframe.org/vocab/hasAuthority), enabling humans as well as machines to access and decode the purpose and scope of this predicate. Other vocabulary sets or namespaces commonly used with Linked Data include the Resource Description Framework (RDF), Web Ontology Language (OWL), Friend of a Friend (FOAF), etc. In rare circumstances, libraries may also choose to publish their own specific vocabulary. For example, any unique predicates for this case study could be defined and published using the http://domain.com/vocab namespace.

5. Identifying Data Sources

The bibliographic metadata used for this case study was obtained from within HUL. As mentioned above, the data pertained to a unique collection of religious literature belonging to the Ismaili Muslim community of the Indian subcontinent. The collection was acquired by the Middle Eastern Department of the Harvard College Library in 1980 and comprises 28 manuscripts, 81 printed books, and 11 lithographs. In 1992, Dr. Asani published a book on the contents of this collection, titled The Harvard Collection of Ismaili Literature in Indic Languages: A Descriptive Catalog and Finding Aid. The indexes in the book served as one of the sources of data for this case study.

Subsequent to the publication of the book, the Harvard Collection of Ismaili Literature was also made available through Harvard's OPAC (online public access catalog), called HOLLIS (see figure 3). The catalog records were also obtained from the library for the case study. Some of the 120 items from the collection were subsequently digitized and shared as part of Harvard's Islamic Heritage Project. The digital surrogates of these items were shared through the Harvard University Library Open Collections Program, and the library catalog records were updated to provide direct access to the digital copies where available.

Figure 3. HOLLIS, Harvard University Library's OPAC
Additional metadata for the digitized items was also developed by the library to facilitate open digital access through Harvard Library's Page Delivery Service (PDS), which provides a page-turning navigational interface for scanned page images over the web. Data from all these sources was leveraged for the case study.

6. Transforming Source Metadata for Reuse

ETL (Extract, Transform, and Load) is an acronym commonly used to refer to the steps needed to populate a target database by moving data from multiple and disparate source systems. Extraction is the process of getting the data out of the identified source systems and making it available for the exclusive use of the new database being designed. In the library realm, this may mean getting MARC records out of a catalog or getting descriptive and administrative metadata out of a digital repository. The format in which data is extracted from a source system is also an important aspect of the extraction process. Use of the XML (Extensible Markup Language) format is fairly common nowadays, as most library source systems have built-in functionality to export data into a recognized XML standard such as MARCXML (MARC data encoded in XML), MODS (Metadata Object Description Schema), METS (Metadata Encoding and Transmission Standard), etc. In certain circumstances, data may be extracted in CSV (comma-separated values) format.

Transformation is the step in which data from one or more source systems is massaged and prepared to be loaded into the new database. The design of the new database often enforces new ways of organizing source data. The transformation process is responsible for making sure that the data from all source systems is integrated, while retaining its integrity, before being loaded into the new database. A simple example of data transformation is that the new system may require authors' first and last names to be stored in separate fields rather than in a single field. How such transformations are automated will depend on the format of the source data as well as the infrastructure and programming skills available within an organization. Since XML is becoming the de facto standard for most data exchange, use of XSLT (Extensible Stylesheet Language Transformations) scripts is common. With XSLT, data in XML format can be manipulated and given a different structure to aid in the transformation process (a minimal sketch follows below).

The loading process is responsible for populating the newly minted database once all transformations have been applied. One of the major considerations in this process is maintaining the referential integrity of the data by observing the constraints dictated by the data model. This is achieved by making sure that records are correctly linked to each other and are loaded in the proper sequence. For instance, to ensure referential integrity of items and their annotations, it may be necessary to load the items first and then the annotations with correct references to the associated item identifiers.
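As a sketch of the XSLT-based transformation mentioned above, the stylesheet below pulls a control number and title out of MARCXML and writes them as comma-separated lines ready for loading. The choice of fields (001 and 245 $a) and the CSV output are illustrative assumptions, not the actual scripts used in the project.

  <?xml version="1.0" encoding="UTF-8"?>
  <!-- Emit one "control number,title" line per MARCXML record. Illustrative only. -->
  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:marc="http://www.loc.gov/MARC21/slim">
    <xsl:output method="text"/>
    <xsl:template match="/">
      <xsl:for-each select="//marc:record">
        <xsl:value-of select="marc:controlfield[@tag='001']"/>
        <xsl:text>,</xsl:text>
        <xsl:value-of select="normalize-space(marc:datafield[@tag='245']/marc:subfield[@code='a'])"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:template>
  </xsl:stylesheet>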
For this case study, records from source systems were obtained in MARCXML and METS formats, and specific scripts were developed to extract the desired elements and transform them into the required format. A somewhat unconventional mechanism was used to capture and reuse the data from Dr. Asani's book, which was only available in print: the entire book was scanned and processed with an OCR (optical character recognition) tool to glean various data elements. Once the data was cleaned and verified, the information was transformed into a CSV data file to facilitate database loading.

7. Generating RDF Triples

RDF triples can be written, or serialized, in a variety of formats such as Turtle, N-Triples, and JSON, as well as RDF/XML, among others. The traditional RDF/XML format, which was the first standard recommended for RDF serialization by the World Wide Web Consortium (W3C), was used for this case study (see figure 4). The format was chosen for its modularity in preserving the context of resources and their relationships as well as its readability for humans. Generating RDF can be a simple act if the data is already stored in a triplestore, which is a database specifically designed to store RDF data. But given that this project was implemented using a relational database management system (RDBMS), i.e., MySQL, the programming effort to generate RDF data was complex. The complications arose in identifying and tracking the hierarchical nature of the RDF data, especially in the chosen serialization format. Several server-side scripts were developed to aid in discerning the relationships among resources and formatting them to generate triples. In hindsight, generating triples would have been easier using the N-Triples serialization, but that would have also required more complex programming for rebuilding the context for the user interface design.

Figure 4. A Sample of Triples Serialized for the Project

8. Formatting RDF Triples for Human and Machine Consumption

The raw RDF data is sufficient for machines to parse and process, but humans typically require an intuitive user interface to contextualize triples. In this case study, XSL was used extensively for formatting the triples. While XSLT and XSL (Extensible Stylesheet Language) are intricately related, they serve different purposes: XSLT is a scripting language for manipulating XML data, whereas XSL is a formatting specification used in the presentation of XML, much like how CSS (Cascading Style Sheets) is used for presenting HTML. A special routing script was also developed to detect whether a request for data was intended for machine or human consumption. For machine requests, the triples were served unformatted, whereas for human requests the triples were formatted for display in HTML.
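A minimal sketch of such a routing script, assuming a PHP environment like the LAMP stack described in step 3, might inspect the HTTP Accept header and either return the raw RDF/XML or apply an XSL stylesheet for browsers. The file names, the Accept-header check, and the use of PHP's XSLTProcessor are assumptions for illustration, not the project's actual code.

  <?php
  // Serve raw RDF/XML to machine clients; render HTML via XSL for human visitors.
  $accept = isset($_SERVER['HTTP_ACCEPT']) ? $_SERVER['HTTP_ACCEPT'] : '';

  if (strpos($accept, 'application/rdf+xml') !== false) {
      // Machine request: return the unformatted triples.
      header('Content-Type: application/rdf+xml');
      readfile('triples.rdf');            // hypothetical serialized RDF/XML file
  } else {
      // Human request: transform the RDF/XML into HTML with an XSL stylesheet.
      header('Content-Type: text/html; charset=utf-8');
      $xml = new DOMDocument();
      $xml->load('triples.rdf');
      $xsl = new DOMDocument();
      $xsl->load('display.xsl');          // hypothetical formatting stylesheet
      $proc = new XSLTProcessor();
      $proc->importStylesheet($xsl);
      echo $proc->transformToXML($xml);
  }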
Figure 5. Formatted Triples for Human Consumption

DISCUSSION

Models are tools for communicating simple and complex relations between objects and entities of interest. The effectiveness of any model is often realized during implementation, when the theoretical constructs of the model are put to the test. The challenge faced by BIBFRAME, like any new model, is to establish its worthiness in the face of the existing legacy of MARC. The existing hold of MARC in libraries is so strong that it may take several years for BIBFRAME to be in a position to challenge the status quo. Historically, bibliographic practices in libraries such as describing, classifying, and cataloging resources have primarily catered to tangible, print-based knowledge carriers such as books and journals.20 BIBFRAME challenges libraries to revisit and refresh their traditional notions of text and textuality.

Although initially introduced as a replacement for MARC, BIBFRAME is far from being an either-or proposition given the MARC legacy. Nevertheless, BIBFRAME has made the Linked Data paradigm much more accessible and practical for libraries. Rather than perceiving BIBFRAME as a threat to existing cataloging praxis, it may be useful for libraries to allow BIBFRAME to coexist within the current cataloging landscape as a means for sharing bibliographic data over the web. Libraries maintain and provide authentic metadata about knowledge resources for their users based on internationally recognized standards. This high-quality structured metadata from library catalogs and other systems can be leveraged and repurposed to fulfill unmet and emerging needs of users. With Linked Data, library metadata could become readily harvestable by search engines, transforming dormant catalogs and collections into active knowledge repositories.

In this case study, seemingly disparate library systems and data were integrated to provide unified access and create a thematic research collection. It is also possible to create such purpose-specific digital libraries and collections as part of library operations without having to acquire additional hardware and commercial software. It was also evident from this case study that digital libraries built using BIBFRAME offer superior navigational control and access points for users to actively interact with bibliographic data. Any Linked Data predicate has the potential to become an access point and act as a pivot that provides an insightful view of the underlying bibliographic records (see figure 6).
With advances in digital technologies, "richer interaction is possible within the digital environment not only as more content is put within reach of the user, but also as more tools and services are put directly in the hands of the user."21 Developing the capacity to respond effectively to the informational needs of users is part and parcel of libraries' professional and operational responsibilities. With the ubiquity of the web and the increased reliance of users on digital resources, libraries must constantly reevaluate and reimagine their services to remain responsive and relevant to their users.

Figure 6. Increased Navigational Options with Linked Data

CONCLUSION

Just as libraries rely on vendors to develop, store, and share metadata for commercial books and journals, similar metadata partnerships need to be put in place across libraries. The benefits and implications of establishing such a collaborative metadata supply chain are far-reaching and can also accommodate cultural and indigenous resources. Library digital collections typically showcase resources that are unique and rare, and the metadata that makes these collections accessible must be shared over the web as part of library service.

As the amount of data on the web proliferates, users find it more and more difficult to differentiate between credible knowledge resources and other resources. BIBFRAME has the potential to address many of the issues that plague the web from a library and information science perspective, including precise search, authority control, classification, data portability, and disambiguation. Most popular search engines, like Google, are gearing up to automatically index and collocate disparate resources using Linked Data.22 Libraries are particularly well positioned to realize this goal with their expertise in search, metadata generation, and ontology development. This research looks forward to further initiatives by libraries to become more responsive and make library resources more relevant to the knowledge creation process.

REFERENCES

1. Tim F. Knight, "Break On Through to the Other Side: The Library and Linked Data," TALL Quarterly 30, no. 1 (2011): 1–7, http://hdl.handle.net/10315/6760.

2. Eric Miller et al., "Bibliographic Framework as a Web of Data: Linked Data Model and Supporting Services," November 11, 2012, http://www.loc.gov/bibframe/pdf/marcld-report-11-21-2012.pdf.

3. Angela Kroeger, "The Road to BIBFRAME: The Evolution of the Idea of Bibliographic Transition into a Post-MARC Future," Cataloging & Classification Quarterly 51, no. 8 (2013): 873–89, http://dx.doi.org/10.1080/01639374.2013.823584.

4. Eric Miller et al., "Bibliographic Framework as a Web of Data: Linked Data Model and Supporting Services," November 11, 2012, http://www.loc.gov/bibframe/pdf/marcld-report-11-21-2012.pdf.

5. Nancy Fallgren et al., "The Missing Link: The Evolving Current State of Linked Data for Serials," Serials Librarian 66, no. 1–4 (2014): 123–38, http://dx.doi.org/10.1080/0361526X.2014.879690.
6. The figure has been adapted from Eric Miller et al., "Bibliographic Framework as a Web of Data: Linked Data Model and Supporting Services," November 11, 2012, http://www.loc.gov/bibframe/pdf/marcld-report-11-21-2012.pdf.

7. "Bibliographic Framework Initiative Project," Library of Congress, accessed August 15, 2014, http://www.loc.gov/bibframe.

8. Nigel Shadbolt, Wendy Hall, and Tim Berners-Lee, "The Semantic Web Revisited," Intelligent Systems 21, no. 3 (2006): 96–101, http://dx.doi.org/10.1109/MIS.2006.62.

9. Sören Auer et al., "Introduction to Linked Data and Its Lifecycle on the Web," in Reasoning Web: Semantic Technologies for Intelligent Data Access, edited by Sebastian Rudolph et al., 1–90 (Heidelberg: Springer, 2011), http://dx.doi.org/10.1007/978-3-642-23032-5_1.

10. Tim Berners-Lee, "Linked Data," Design Issues, last modified June 18, 2009, http://www.w3.org/DesignIssues/LinkedData.html.

11. Danny Ayers and Max Völkel, "Cool URIs for the Semantic Web," World Wide Web Consortium (W3C), last modified March 31, 2008, http://www.w3.org/TR/cooluris.

12. Tom Heath and Christian Bizer, Linked Data: Evolving the Web into a Global Data Space (Morgan & Claypool, 2011), http://dx.doi.org/10.2200/S00334ED1V01Y201102WBE001.

13. Christian Bizer, Tom Heath, and Tim Berners-Lee, "Linked Data—The Story So Far," International Journal on Semantic Web and Information Systems 5, no. 3 (2009): 1–22, http://dx.doi.org/10.4018/jswis.2009081901.

14. Ibid.

15. Tony Boston, "Exposing the Deep Web to Increase Access to Library Collections" (paper presented at AusWeb05, the Twelfth Australasian World Wide Web Conference, Queensland, Australia, 2005), http://www.nla.gov.au/openpublish/index.php/nlasp/article/view/1224/1509.

16. "Bibliographic Framework Initiative," BIBFRAME.ORG, accessed August 15, 2014, http://bibframe.org/vocab; "Bibliographic Framework Initiative Project," Library of Congress, accessed August 15, 2014, http://www.loc.gov/bibframe.

17. Ali Asani, The Harvard Collection of Ismaili Literature in Indic Languages: A Descriptive Catalog and Finding Aid (Boston: G.K. Hall, 1992).

18. Ibid.

19. Ralf S. Engelschall, "URL Rewriting Guide," Apache HTTP Server Documentation, last modified December 1997, http://httpd.apache.org/docs/2.0/misc/rewriteguide.html.

20. Yann Nicolas, "Folklore Requirements for Bibliographic Records: Oral Traditions and FRBR," Cataloging & Classification Quarterly 39, no. 3–4 (2005): 179–95, http://dx.doi.org/10.1300/J104v39n03_11.

21. Lee L. Zia, "Growing a National Learning Environments and Resources Network for Science, Mathematics, Engineering, and Technology Education: Current Issues and Opportunities for the NSDL Program," D-Lib Magazine 7, no. 3 (2001), http://www.dlib.org/dlib/march01/zia/03zia.html.
22. Thomas Steiner, Raphael Troncy, and Michael Hausenblas, "How Google is Using Linked Data Today and Vision for Tomorrow" (paper presented at Linked Data in the Future Internet at the Future Internet Assembly (FIA 2010), Ghent, December 2010), http://research.google.com/pubs/pub37430.html.

5666 ----

President's Message: UX Thinking and the LITA Member Experience

Rachel Vacek

Rachel Vacek (revacek@uh.edu) is LITA President 2014-15 and Head of Web Services, University Libraries, University of Houston, Houston, Texas.

My mind has been occupied lately with user experience (UX) thinking in both the web world and in the physical world around me. I manage a web services department in an academic library, and it's my department's responsibility to contemplate how best to present website content so students can easily search for the articles they are looking for, or so faculty can quickly navigate to their favorite database. In addition to making these tasks easy and efficient, we want to make sure that users feel good about their accomplishments. My department has to ensure that the other systems and services that are integrated throughout the site are located in meaningful places and can be used at the point of need. Additionally, the site's graphic and interaction design must not only contribute to but also enhance the overall user experience.

We care about usability, graphic design, and the user interfaces of our library's web presence, but these are just subsets of the larger UX picture. For example, a site can have a great user interface and design, but if a user can't get to the actual information she is looking for, the overall experience is less than desirable.

Jesse James Garrett is considered to be one of the founding fathers of user-centered design, the creator of the pivotal diagram defining the elements of user experience, and the author of the book The Elements of User Experience. He believes that "experience design is the design of anything, independent of medium, or across media, with human experience as an explicit outcome, and human engagement as an explicit goal."1 In other words, applying a UX approach to thinking involves paying attention to a person's behaviors, feelings, and attitudes about a particular product, system, or service. Someone who does UX design, therefore, focuses on building the relationship between people and the products, systems, and services with which they interact. Garrett provides a roadmap of sorts for us by identifying and defining the elements of a web user experience, some of which are the visual, interface, and interaction design, the information architecture, and user needs.2 In time, these come together to form a cohesive, holistic approach to shaping our users' overarching experience across our library's web presence. Paying attention to these more contextual elements informs the development and management of a website.

Let's switch gears for a moment. Prior to winning the election and becoming the LITA Vice-President/President-Elect, I reflected on my experiences as a new LITA member, before I became really engaged within the association. I endeavored to remember how I felt when I joined LITA in 2005. Was I welcomed and informed, or did I feel distant and uninformed? Was the path clear to getting involved in interest groups and committees, or were there barriers that prevented me from getting engaged?
What was my attitude about the overall organization? How were my feelings about LITA affected?

Luckily, there were multiple times when I felt embraced by LITA members, such as participating in BIGWIG's Social Media Showcase, teaching preconferences, hanging out at the Happy Hours, and attending the Forums. I discovered ample networking opportunities, and around every corner there always seemed to be a way to get involved. I attended as many LITA programs at Annual and Midwinter conferences as I could, and in doing so, ran into the same crowds of people over and over again. Plus, the sessions I attended always had excellent content and friendly, knowledgeable speakers. Over time, many of these members became some of my friends and most trusted colleagues.

Unfortunately, I'm confident that not every LITA member or prospective member has had experiences as similar, consistent, or engaging as mine, or as many opportunities to travel to conferences and network in person. We all have different expectations and goals that color our personal experiences in interacting with LITA and its members.

One of my goals as LITA President is to enhance the member experience. I want to apply the user experience design concepts that I'm so familiar with to effect change and improve the overall experience for current members and those who are on the fence about joining. To be clear, when I say LITA member, I am including board members, committee members and chairs, interest group members and chairs, representatives, and those just observing on the sidelines. We are all LITA members and deserve to have a good experience no matter our level within the organization.

So what does "member experience" really mean? Don Norman, author of The Design of Everyday Things and the man credited with coining the phrase user experience, explains that "user experience encompasses all aspects of the end-user's interaction with the company, its services, and its products."3 Therefore, I would say that the LITA member experience encompasses all aspects of a member's interaction with the association, including its programming, educational opportunities, publications, events, and even other members.

I believe that there are several components that define a good member experience. First, we have to ensure quality, coherence, and consistency in programming, publications, educational opportunities, communications and marketing, conferences, and networking opportunities. Second, we need to pay attention to our members' needs and wants as well as their motivations for joining. This means we have to engage with our members more on a personal level, discover their interests and strengths, and help them get involved in LITA in ways that benefit the association as well as assist them in reaching their professional goals. Third, we need to be welcoming and recognize that first impressions are crucial to gaining new members and retaining current ones. Think about how you felt and what you thought when you received a product that really impressed you, when you started an exciting new job, or even when you used a clean and usable website. If your initial impression was positive, you were more likely to connect with that product, environment, or website.
If prospective and relatively new LITA members experience a good first impression, they are more likely to join or renew their membership. They feel like they are part of a community that cares about them and their future. That experience becomes meaningful.

Finally, the fourth component of a good member experience is that we need to stop looking at the tangible benefits that we provide to users as the only things that matter. Sure, it's great to get discounts on workshops and webinars or to be able to vote in an election and get appointed to a committee, but we can't continue to focus on these offerings alone. We need to assess the way we communicate through email, social media, and our web page and determine whether it adds to or detracts from the member experience. What is the first impression someone might have when looking at the content and design of LITA's web page? Do the presenters for our educational programs feel valued? Does ITAL contain innovative and useful information? Is the process for joining LITA, or volunteering to be on a committee, simple, complex, or unbearable? What kinds of interactions do members have with the LITA Board or the LITA staff? These less tangible interactions are highly contextual and can add to or detract from our current and prospective members' abilities to meet their own goals, measure satisfaction, or define success.

As LITA President, and with the assistance of the Board of Directors, there are several things we have done or intend to do to help LITA embrace UX thinking:

• We have implemented a Chair and Vice-Chair model for committees so that there is a smoother transition and the Vice-Chair can learn the responsibilities of the Chair role prior to being in that role.
• We have established a new Communications Committee that will create a communication strategy focused on communicating LITA's mission, vision, goals, and relevant and timely news to the LITA membership across various communication channels.
• We are encouraging our committees to create more robust documentation.
• We are creating richer documentation that supports the workings of the Board.
• We are creating documentation and training materials for LITA representatives to complement the materials we have for committee chairs.
• We have disbanded committees that no longer serve a purpose at the LITA level and whose concerns are now addressed in groups higher within ALA.
• The Assessment and Research Committee is preparing to conduct a membership survey. The last one was done in 2007.
• We will be holding a few virtual and in-person LITA "Kitchen Table Conversations" in the fall of 2014 to assist with strategic planning and to discuss how LITA's goals align with ALA's strategic goals of information policy, professional development, and advocacy.
• The Membership Development Committee is exploring how to more easily and frequently reach out to, engage, appreciate, acknowledge, and highlight current and prospective members. They will work closely with the Communications Committee.

I believe that we've arrived at a time where it's crucial that we employ UX thinking at a more pragmatic and systematic level and treat it as our strategic partner when exploring how to improve LITA and help the association evolve to meet the needs of today's library and information professionals. Garrett summarizes my argument nicely. He says, "What makes people passionate, pure and simple, is great experiences.
If they have great experience with your product [and] they have great experiences with your service, they're going to be passionate about your brand, they're going to be committed to it. That's how you build that kind of commitment."4 I personally am very passionate about and committed to LITA, and I truly believe that our UX efforts will positively impact your experience as a LITA member.

REFERENCES

1. http://uxdesign.com/events/article/state-of-ux-design-garrett/203. Garrett said this in a presentation entitled "State of User Experience" that he gave during UX Week 2009, a very popular conference for UX designers.

2. http://www.jjg.net/elements/pdf/elements.pdf

3. http://www.nngroup.com/articles/definition-user-experience/

4. http://www.teresabrazen.com/podcasts/what-the-heck-is-user-experience-design. Garrett said this in a podcast interview with Teresa Brazen, "What the Heck is User Experience Design??!! (And Why Should I Care?)"

5699 ----

Geographic Information and Technologies in Academic Libraries: An ARL Survey of Services and Support

Ann L. Holstein

Ann L. Holstein (ann.holstein@case.edu) is GIS Librarian at Kelvin Smith Library, Case Western Reserve University, Cleveland, Ohio.

ABSTRACT

One hundred fifteen academic libraries, all current members of the Association of Research Libraries (ARL), were selected to participate in an online survey in an effort to better understand campus usage of geographic data and geospatial technologies, and how libraries support these uses. The survey was used to capture information regarding the geographic needs of their respective campuses, the array of services they offer, and the education and training of geographic information services department staff members. The survey results, along with a review of recent literature, were used to identify changes in geographic information services and support since 1997, when a similar survey was conducted by ARL. This new study has enabled recommendations to be made for building a successful geographic information service center within the campus library that offers a robust and comprehensive service and support model for all geographic information usage on campus.

INTRODUCTION

In June 1992, the ARL, in partnership with Esri (Environmental Systems Research Institute), launched the GIS (Geographic Information Systems) Literacy Project. This project sought to "introduce, educate, and equip librarians with the skills necessary" to become effective GIS users and to learn how to provide patrons with "access to spatially referenced data in all formats."1 Through the implementation of a GIS program, libraries can provide "a means to have the increasing amount of digital geographic data become a more useful product for the typical patron."2

In 1997, five years after the GIS Literacy Project began, a survey was conducted to elucidate how ARL libraries support patron GIS needs.
The survey was distributed to 121 ARL members for the purpose of gathering information about the GIS services, staffing, equipment, software, data, and support these libraries offered to their patrons. Seventy-two institutions returned the survey, a 60% response rate. At that time, nearly three-quarters (74%) of the respondents affirmed that their library administered some level of GIS services.3 This indicates that the GIS Literacy Project had an evident positive impact on the establishment of GIS services in ARL member libraries.

Since then, it has been recognized that the rapid growth of digital technologies has had a tremendous effect on GIS services in libraries.4 We acknowledge the importance of assessing how geographic services in academic research libraries have further evolved over the past 17 years in response to these advancing technologies as well as the increasingly demanding geographic information needs of their user communities.

METHOD

For this study, 115 academic libraries, all current members of ARL as of January 2014, were invited to participate in an online survey in an effort to better understand campus usage of geographic data and geospatial technologies and how libraries support these uses. Similar in nature to the 1997 ARL survey, the 2014 survey was designed to capture information regarding the geographic needs of their respective campuses; the array of services, software, and support the academic libraries offer; and the education and training of geographic information services department staff members. Our aim was to be able to determine the range of support patrons can anticipate at these libraries and to ascertain changes in GIS library services since the 1997 survey.

A cross-sectional survey was designed and administered using Qualtrics, an online survey tool. It was distributed in January 2014 via email to the person identified as the subject specialist for mapping and/or geographic information at each ARL member academic library. When the survey closed after two weeks, 54 institutions had responded, a 47% participation rate. Responding institutions are listed in the appendix.

RESULTS

Software and Technologies

We were interested in learning what types of geographic information software and technologies are currently offered at academic research libraries. Results show that 100% of survey respondents offer GIS software/mapping technologies at their libraries, 36% offer remote sensing software (to process and analyze remotely sensed data such as aerial photography and satellite imagery), and 36% offer Global Positioning System (GPS) equipment and/or software.
Nearly all (98%) said that their libraries provide Esri ArcGIS software, with 83% also providing access to Google Maps and Google Earth, and 35% providing QGIS (previously known as Quantum GIS). Smatterings of other GIS, remote-sensing, and GPS products are also offered by some of the libraries, although not in large numbers (see table 1 for the full listing).

The fact that nearly all survey respondents offer ArcGIS software at their libraries comes as no surprise. ArcGIS is the most commonly provided mapping software available in academic libraries, and in 2011 it was determined that 2,500 academic libraries were using Esri products.5 Esri software was most popular in 1997 as well, undoubtedly because the company offered free software and training to participants of the GIS Literacy Project.6

Software/Technology    Type             % of Providing Libraries
Esri ArcGIS            GIS              98
Google Maps/Earth      GIS              83
QGIS                   GIS              35
AutoCad                GIS              19
ERDAS IMAGINE          Remote Sensing   19
GRASS                  GIS              15
ENVI                   Remote Sensing   15
GeoDa                  GIS               6
PCI Geomatica          Remote Sensing    6
Garmin Map Source      GPS               6
SimplyMap              GIS               4
Trimble TerraSync      GPS               4

Table 1. Geographic Information Software/Mapping Technologies Provided at ARL Member Academic Libraries (2014)

Google Maps and Google Earth, launched in 2005, have quickly become very popular mapping products at academic libraries—a close second only to Esri ArcGIS. In addition to being free, their ease of use, powerful visualization capabilities, and "customizable map features and dynamic presentation tools" make them attractive alternatives to commercial GIS software products.7

Since 1997, many software programs have fallen out of favor. MapInfo, Idrisi, Maptitude, and Sammamish Data Finder/Geosight Pro were GIS software programs listed in the 1997 survey results that are not used today at ARL member academic libraries.8 Instead, open source software such as QGIS, GRASS, and GeoDa is growing in popularity. These packages are free to use, and their source code may be modified as needed.

GPS equipment lending can be very beneficial to students and campus researchers who need to collect their own field research locational data. The 2014 survey found that 30% of respondents loan recreational GPS equipment at their libraries and 10% loan mapping-grade GPS equipment. The high cost of mapping-grade GPS equipment (several thousand dollars) may be a barrier for some libraries; however, this is the type of equipment recommended in best-practice methods for gathering highly accurate GPS data for research. In addition to expense, complexity of operation is another consideration. While it is "fairly simple to use a recreational GPS unit," a certain level of advanced training is required for operating mapping-grade GPS equipment.9 A designated staff member may need to take on the responsibility of becoming the in-house GPS expert and routinely offer training sessions to those interested in borrowing mapping-grade GPS equipment.
Location

At 36% of responding libraries, the geographic information services area is located where the paper maps are (map department/services); 19% have separated this area and designated it as a geospatial data center, GIS, or data services department; 13% integrate it with the reference department; and just 4% of libraries house the GIS area in government documents. Table 2 lists all reported locations for this service area. Not surprisingly, in 1997, government documents (39%) was nearly as popular a location for this service area as the map department (43%).10 Libraries identified government documents as a natural fit, keeping GIS services in close proximity to the spatial data sets then being distributed by government agencies, most notably the US Government Printing Office (GPO). These agencies had made the decision to distribute "most data in machine readable form,"11 including the 1990 Census data as Topologically Integrated Geographic Encoding and Referencing (TIGER) files.12 GIS technologies were needed to access and most effectively use the information within these massive spatial datasets.

Location                                        % of Libraries (1997)   % of Libraries (2014)
Map Department/Services                         43                      36
Government Documents                            39                       4
Reference                                       10                      13
Geospatial Data Center, GIS, or Data Services    3                      19
Not in any one location                         -                        9
Digital Scholarship Center                      -                        6
Combined Area (i.e., Map Dept. & Gov. Docs.)    -                        6

Table 2. Location of the Geographic Information Services Area within the Library (1997 and 2014)

At 59% of responding libraries, geographic information software is available on computer workstations in a designated area, such as within the map department. However, many do not restrict users by location and instead make the software available on all computer workstations throughout the library (37%) or on designated workstations distributed throughout the library (33%). A small percentage (7%) loan laptops to patrons with the software installed, allowing full mobility throughout the entire library space.

Staffing

Most professional staff working in the geographic information services department hold one or more postbaccalaureate advanced degrees. Of 113 geographic services staff at responding libraries, 65% had obtained an MA/MS, MLS/MLIS, or PhD; 43% have one advanced degree, while 22% have two postbaccalaureate degrees. Half (50%) hold an MLS/MLIS, 31% hold an MA/MS, and 6% hold a PhD. Nearly one-third (31%) have a BA/BS as their highest educational degree, 3% have a two-year technical degree, and 2% have only a GED or high school diploma.
In 1997, 84% of GIS librarians and specialists at ARL libraries had an MLS degree.13 At that time, the incumbent was most often recruited from within the library to assume this new role, whereas today's GIS professionals are just as likely to come from nonlibrary backgrounds, bringing their expertise and advanced geographic training to this nontraditional librarian role.

Figure 1. Highest Educational Degree of Geographic Services Staff (2014)

On average, this department is staffed by two professional staff members and three student staff. Student employees can be a terrific asset, especially if they have been previously trained in GIS. Students are likely to be recruited from the departments that are the heaviest GIS users at the university (i.e., geography, geology). Some libraries have implemented "co-op" programs where students can receive credit for working in the GIS services area. These dual-benefit positions are quite attractive to students.14

Campus Users

In a typical week during the course of a semester, responding libraries each serve approximately sixteen GIS users, four remote sensing users, and three GPS users. These users may obtain assistance from department staff either in person or remotely via phone or email.

On average, undergraduate and graduate students compose the majority (75%) of geographic service users (32% and 43%, respectively). Faculty members compose 14% of the users, followed by staff (including postdoctoral researchers) at 7%. Some institutions also provide support to public patrons and alumni (4% and 1%, respectively). In 1997, it was estimated that on average, 63% of GIS users were students, 22% were faculty, 8% were staff, and 8% were public.15

Figure 2. Comparison of the Percentage of Geographic Service Users by Patron Status (1997 and 2014)

The top three departments that use GIS software at ARL campuses are Environmental Science/Studies, Urban Planning/Studies, and Geography. The most frequent remote sensing software users come from the departments of Environmental Science/Studies, Geography, and Archaeology. GPS equipment loan and software usage is most popular with the departments of Environmental Science/Studies, Geography, Biology/Ecology, and Archaeology (see table 3 for the full listing). Some departments are heavy users of all geographic technologies, while others have shown interest in only one. For example, the departments of Psychology and Medicine/Dentistry have used GIS but have expressed little or no interest in using remote-sensing or GPS technologies.

Support and Services

The campus community is supported by library staff in a variety of ways with regard to GIS, remote-sensing, and GPS technology and software use.
Nearly all (94%) libraries provide assistance using the software for specific class assignments and projects, and 78% are able to provide more in-depth research project consultations. More than one-quarter (27%) of reporting libraries will make custom GIS maps for patrons, although at some libraries (10%) there may be a charge depending on the project and patron type. Most (90%) offer basic use and troubleshooting support; however, just 39% offer support for software installation, and 55% offer technical support for problems such as licensing issues and turning on extensions. The campus computing center or information technology services (ITS) at ARL institutions most likely fields some of the software installation and technical issues rather than the library, thus accounting for the lower percentages.

A variety of software training may be offered to the campus community through the library: 80% of responding libraries visit classes to give presentations and training sessions, 69% host workshops, 47% provide opportunities for virtual training courses and tutorials, and 4% offer certificate training programs.

Department                                     GIS    Remote Sensing    GPS
Anthropology                                    24          10            8
Archaeology                                     24          14           13
Architecture                                    24           1            6
Biology/Ecology                                 32          10           13
Business/Economics                              23           1            3
Engineering                                     18           9           11
Environmental Science/Studies                   41          22           16
Forestry/Wildlife/Fisheries                     21          12           10
Geography                                       35          22           15
Geology                                         31          12           10
History                                         27           2            2
Information Sciences                            14           1            0
Nursing                                          8           1            2
Medicine/Dentistry                               9           0            0
Political Science                               25           3            5
Psychology                                       4           0            0
Public Health/Epidemiology/Biostatistics        30           3            9
Social Work                                      2           0            1
Sociology                                       22           0            3
Soil Science                                    17           5            4
Statistics                                       8           3            0
Urban Planning/Studies                          36           7            9

Table 3. Number of ARL Libraries Reporting Frequent Users of GIS, Remote-Sensing, or GPS Software and Technologies from a Campus Department (2014)

Often, the library is not the only place people can go to obtain software support and training on campus. Most (86%) responding libraries state that their university offers credit courses, and 41% of campuses have a GIS computer lab located elsewhere on campus that may be used. ITS is available for assistance at 29% of the universities, and continuing education offers some level of training and support at 14% of campuses.

Data Collection and Access

Most (85%) responding libraries collect geographic data and allocate an annual budget for it. "Libraries that have invested money in proprietary software and trained staff members will tend to also develop and maintain their own collection of data resources."16 Of those collecting data, 26% spend less than $1,000 annually, 15% spend between $1,000 and $2,499, 17% spend between $2,500 and $5,000, and 41% spend more than $5,000.
In 1997, 79% of libraries spent less than $2,000 annually, and only 9% spent more than $5,000.17

Figure 3. Annual Budget Allocations for Geographic Data (2014)

A dramatic shift has occurred over the years in budget allocations for data sets. Academic libraries no longer collect only free government data sets, as was typically the case in 1997; they now invest much more of their materials budgets in building up the geographic data collections for their users.

Data is made accessible to campus users in a variety of ways. A majority (84%) offer data via remote access or download from a networked campus computer, using a virtual private network (VPN) or login. More than half (62%) of responding libraries provide access to data from workstations within the library, and 64% lend CD-ROMs.

Roughly one-quarter (26%) of responding libraries provide users with storage for their data. Of those, 29% have a dedicated geographic data server, 14% use the main library server, 29% point users to the university server or institutional repository, and 36% allow users to store their data directly on a library computer workstation hard drive.

Internal Use of GIS in Libraries

Geographic information technologies may be used internally to help patrons navigate the library's physical collections and efficiently locate print materials. Of the survey respondents, 60% use GIS for map or air photo indexing, 27% use the technology to create floor maps of the library building, and 15% use it to map the library's physical collections. "The use of GIS in mapping library collections is one of the non-traditional but useful applications of GIS."18 GIS can be used to link library materials to simulated views of floor maps through location codes.19 This enables patrons to determine the exact location of library material by providing them with item "location details such as stacks, row, rack, shelf numbers, etc."20 The GIS system can become a useful tool for collection management and can be a tremendous time-saver for patrons, especially those unfamiliar with the cataloging system or collection layout.

DISCUSSION

Recommendations for Building a Successful Geographic Information Service Center

The geographic information services area is often a blend of the traditional and the modern. It can encompass paper maps, atlases, GPS equipment, software manuals, large-format scanners, printers, and GIS. GIS services may include a cluster of computers with GIS software installed, an accessible collection of GIS data resources, and assistance available from the library staff. The question for academic libraries today is no longer "whether to offer GIS services but what level of service to offer."21 Every university has different GIS needs, and the library must decide how it can best support these needs.
There is no set formula for building a geographic information service center because each institution "has a different service mission and user base."22 Every library's GIS service program will be designed with its unique institutional needs in mind; however, each will incorporate some combination of hardware, software, data, and training opportunities provided by at least one knowledgeable staff member.23

"GIS represents a significant investment in hardware, software, staffing, data acquisition, and ongoing staff development. Either new money or significant reallocation is required."24 Establishing new GIS services in the library, or enhancing existing ones, requires the "serious assessment of long-term support and funding needs."25 Commitment of the university as a whole, or at least support from senior administration, "library administration, and related campus departments," is crucial to its success.26 As one author notes, "more funding will mean more staff, better trained staff, a more in-depth collection, better hardware and software, and the ability to offer multiple types of GIS services."27

Once funding for this endeavor has been secured, it is of utmost importance to recruit a GIS professional to manage the geographic information service center. To be most effective in this position, the incumbent should possess a graduate degree in GIS or geography; however, depending on what additional responsibilities would be required of the candidate (e.g., reference or cataloging), a second degree in library science is strongly recommended. This staff member should possess mapping and GIS skills, including experience with Esri software and remote-sensing technologies. Employees in this position may be given job titles such as "GIS specialists, GIS/data librarians, GIS/map librarians, digital cartographers, spatial data specialists, and GIS coordinators."28

With the new staff member on board, hereafter referred to as the "GIS specialist," decisions such as what software to provide, which data sets to collect, and what types of training and support to offer to the campus can be made. Consulting with research centers and academic departments that currently use or are interested in using GIS and remote-sensing technologies is a good way to learn about software, data, and training needs and to determine the focus and direction of the geographic information services department.29 Campus users often come from academic departments that "have neither staff nor facilities to support GIS," and "may only consist of one or two faculty and a few graduate students. These GIS users need access to software, data, and expertise from a centralized, accessible source of research assistance, such as the library."30

At a minimum, Esri ArcGIS, Google Maps, and Google Earth should be supported, with additional remote-sensing or open-source GIS software depending on staff expertise and known campus needs.
When purchasing commercial software licenses, such as for Esri ArcGIS, discounts for educational institutions are usually available. Additionally, negotiating campus-wide software licenses may be a good option to consider, as the costs are usually far less than purchasing individual or floating licenses. Costs for campus-wide licensing are typically determined by the number of full-time equivalent (FTE) students enrolled at the university.

Facilitating "access to educational resources such as software tools and applications, how-to guides for data and software," and tutorials is crucial.31 The GIS specialist must be familiar with how GIS software can be used by many disciplines, the availability of "training courses or tutorials, sources or extensible GIS software, and hundreds of software and application books."32 Tutorials may be provided directly from a software vendor (e.g., Esri Virtual Campus) or developed in-house by the GIS specialist. Creating "GIS tutorials on short, task-based techniques such as georeferencing or geocoding" and making them readily available online or as a handout may save staff from having to repeatedly explain these techniques to patrons.33

Geospatial data collection development is a core function of the geographic information services department. To effectively develop the data collection, the GIS specialist must fully comprehend the needs of the user community as well as possess a "fundamental understanding of the nature and use of GIS data."34 This is often referred to as "spatial literacy."35 It is crucial to keep abreast of "recent developments, applications, and data sets."36

The GIS specialist will spend much more time searching for and acquiring geographic data sets than selecting and purchasing traditional print items such as maps, monographs, and journals for the collection. A budget should be established annually for the purchase of all geographic materials, both print and digital. A great challenge for the specialist is to acquire data at the lowest cost possible. While a plethora of free data is available online from government agencies and nonprofit organizations, other data, available only from private companies, may be quite expensive because of the high production costs. A collection development policy should be created that indicates the types of materials and data collected and specifies geographic regions, formats, and preferred scales.37 The needs of the user community must be carefully considered when establishing the policy.

The expertise of the GIS specialist is needed not only to help patrons locate the appropriate geographic data, but also to use the software to process, interpret, and analyze it. "Only the few library patrons that have had GIS experience are likely to obtain any level of success without intervention by library staff";38 thus, for any mapping program installed on a library computer, "staff must have working knowledge of the program" and must be able to provide support to users.39 Furthermore, the GIS specialist must be able to train patrons to use the software to complete common tasks such as file format conversion, data projection, data manipulation, and geoprocessing. These geospatial technologies involve a steep learning curve, and unfortunately "hands-on training options outside the university are often cost-prohibitive" for many.40 The campus community requires training opportunities that are both convenient and inexpensive.

Teaching hands-on geospatial technology workshops, from basic to advanced, is fundamental to educating the campus community. Workshops will "vary from institution to institution, with some offering students an introduction to mapping and others focusing on specific features of the program, such as georeferencing, geocoding, and spatial analysis. Some also offer workshops that are theme specific," such as "Working with census data" or "Digital elevation modeling."41 Custom workshops or training sessions can be developed to meet a specific campus need, tailored for a specific class in consultation with an instructor, or designed especially for other library staff.

Today's Geographic Information Service Center

The academic map librarian of the 1970s or 1980s would hardly recognize today's geographic information service center. What was once a room of map cases and shelves of atlases and gazetteers is now a bustling geospatial center. Computers, powerful GIS and remote-sensing technologies, GPS devices, digital maps, and data are now available to library patrons. Every library surveyed provides GIS software to campus users, and 85% also actively collect GIS and remotely sensed data. With the assistance of expertly trained library staff, users with little or no experience with geospatial technologies can analyze spatial data sets and create custom maps for coursework, projects, and research. Nearly all surveyed libraries (94%) have staff who can assist students specifically with software use for class assignments and projects, while 90% provide assistance with more generalized use of the software. A majority of libraries also offer a variety of software training sessions and workshops and give presentations to the campus community. All this is made possible through the library's commitment to this service area and the availability of highly trained professional staff, most of whom hold a master's or doctoral degree. The library has truly established itself as the go-to location on campus for spatial mapping and analysis. This role has only strengthened in the years since the launch of the ARL GIS Literacy Project in 1992.

REFERENCES

1. D.
Kevin Davie et al., comps., SPEC Kit 238: The ARL Geographic Information Systems Literacy Project (Washington, DC: Association of Research Libraries, Office of Leadership and Management Services, 1999), 16.

2. Ibid., 3.

3. Ibid., i.

4. Abraham Parrish, "Improving GIS Consultations: A Case Study at Yale University Library," Library Trends 55, no. 2 (2006): 328, http://dx.doi.org/10.1353/lib.2006.0060.

5. Eva Dodsworth, Getting Started with GIS: A LITA Guide (New York: Neal-Schuman, 2012), 161.

6. Davie et al., SPEC Kit 238, i.

7. Eva Dodsworth and Andrew Nicholson, "Academic Uses of Google Earth and Google Maps in a Library Setting," Information Technology & Libraries 31, no. 2 (2012): 102, http://dx.doi.org/10.6017/ital.v31i2.1848.

8. Davie et al., SPEC Kit 238, 8.

9. Gregory H. March, "Surveying Campus GIS and GPS Users to Determine Role and Level of Library Services," Journal of Map & Geography Libraries 7, no. 2 (2011): 170–71, http://dx.doi.org/10.1080/15420353.2011.566838.

10. Davie et al., SPEC Kit 238, 5.

11. George J. Soete, SPEC Kit 219: Transforming Libraries Issues and Innovation in Geographic Information Systems (Washington, DC: Association of Research Libraries, Office of Management Services, 1997), 5.

12. Camila Gabaldón and John Repplinger, "GIS and the Academic Library: A Survey of Libraries Offering GIS Services in Two Consortia," Issues in Science and Technology Librarianship 48 (2006), http://dx.doi.org/10.5062/F4QJ7F8R.

13. Davie et al., SPEC Kit 238, 5.

14. Soete, SPEC Kit 219, 9.

15. Davie et al., SPEC Kit 238, 10.

16. Dodsworth, Getting Started with GIS, 165.

17. Davie et al., SPEC Kit 238, 9.

18. D. N. Phadke, Geographical Information Systems (GIS) in Library and Information Services (New Delhi: Concept, 2006), 36–37.

19. Ibid., 13.

20. Ibid., 74.

21. Rhonda Houser, "Building a Library GIS Service from the Ground Up," Library Trends 55, no. 2 (2006): 325, http://dx.doi.org/10.1353/lib.2006.0058.

22. Melissa Lamont and Carol Marley, "Spatial Data and the Digital Library," Cartography and Geographic Information Systems 25, no. 3 (1998): 143, http://dx.doi.org/10.1559/152304098782383142.

23. Carolyn D. Argentati, "Expanding Horizons for GIS Services in Academic Libraries," Journal of Academic Librarianship 23, no. 6 (1997): 463, http://dx.doi.org/10.1559/152304098782383142.

24. Soete, SPEC Kit 219, 11.

25. Carol Cady et al., "Geographic Information Services in the Undergraduate College: Organizational Models and Alternatives," Cartographica 43, no. 4 (2008): 249, http://dx.doi.org/10.3138/carto.43.4.239.

26. Houser, "Building a Library," 325.

27. R. B. Parry and C. R. Perkins, eds., The Map Library in the New Millennium (Chicago: American Library Association, 2001), 59–60.

28. Patrick Florance, "GIS Collection Development within an Academic Library," Library Trends 55, no.
2 (2006): 223, http://dx.doi.org/10.1353/lib.2006.0057.

29. Houser, "Building a Library," 325.

30. Ibid., 323.

31. Ibid., 322.

32. Parrish, "Improving GIS," 329.

33. Ibid., 336.

34. Florance, "GIS Collection Development," 222.

35. Soete, SPEC Kit 219, 6.

36. Dodsworth, Getting Started with GIS, 165.

37. Soete, SPEC Kit 219, 8.

38. Gabaldón and Repplinger, "GIS and the Academic Library."

39. Dodsworth, Getting Started with GIS, 164.

40. Houser, "Building a Library," 323.

41. Dodsworth, Getting Started with GIS, 161–62.

APPENDIX

Responding Institutions

Arizona State University Libraries
Auburn University Libraries
Boston College Libraries
University of Calgary Libraries and Cultural Resources
University of California, Los Angeles, Library
University of California, Riverside, Libraries
University of California, Santa Barbara, Libraries
Case Western Reserve University Libraries
Colorado State University Libraries
Columbia University Libraries
University of Connecticut Libraries
Cornell University Library
Dartmouth College Library
Duke University Library
University of Florida Libraries
Georgetown University Library
University of Hawaii at Manoa Library
University of Illinois at Chicago Library
University of Illinois at Urbana-Champaign Library
Indiana University Libraries Bloomington
Johns Hopkins University Libraries
University of Kansas Libraries
McGill University Library
University of Manitoba Libraries
University of Maryland Libraries
Massachusetts Institute of Technology Libraries
University of Miami Libraries
University of Michigan Library
Michigan State University Libraries
University of Nebraska–Lincoln Libraries
New York University Libraries
University of North Carolina at Chapel Hill Libraries
North Carolina State University Libraries
Northwestern University Library
University of Oregon Libraries
University of Ottawa Library
University of Pennsylvania Libraries
Pennsylvania State University Libraries
Purdue University Libraries
Queen's University Library
Rice University Library
University of South Carolina Libraries
University of Southern California Libraries
Syracuse University Library
University of Tennessee, Knoxville, Libraries
University of Texas Libraries
Texas Tech University Libraries
University of Toronto Libraries
Tulane University Library
Vanderbilt University Library
University of Waterloo Library
University of Wisconsin–Madison Libraries
Yale University Library
York University Libraries

5702 ----

Engine of Innovation: Building the High-Performance Catalog

Will Owen and Sarah C. Michalak

Will Owen (owen@email.unc.edu) is Associate University Librarian for Technical Services and Systems and Sarah C. Michalak (smichala@email.unc.edu) is University Librarian and Associate Provost for University Libraries, University of North Carolina at Chapel Hill.

ABSTRACT

Numerous studies have indicated that sophisticated web-based search engines have eclipsed the primary importance of the library catalog as the premier tool for researchers in higher education. We submit that the catalog remains central to the research process.
Through a series of strategic enhancements, the University of North Carolina at Chapel Hill, in partnership with the other members of the Triangle Research Libraries Network (TRLN), has made the catalog a carrier of services in addition to bibliographic data, facilitating not simply discovery, but also delivery of the information researchers seek.

INTRODUCTION

In 2005, an OCLC research report documented what many librarians already knew—that the library webpage and catalog were no longer the first choice to begin a search for information. The report states,

The survey findings indicate that 84 percent of information searches begin with a search engine. Library Web sites were selected by just 1 percent of respondents as the source used to begin an information search. Very little variability in preference exists across geographic regions or U.S. age groups. Two percent of college students start their search at a library Web site.1

In 2006 a report by Karen Calhoun, commissioned by the Library of Congress, asserted, "Today a large and growing number of students and scholars routinely bypass library catalogs in favor of other discovery tools. . . . The catalog is in decline, its processes and structures are unsustainable, and change needs to be swift."2

Ithaka S+R has conducted national faculty surveys triennially since 2000. Summarizing the 2000–2006 surveys, Roger Schonfeld and Kevin Guthrie stated, "When the findings from 2006 are compared with those from 2000 and 2003, it becomes evident that faculty perceive themselves as becoming decreasingly dependent on the library for their research and teaching needs."3 Furthermore, it was clear that the "library as gateway to scholarly information" was viewed as decreasingly important. The 2009 survey continued the trend, with even fewer faculty seeing the gateway function as critical. These results occurred at a time when electronic resources were becoming increasingly important and large Google-like search engines were rapidly gaining in use.4

These comments extend into the twenty-first century more than thirty years of concern about the utility of the library catalog. Through the first half of this decade new observations emerged about patron perceptions of catalog usability. Even after migration from the card to the online catalog was complete, the new tool represented primarily the traditionally cataloged holdings of a particular library. Providing direct access to resources was not part of the catalog's mission. Manuscripts, finding aids, historical photography, and other special collections were not included in the traditional catalog.
Journal articles could be discovered only through abstracting and indexing services. As these discovery tools began their migration to electronic formats, the centrality of the library's bibliographic database was challenged.

The development of Google and other sophisticated web-based search engines further eclipsed the library's bibliographic database as the first and most important research tool. Yet we submit that the catalog database remains a necessary fixture, continuing to provide access to each library's particular holdings. While the catalog may never regain its pride of place as the starting point for all researchers, it remains an indispensable tool for library users, even if it may be used only at a later stage in the research process.

At the University of North Carolina at Chapel Hill, we have continued to invest in enhancing the utility of the catalog as a valued tool for research. Librarians initially reasoned that researchers still want to find out what is available to them in their own campus library. Gradually they began to see completely new possibilities. To that end, we have committed to a program that enhances discovery and delivery through the catalog. While most libraries have built a wide range of discovery tools into their home pages—adding links to databases of electronic resources, article databases, and Google Scholar—we have continued to enhance both the content to be found in the primary local bibliographic database and the services available to students and researchers via the interface to the catalog.

In our local consortium, the Triangle Research Libraries Network (TRLN), librarians have deployed the search and faceting services of Endeca to enrich the discovery interfaces. We have gone beyond augmenting the catalog with MARCIVE records for government documents by including Encoded Archival Description (EAD) finding aids and selected (and ever-expanding) digital collections that are not easily discoverable through major search engines. We have similarly enhanced services related to the discovery and delivery of items listed in the bibliographic database, including not only common features like the ability to export citations in a variety of formats but also more extensive services such as document delivery, an auto-suggest feature that maximizes use of Library of Congress Subject Headings (LCSH), and the ability to submit cataloged items to be processed for reserve reading.

Both students and faculty have embraced e-books, and in adding more than a million such titles to the UNC-Chapel Hill catalog we continue to blend discovery and delivery, but now on a very large scale.
Coupling catalog records with a metadata service that provides book jackets, tables of contents, and content summaries, cataloging Geographic Information Systems (GIS) data sets, and adding live links to the finding aids for digitized archival and manuscript collections have further enhanced the blended discovery/delivery capacity of the catalog.

We have also leveraged the advantages of operating in a consortial environment by extending the discovery and delivery services among the members of TRLN to provide increased scope of discovery and shared processing of some classes of bibliographic records. TRLN comprises four institutions, and content from all member libraries is discoverable in a combined catalog (http://search.trln.org). Printed material requested through this combined catalog is often delivered between TRLN libraries within twenty-four hours.

At UNC, our search logs show that use of the catalog increases as we add new capacity and content. These statistics demonstrate the catalog's continuing relevance as a research tool that adds value above and beyond conventional search engines and general web-based information resources. In this article we will describe the most important enhancements to our catalog, include data from search logs to demonstrate usage changes resulting from these enhancements, and comment on potential future developments.

LITERATURE REVIEW

An extensive literature discusses the past and future of online catalogs, and many of these materials themselves include detailed literature reviews. In fact, there are so many studies, reviews, and editorials that it becomes clear that although the online catalog may be in decline, it remains a subject of lively interest to librarians. Two important threads in this literature report on user-query studies and on other usability testing. Though there are many earlier studies, two relatively recent articles analyze search behavior and provide selective but helpful literature surveys.5

There are many efforts to define directions for the catalog that would make it more web-like, more Google-like, and thus more often chosen for search, discovery, and access by library patrons. These articles aim to define the characteristics of the ideal catalog. Charles Hildreth provides a benchmark for these efforts by dividing the history of the online catalog into three generations. From his projections of a third generation grew the "next generation catalog"—really the current ideal. He called for improvement of the second-generation catalog through an enhanced user-system dialog, automatic correction of search-term spelling and format errors, automatic search aids, enriched subject metadata in the catalog record to improve search results, and the integration of periodical indexes in the catalog.
As new technologies have made it possible to achieve these goals in new ways, much of what Hildreth envisioned has been accomplished.6

Second-generation catalogs, anchored firmly in integrated library systems, operated throughout most of the 1980s and the 1990s without significant improvement. By the mid-2000s the search for the "next-gen" catalog was in full swing, and numerous articles articulated the components of an improved model. The catalog crossed a generational line for good when the North Carolina State University Libraries (NCSU) launched a new catalog search engine and interface with Endeca in January 2006. Three NCSU authors published a thorough article describing key catalog improvements. Their Endeca-enhanced catalog fulfilled the most important criteria for a "next-gen" catalog: improved search and retrieval through "relevance-ranked results, new browse capabilities, and improved subject access."7

Librarians gradually concluded that the catalog need not be written off but would benefit from being enhanced and aligned with search engine capabilities and other web-like characteristics. Catalogs should contain more information about titles, such as book jackets or reviews, than conventional bibliographic records offered. Catalog search should be understandable and easy to use. Additional relevant works should be presented to the user along with result sets. The experience should be interactive and participatory and provide access to a broad array of resources such as data and other nonbook content.8

Karen Markey, one of the most prolific online catalog authors and analysts, writes, "Now that the era of mass digitization has begun, we have a second chance at redesigning the online library catalog, getting it right, coaxing back old users and attracting new ones."9

Marshall Breeding predicted characteristics of the next-generation catalog. His list includes expanded scope of search; more modern interface techniques, such as a single point of entry, search result ranking, faceted navigation, and "did you mean . . . ?" capacity; and an expanded search universe that includes the full text of journal articles and an array of digitized resources.10

A concept that is less represented in the literature is that of envisioning the catalog as a framework for service, although the idea of the catalog designed to ensure customer self-service has been raised.11 Michael J.
Bennett has studied the effect of catalog enhancements on circulation and interlibrary loan.12 Service and the online catalog have a new meaning in Morgan's idea of "services against texts," supporting "use and understand" in addition to the traditional "find and get."13 Lorcan Dempsey commented on the catalog as an identifiable service and predicted new formulations for library services based on the network-level orientation of search and discovery.14 But the idea that the catalog has moved from a fixed, inward-focused tool to an engine for services—a locus to be invested with everything from unmediated circulation renewal and delivery requests to the "did you mean" search aid—has yet to be addressed comprehensively in the literature.

ENHANCING THE TRADITIONAL CATALOG

One of the factors that complicates discussions of the continued relevance of the library catalog to research is the very imprecision of the term in common parlance, especially when the chief point of comparison to today's ILS-driven OPACs is Google or, more specifically, Google Scholar. From first-year writing assignments through advanced faculty research, many of the resources that our patrons seek are published in the periodical literature, and the library catalog, the one descended from the cabinets full of cards that occupied prominent real estate in our buildings, has never been an effective tool for identifying relevant periodical literature.

This situation has changed in recent years as products like Summon, from ProQuest, and EBSCO Discovery Service have introduced platforms that can accommodate electronic article indexing as well as MARC records for the types of materials—books, audio, and video—that have long been discovered through the OPAC. In the following discussion of "catalog" developments and enhancements, we focus initially not on these integrated solutions but on the catalog as more traditionally defined. However, as electronic resources become an ever-greater percentage of library collections, we shall see a convergence of these two streams that will portend significant changes in the nature and utility of the catalog.

Much work has been done in the first decade of the twenty-first century to enhance discovery services, and, as noted above, North Carolina State University's introduction of its Endeca-based search engine and interface was a significant game-changer. In the years following the introduction of the Endeca interface at NCSU, the Triangle Research Libraries Network invested in further development of features that enhanced the utility of the Endeca software itself. Programmed enhancements to the interface provided additional services and functionality. In some cases, these enhancements were aimed at improving discovery. In others, they allowed researchers to make new and better use of the data that they found or made it easier to obtain the documents that they discovered.
Faceting and Limiting Retrieval Results

Perhaps the most immediately striking innovation in the Endeca interface was the introduction of facets. Faceted browsing allowed users to parse the bibliographic record in new, and more, ways than preceding catalogs had. Faceting enhanced search and discovery in several fundamentally important ways.

The first of these was the formal recognition that keyword searching was the user's default means of interacting with the catalog's data. NCSU's initial implementation allowed for searches using several indexes, including authors, titles, and subject headings, and this functionality remains in place to the present day. However, by default, searches returned records containing the search terms "anywhere" in the record. This behavior was more in line with user expectations in an information ecosystem dominated by Google's single search box.

The second was the significantly different manner in which multiple limits could be placed on an initial result set from such a keyword search. The concept of limiting was not a new one: certain facets worked in a manner consistent with traditional limits in prior search interfaces, allowing users to screen results by language or date of publication, for example.

It was the ease and transparency with which multiple limits could be applied through faceting that was revolutionary. A user who entered the keyword "java" in the search box was quickly able to discriminate between the programming language and the Indonesian island. This could be achieved in multiple ways: by choosing between subjects (for example, "application software" vs. "history") or clearly labeled LC classification categories ("Q – Science" vs. "D – History"). These limits, or facets, could be toggled on and off, independently and iteratively.

The third and highly significant difference resulted from how Library of Congress Subject Headings (LCSH) were parsed and indexed in the system. By making LCSH subdivisions independent elements of the subject-heading index in a keyword search, the Endeca implementation unlocked a trove of metadata that had been painstakingly curated by catalogers for nearly a century. The user no longer needed to be familiar with the formal structure of subject headings; if the keywords appeared anywhere in the string, the subdivisions in which they were contained could be surfaced and used as facets to sharpen the focus of the search. This was revolutionary.

Utilizing the Power of New Indexing Structures

The liberation of bibliographic data from the structure of MARC record indexes presaged yet another far-reaching alteration in the content of library catalogs. To this day, most commercial integrated library systems depend on MARC as the fundamental record structure. In NCSU's implementation, the multiple indexes built from that metadata created a new framework for information.
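Endeca's query API is not reproduced in the article; purely as an illustration of the keyword-plus-facets pattern just described, the following sketch issues an equivalent query against Apache Solr, the open-source search server the Libraries later used for their auto-suggest index. The core name and field names (catalog, text, subject_facet, format, language_facet) are assumptions for the example, not the actual TRLN configuration.

# Illustration only: a keyword search with facets against an Apache Solr index.
# The core and field names below are hypothetical, not the TRLN configuration.
import json
import urllib.parse
import urllib.request

SOLR_SELECT = "http://localhost:8983/solr/catalog/select"  # hypothetical Solr core

def faceted_search(keywords, applied_facets=None, rows=20):
    """Run a keyword search; return matching documents and facet value counts."""
    params = [
        ("q", keywords),              # keywords matched "anywhere" in the record
        ("df", "text"),               # assumed default keyword field
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", "1"),
    ]
    # Facet fields displayed alongside the result set
    for field in ("subject_facet", "format", "language_facet"):
        params.append(("facet.field", field))
    # Each facet the user has toggled on becomes a filter query (fq),
    # narrowing the original keyword result set
    for field, value in (applied_facets or {}).items():
        params.append(("fq", '{0}:"{1}"'.format(field, value)))
    url = SOLR_SELECT + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    return data["response"]["docs"], data["facet_counts"]["facet_fields"]

# Disambiguating "java": apply a subject facet to keep only the programming sense.
# docs, facets = faceted_search("java", {"subject_facet": "Application software"})

Because each applied facet is simply an additional filter on the same keyword query, limits can be added or removed independently and iteratively, which is the toggling behavior described above.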
This new indexing framework made possible the integration of non-MARC data with MARC data, allowing, for example, Dublin Core (DC) records to be incorporated into the universe of metadata to be indexed, searched, and retrieved. There was no need to crosswalk DC to MARC: it sufficed to simply assign the DC elements to the appropriate Endeca indexes. With this capacity to integrate rich collections of locally described digital resources, the scope of the traditional catalog was enlarged.

Expanding Scopes and Banishing Silos

At UNC-Chapel Hill, we began this process of augmentation with selected collections of digital objects. These collections were housed in a CONTENTdm repository we had been building for several years at the time of the Library's introduction of the Endeca interface. Image files, which had not been accessible through traditional catalogs, were among the first to be added. For example, we had been given a large collection of illustrated postcards featuring scenes of North Carolina cities and towns. These postcards had been digitized, and metadata describing the image and the town had been recorded. Other collections of digitized historical photographs were also selected for inclusion in the catalog. These historical resources proved to be a boon to faculty teaching local history courses and, interestingly, to students working on digital projects for their classes. As class assignments came to include activities like creating maps enhanced by the addition of digital photographs or digitized newspaper clippings, the easy discovery of these formerly hidden collections enriched students' learning experience.

Other special collections materials had been represented in the traditional catalog in somewhat limited fashion. The most common examples were manuscript collections. The processing of these collections had always resulted in the creation of finding aids, produced since the 1930s using index cards and typewriters. During the last years of the twentieth century, archivists began migrating many of these finding aids to the web using the EAD format, presenting them as simple HTML pages. These finding aids were accessible through the catalog by means of generalized MARC records that described the collections at a superficial level. However, once we attained the ability to integrate the contents of the finding aids themselves into the indexes underlying the new interface, this much richer trove of keyword-searchable data vastly increased the discoverability and use of these collections.

During this period, the Library also undertook systematic digitization of many of these manuscript collections. Whenever staff received a request for duplication of an item from a manuscript collection (formerly photocopies, but by then primarily digital copies), we digitized the entire folder in which that item was housed. We developed standards for naming these digital surrogates that associated the individual image with the finding aid.
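The naming standard itself is not given in the article. As a purely hypothetical illustration, the sketch below assumes filenames of the form <collection id>_<folder id>_<sequence>.jpg and shows how digital surrogates could be grouped by folder so that the images belonging to each folder described in a finding aid can be gathered for display.

# Hypothetical naming convention: <collection id>_<folder id>_<sequence>.jpg,
# e.g. "04045_0023_0001.jpg". Not the Library's actual standard.
import re
from collections import defaultdict

FILENAME_PATTERN = re.compile(r"^(?P<collection>\d{5})_(?P<folder>\d{4})_(?P<seq>\d{4})\.jpe?g$")

def group_images_by_folder(filenames):
    """Map (collection id, folder id) to an ordered list of image filenames."""
    folders = defaultdict(list)
    for name in filenames:
        match = FILENAME_PATTERN.match(name)
        if match is None:
            continue  # ignore files that do not follow the convention
        folders[(match.group("collection"), match.group("folder"))].append(name)
    for images in folders.values():
        images.sort()  # the sequence number preserves page order
    return dict(folders)

# group_images_by_folder(["04045_0023_0002.jpg", "04045_0023_0001.jpg"])
# -> {("04045", "0023"): ["04045_0023_0001.jpg", "04045_0023_0002.jpg"]}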
Linking then became a simple matter: adding a short JavaScript string to the head of the online finding aid dynamically linked the digital objects to the finding aid itself.

Other library collections likewise benefited from the new indexing structures. Some uncataloged materials traditionally had minimal bibliographic control, provided by inventories that were built at the time of accession in desktop database applications; funding constraints meant that full cataloging of these materials (often rare books) remained elusive. The ability to take the data that we had and blend it into the catalog enhanced the discovery of these collections as well.

We also have an extensive collection of video resources, including commercial and educational films. The conventions for cataloging these materials, held over from the days of catalog cards, often did not match user expectations for search and discovery. There were limits to the number of added entries that catalogers would make for directors, actors, and others associated with a film. Many records lacked the kind of genre descriptors that undergraduates were likely to use when seeking a film for an evening's entertainment. To compensate for these limitations, staff who managed the collection had again developed local database applications that allowed for the inclusion of more extensive metadata and for categories, such as country of origin or folksonomic genres, that patrons frequently indicated were desirable access points. Once again, the new indexing structures allowed us to incorporate this rich set of metadata into what looked like the traditional catalog.

Each of the instances described above represents what we commonly call the destruction of silos. Information about library collections that had been scattered in numerous locations—and not all of them online—was integrated into a single point of discovery. It was our hope and intention that such integration would drive more users to the catalog as a discovery tool for the library's diverse collections and not simply for the traditional monographic and serials collections that had been served by MARC cataloging. Usage logs indicate that the average number of searches conducted in the catalog rose from approximately 13,000 per day in 2009 to around 19,000 per day in 2013. It is impossible to tell with any certainty whether there was heavier use of the catalog simply because increasingly varied resources came to be represented in it, but we firmly believe that the experience of users who search for material in our catalog has become much richer as a result of these changes to its structure and content.

Cooperation Encouraging Creativity

Another way we were able to harness the power of Endeca's indexing scheme involved the shared loading of bibliographic records for electronic resources to which multiple TRLN libraries provided access.
TRLN's Endeca indexes are built from the records of each member. Each institution has a "pipeline" that feeds metadata into the combined TRLN index. Duplicate records are rolled up into a single display via OCLC control numbers whenever possible, and the bibliographic record is annotated with holdings statements for the appropriate libraries.

We quickly realized that where any of the four institutions shared electronic access to materials, it was redundant to load copies of each record into the local databases of each institution.15 Instead, one institution could take responsibility for a set of records representing shared resources. Examples of such material include electronic government documents with records provided by the MARCIVE Documents Without Shelves program, large sets like Early English Books Online, and PBS videos streamed by the statewide services of NC LIVE.

In practice, one institution takes responsibility for loading, editing, and performing authority control on a given set of records. (For example, UNC, as the regional depository, manages the Documents Without Shelves record set.) These records are loaded with a special flag indicating that they are part of the shared records program. This flag generates a holdings statement that reflects the availability of the electronic item at each institution. The individual holdings statements contain the institution-specific proxy server information to enable and expedite access.

In addition to this distributed model of record loading and maintenance, we were able to leverage OAI-PMH feeds to add selected resources to the SearchTRLN database. All four institutions have access to the data made available by the Inter-university Consortium for Political and Social Research (ICPSR). As we do not license these resources or maintain them locally, and as records provided by ICPSR can change over time, we developed a mechanism to harvest the metadata and push it through a pipeline directly into the SearchTRLN indexes. None of the member libraries' local databases house this metadata, but the records are made available to all nonetheless.

While we were engaged in implementing these enhancements, additional sources of potential enrichment of the catalog were appearing. In particular, vendors began providing indexing services for the vast quantities of electronic resources contained in aggregator databases. Additionally, they made it possible for patrons to move seamlessly from the catalog to those electronic resources via OpenURL technologies. Indeed, services like ProQuest's Summon or EBSCO's Discovery Service might be taken as another step toward challenging the catalog's primacy as a discovery tool as they offered the prospect of making local catalog records just a fraction of a much larger universe of bibliographic information available in a single, keyword-searchable database.
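Returning to the ICPSR feed described above: the article does not include the harvesting code, but the general OAI-PMH pattern it describes is sketched below. The endpoint URL and the choice of Dublin Core fields are placeholders rather than the actual TRLN pipeline, and resumption-token paging and error handling are omitted for brevity.

# Sketch of an OAI-PMH harvest of Dublin Core records; the endpoint and field
# mapping are placeholders, not the actual TRLN/ICPSR pipeline.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_BASE_URL = "https://example.org/oai"  # placeholder OAI-PMH endpoint
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def harvest_dublin_core(base_url=OAI_BASE_URL):
    """Yield one dict of Dublin Core fields per record in a ListRecords response."""
    url = base_url + "?" + urllib.parse.urlencode(
        {"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    with urllib.request.urlopen(url) as response:
        tree = ET.parse(response)
    for record in tree.findall(".//oai:record", NS):
        dc = record.find(".//oai_dc:dc", NS)
        if dc is None:
            continue  # deleted records carry a header but no metadata
        yield {
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": [e.text for e in dc.findall("dc:title", NS)],
            "creator": [e.text for e in dc.findall("dc:creator", NS)],
            "subject": [e.text for e in dc.findall("dc:subject", NS)],
        }

# Each harvested record would then be mapped directly onto the shared index's
# title, author, and subject fields rather than being converted to MARC first.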
It remains to be seen, therefore, whether continuing to load many kinds of MARC records into the local database is an effective aid to discovery even with the multiple delimiting capabilities that Endeca provides. What is certain, however, is that our approach to indexing resources of any kind has undergone a radical transformation over the past few years—a transformation that goes beyond the introduction of any of the particular changes we have discussed so far.

Promoting a Culture of Innovation

One important way Endeca has changed our libraries is that a culture of constant innovation has become the norm, rather than the exception, for our catalog interface and content. Once we were no longer subject to the customary cycle of submitting enhancement requests to an integrated library system vendor, hoping that fellow customers shared similar desires, and waiting for a response and, if we were lucky, implementation, we were able to take control of our aspirations. We had the future of the interface to our collections in our own hands, and within a few years of the introduction of Endeca by NCSU, we were routinely adding new features to enhance its functionality.

One of the first of these enhancements was the addition of a "type-ahead" or "auto-suggest" option.16 Inspired by Google's autocomplete feature, this service suggests phrases that might match the keywords a patron is typing into the search box. Ben Pennell, one of the chief programmers working on Endeca enhancement at UNC-Chapel Hill, built a Solr index from the ILS author, title, and subject indexes and from a log of recent searches. As a patron typed, a drop-down box appeared below the search box. The drop-down contained matching terms extracted from the Solr index in a matter of seconds or less. For example, typing the letters "bein" into the box produced a list including "Being John Malkovich," "nature—effects of human beings on," "human beings," and "Bein, Alex, 1903–1988." The matched letters ("bein") in these examples are highlighted in a different color in the drop-down display. In the case of terms drawn directly from an index, the index name appears, also highlighted, on the right side of the box. For example, the second and third terms in the examples above are tagged with the term "subject." The last example is an "author."

In allowing for the textual mining of LCSH, the initial implementation of faceting in the Endeca catalog surfaced those headings for the patron by uniting keyword and controlled vocabularies in an unprecedented manner. There was a remarkable and almost immediate increase in the number of authority index searches entered into the system. At the end of the fall semester prior to the implementation of the auto-suggest feature, an average of around 1,400 subject searches were done in a week.
In allowing for the textual mining of LCSH, the initial implementation of faceting in the Endeca catalog surfaced those headings for the patron by uniting keyword and controlled vocabularies in an unprecedented manner. There was a remarkable and almost immediate increase in the number of authority index searches entered into the system. At the end of the fall semester prior to the implementation of the auto-suggest feature, an average of around 1,400 subject searches were done in a week. Approximately one month into the spring semester, that average had risen to around 4,000 subject searches per week. Use of the author and title indexes also rose, although not quite as dramatically. In the perpetual tug-of-war between precision and recall, the balance had decidedly shifted.

Another service that we provide, which is especially popular with students, is the ability to produce citations formatted in one of several commonly used bibliographic styles, including APA, MLA, and Chicago (both author-date and note-and-bibliography formats). This functionality, first introduced by NCSU and then jointly developed with UNC over the years that followed, works in two ways. If a patron finds a monographic title in the catalog, simply clicking on a link labeled "Cite" produces a properly formatted citation that can then be copied and pasted into a document. The underlying technology also powers a "Citation Builder" function by which a patron can enter basic bibliographic information for a book, a chapter or essay, a newspaper or journal article, or a website into a form, click the "submit" button, and receive a citation in the desired format.

An additional example of innovation that falls somewhat outside the scope of the changes discussed above was the development of a system that allowed for the mapping of simplified Chinese characters to their traditional counterparts. Searching in non-Roman character sets has always offered a host of challenges to library catalog users. The TRLN Libraries have embraced the potential of Endeca to reduce some of these challenges, particularly for Chinese, through the development of better keyword searching strategies and the automatic translation of simplified to traditional characters.
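As a minimal illustration of the simplified-to-traditional mapping idea (not the TRLN implementation or its mapping data), a query-expansion step might look like the following sketch, with a toy three-character table standing in for a full conversion table.

```python
# Toy mapping of simplified Chinese characters to traditional counterparts,
# applied at query time so one query matches records catalogued in either form.
# The table below is a three-character demonstration, not real mapping data.
SIMPLIFIED_TO_TRADITIONAL = {
    "图": "圖",   # as in 图书馆 / 圖書館 (library)
    "书": "書",
    "馆": "館",
}

def expand_query(query: str) -> list[str]:
    """Return the query as typed plus its traditional-character variant, if different."""
    translated = "".join(SIMPLIFIED_TO_TRADITIONAL.get(ch, ch) for ch in query)
    return [query] if translated == query else [query, translated]

# expand_query("图书馆") -> ["图书馆", "圖書館"]; the search layer can then OR the
# variants together before sending the query to the index.
```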
Since we had complete control over the Endeca interface, it proved relatively simple to integrate document delivery services directly into the functionality of the catalog. Rather than simply emailing a bibliographic citation or a call number to themselves, patrons could request the delivery of library materials directly to their campus addresses. Once we had implemented this feature, we quickly moved to amplify its power. Many catalogs offer a "shopping cart" service that allows patrons to compile lists of titles. One variation on this concept that we believe is unique to our library is the ability for a professor to compile such a list of materials held by the libraries on campus and submit that list directly to the reserve reading department, where the books are pulled from the shelves and placed on course-reserve lists without the professor needing to visit any particular library branch. These new features, in combination with other service enhancements such as the delivery of physical documents to campus addresses from our on-campus libraries and our remote storage facility, have increased the usefulness of the catalog as well as our users' satisfaction with the Library. We believe that these changes have contributed to the ongoing vitality of the catalog and to its continued importance to our community.

In December 2012, the Libraries adopted ProQuest's Summon to provide enhanced access to article literature and electronic resources more generally. At the start of the following fall semester, the Libraries instituted another major change to our discovery and delivery services through a combined single-search box on our home page. This has fundamentally altered how patrons interact with our catalog and its associated resources. First, because we are now searching both the catalog and the Summon index, the type-ahead feature that we had deployed to suggest index terms from our local database to users as they entered search strings no longer functions as an authority index search. We have returned to querying both databases through a simple keyword search.

Second, in our implementation of the single search interface we have chosen to present the results from our local database and the retrievals from Summon in two side-by-side columns. This has the advantage of bringing article literature and other resources indexed by Summon directly to the patron's attention. As a result, more patrons interact directly with articles, as well as with books in major digital repositories like Google Books and HathiTrust. This change has undoubtedly led patrons to make less in-depth use of the local catalog database, although it preserves much of the added functionality in terms of discovering our own digital collections as well as those resources whose cataloging we share with our TRLN partners. We believe that the ease of access to the resources indexed by Summon complements the enhancements we have made to our local catalog.
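A combined single-search box of this kind typically fans the query out to both indexes at once and hands each result set to its own column. The sketch below assumes two hypothetical JSON search endpoints standing in for the local catalog and the article index; the real Summon API requires authentication and is not reproduced here.

```python
# Sketch of issuing the catalog query and the article-index query in parallel
# and returning both result sets for side-by-side display. Both endpoints are
# hypothetical placeholders, not the production catalog or Summon APIs.
from concurrent.futures import ThreadPoolExecutor
import requests

CATALOG_API = "https://catalog.example.edu/search"   # placeholder
ARTICLE_API = "https://articles.example.edu/search"  # placeholder

def _search(endpoint, query, rows=10):
    resp = requests.get(endpoint, params={"q": query, "rows": rows}, timeout=10)
    resp.raise_for_status()
    return resp.json().get("results", [])

def combined_search(query):
    """Run both searches concurrently; the UI renders each list in its own column."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        catalog_future = pool.submit(_search, CATALOG_API, query)
        article_future = pool.submit(_search, ARTICLE_API, query)
        return {"catalog": catalog_future.result(), "articles": article_future.result()}
```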
CONCLUSION AND FURTHER DIRECTIONS

One might argue that the integration of electronic resources into the "catalog" actually shifts the paradigm more significantly than any previous enhancements. As the literature review indicates, much of the conversation about enriching library catalogs has centered on improving the means by which search and discovery are conducted. The reasonably direct linking to full text that is now possible has once again radically shifted that conversation, for the catalog has come to be seen not simply as a discovery platform based on metadata but as an integrated system for delivering the essential information resources for which users are searching.

Once the catalog is understood to be a locus for delivering content in addition to discovering it, the local information ecosystem can be fundamentally altered. At UNC-Chapel Hill we have engaged in a process whereby the catalog, central to the library's web presence (given the prominence of the single search box on the home page), has become a hub from which many other services are delivered. The most obvious of these, perhaps, is a system for the delivery of physical documents that is analogous to the ability to retrieve the full text of electronic documents. If an information source is discovered that exists in the library only in physical form, enhancements to the display of the catalog record facilitate the receipt by the user of the print book or a scanned copy of an article from a bound journal in the stacks.

In 2013, Ithaka S+R conducted a local UNC Faculty Survey. The survey posed three questions related to the catalog. In response to the question, "Typically when you are conducting academic research, which of these four starting points do you use to begin locating information for your research?," 41 percent chose "a specific electronic research resource/computer database." Nearly one-third (30 percent) chose "your online library catalog."17

When asked, "When you try to locate a specific piece of secondary scholarly literature that you already know about but do not have in hand, how do you most often begin your process?," 41 percent chose the library's website or online catalog, and 40 percent chose "search on a specific scholarly database or search engine." In response to the question, "How important is it that the library . . . serves as a starting point or 'gateway' for locating information for my research?," 78 percent answered extremely important.

On several questions, Ithaka provided the scores for an aggregation of UNC's peer libraries. For the first question (the starting point for locating information), 18 percent of national peers chose the online catalog compared to 30 percent at UNC. On the importance of the library as gateway, 61 percent of national peers answered very important compared to the 78 percent at UNC.

In 2014, the UNC Libraries were among a handful of academic research libraries that implemented a new Ithaka student survey. Though we don't have national benchmarks, we can compare our own student and faculty responses. Among graduate students, 31 percent chose the online catalog as the starting point for their research, similar to the faculty.18 Of the undergraduate students, 33 percent chose the Library's website, which provides access to the catalog through a single search box.19

A finding that approximately a third of students began their search on the UNC Library website was gratifying. OCLC's Perceptions of Libraries 2010 reported survey results regarding where people start their information searches. In 2005, 1 percent said they started on a library website; in 2010, not a single respondent indicated doing so.20

The gross disparity between the OCLC reports and the Ithaka surveys of our faculty and students requires some explanation.
The Libraries at the University of North Carolina at Chapel Hill are proud of a long tradition of ardent and vocal support from the faculty, and we are not surprised to learn that students share their loyalty. For us, the recently completed Ithaka surveys point out directions for further investigation into our patrons' use of our catalog and why they feel it is so critical to their research.

Anecdotal reports indicate that one of the most highly valued services that the Libraries provide is delivery of physical materials to campus addresses. Some faculty admit with a certain degree of diffidence that our services have made it almost unnecessary to set foot in our buildings; that is a trend that has also been echoed in conversations with our peers. Yet the online presence of the Library and its collections continues to be of significant importance—perhaps precisely because it offers an effective gateway to a wide range of materials and services.

We believe that the radical redesign of the online public access catalog initiated by North Carolina State University in 2006 marked a sea change in interface design and discovery services for that venerable library service. Without a doubt, continued innovation has enhanced discovery. However, we have come to realize that discovery is only one function that the online catalog can and should serve today. Equally if not more important is the delivery of information to the patron's home or office. The integration of discovery and delivery is what sets the "next-gen" catalog apart from its predecessors, and we must strive to keep that orientation in mind, not only as we continue to enhance the catalog and its services, but as we ponder the role of the library as place in the coming years. Far from being in decline, the online catalog continues to be an "engine of innovation" (to borrow a phrase from Holden Thorp, former chancellor of UNC-Chapel Hill) and a source of new challenges for our libraries and our profession.

REFERENCES

1. Cathy De Rosa et al., Perceptions of Libraries and Information Resources: A Report to the OCLC Membership (Dublin, OH: OCLC Online Computer Library Center, 2005), 1–17, https://www.oclc.org/en-US/reports/2005perceptions.html.

2. Karen Calhoun, The Changing Nature of the Catalog and Its Integration with Other Discovery Tools, Final Report, Prepared for the Library of Congress (Ithaca, NY: K. Calhoun, 2006), 5, http://www.loc.gov/catdir/calhoun-report-final.pdf.

3. Roger C. Schonfeld and Kevin M. Guthrie, "The Changing Information Services Needs of Faculty," EDUCAUSE Review 42, no. 4 (July/August 2007): 8, http://www.educause.edu/ero/article/changing-information-services-needs-faculty.
4. Ross Housewright and Roger Schonfeld, Ithaka's 2006 Studies of Key Stakeholders in the Digital Transformation in Higher Education (New York: Ithaka S+R, 2008), 6, http://www.sr.ithaka.org/sites/default/files/reports/Ithakas_2006_Studies_Stakeholders_Digital_Transformation_Higher_Education.pdf.

5. Xi Niu and Bradley M. Hemminger, "Beyond Text Querying and Ranking List: How People are Searching through Faceted Catalogs in Two Library Environments," Proceedings of the American Society for Information Science & Technology 47, no. 1 (2010): 1–9, http://dx.doi.org/10.1002/meet.14504701294; and Cory Lown, Tito Sierra, and Josh Boyer, "How Users Search the Library from a Single Search Box," College & Research Libraries 74, no. 3 (2013): 227–41, http://crl.acrl.org/content/74/3/227.full.pdf.

6. Charles R. Hildreth, "Beyond Boolean: Designing the Next Generation of Online Catalogs," Library Trends (Spring 1987): 647–67, http://hdl.handle.net/2142/7500.

7. Kristen Antelman, Emily Lynema, and Andrew K. Pace, "Toward a Twenty-First Century Library Catalog," Information Technology and Libraries 25, no. 3 (2006): 129, http://dx.doi.org/10.6017/ital.v25i3.3342.

8. Karen Coyle, "The Library Catalog: Some Possible Futures," Journal of Academic Librarianship 33, no. 3 (2007): 415–16, http://dx.doi.org/10.1016/j.acalib.2007.03.001.

9. Karen Markey, "The Online Library Catalog: Paradise Lost and Paradise Regained?" D-Lib Magazine 13, no. 1/2 (2007): 2, http://dx.doi.org/10.1045/January2007-markey.

10. Marshall Breeding, "Next-Gen Library Catalogs," Library Technology Reports (July/August 2007): 10–13.

11. Jia Mi and Cathy Weng, "Revitalizing the Library OPAC: Interface, Searching, and Display Challenges," Information Technology and Libraries 27, no. 1 (2008): 17–18, http://dx.doi.org/10.6017/ital.v27i1.3259.

12. Michael J. Bennett, "OPAC Design Enhancements and Their Effects on Circulation and Resource Sharing within the Library Consortium Environment," Information Technology and Libraries 26, no. 1 (2007): 36–46, http://dx.doi.org/10.6017/ital.v26i1.3287.

13. Eric Lease Morgan, "Use and Understand: The Inclusion of Services against Texts in Library Catalogs and Discovery Systems," Library Hi Tech (2012): 35–59, http://dx.doi.org/10.1108/07378831211213201.

14. Lorcan Dempsey, "Thirteen Ways of Looking at Libraries, Discovery, and the Catalog: Scale, Workflow, Attention," EDUCAUSE Review Online (December 10, 2012), http://www.educause.edu/ero/article/thirteen-ways-looking-libraries-discovery-and-catalog-scale-workflow-attention.

15. Charles Pennell, Natalie Sommerville, and Derek A. Rodriguez, "Shared Resources, Shared Records: Letting Go of Local Metadata Hosting within a Consortium Environment," Library Resources & Technical Services 57, no. 4 (2013): 227–38, http://journals.ala.org/lrts/article/view/5586.
16. Benjamin Pennell and Jill Sexton, "Implementing a Real-Time Suggestion Service in a Library Discovery Layer," Code4Lib Journal 10 (2010), http://journal.code4lib.org/articles/3022.

17. Ithaka S+R, UNC Chapel Hill Faculty Survey: Report of Findings (unpublished report to the University of North Carolina at Chapel Hill, 2013), questions 20, 21, 33.

18. Ithaka S+R, UNC Chapel Hill Graduate Student Survey: Report of Findings (unpublished report to the University of North Carolina at Chapel Hill, 2014), 47.

19. Ithaka S+R, UNC Chapel Hill Undergraduate Student Survey: Report of Findings (unpublished report to the University of North Carolina at Chapel Hill, 2014), 39.

20. Cathy De Rosa et al., Perceptions of Libraries, 2010: Context and Community: A Report to the OCLC Membership (Dublin, OH: OCLC Online Computer Library Center, 2011), 32, http://oclc.org/content/dam/oclc/reports/2010perceptions/2010perceptions_all.pdf.

5831 ---- Static vs. Dynamic Tutorials: Applying Usability Principles to Evaluate Online Point-of-Need Instruction

Benjamin Turner, Caroline Fuchs, and Anthony Todman

ABSTRACT

This study had a two-fold purpose. One was to discover, through the implementation of usability testing, which mode of tutorial was more effective: screencasts containing audio/video directions (dynamic) or text-and-image tutorials (static). The other was to determine whether online point-of-need tutorials were effective in helping undergraduate students use library resources. To this end, the authors conducted two rounds of usability tests consisting of three groups each, in which participants were asked to complete a database-searching task after viewing a text-and-image tutorial, an audio/video tutorial, or no tutorial. The authors found that web usability testing was a useful tutorial-testing tool and that participants learned most effectively from text-and-image tutorials: in both rounds, participants who received text-and-image instruction completed tasks more accurately and more quickly than those who received audio/video instruction or no instruction.

INTRODUCTION

The provision of library instruction online has become increasingly important, given that more than one third of higher education students now take at least some of their courses online and that the number of students enrolling in online courses continues to increase more rapidly than the number of students in higher education as a whole.1 Academic library websites reflect the growth of online education. By 1998, online versions of journals had become ubiquitous.2 In contrast, electronic books have been slower to be adopted in academic libraries, but there has been a steady and significant growth of their use in recent years. Between 2010 and 2011, for example, the average number of electronic books available at academic libraries in the United States increased by 93 percent.3

Benjamin Turner (turnerb@stjohns.edu) is Associate Professor and Instructional Librarian, Caroline Fuchs (fuchsc@stjohns.edu) is Associate Professor and Outreach Librarian, and Anthony Todman (todmana@stjohns.edu) is Associate Professor and Reference and Government Documents Librarian, St. John's University Libraries, New York, New York.
With the increasing availability of library content online, many users bypass the "brick and mortar" library and go directly to its website.4 Remote access to library collections has advantages in terms of convenience, which further underscores the importance of making library websites as intuitive as possible while offering quality instruction at point of need. A recent survey of 264 academic library websites found that 64 percent offered some form of online tutorials.5 The relative effectiveness of different types of tutorials in providing online, point-of-need library instruction is therefore an important consideration for library professionals.

This study had a two-fold purpose. One was to discover, through the implementation of usability testing, which mode of tutorial was more effective: screencasts containing visual and audio directions (dynamic) or text-and-image tutorials (static). The other was to determine whether online point-of-need tutorials were effective in helping undergraduate students use library resources. For the purpose of this study, the researchers were less interested in the long-term effects of these tutorials on student research and instead focused on point-of-need instruction for database use.

St. John's University

St. John's University is a private, coeducational Roman Catholic university, founded in 1870 by the Vincentian Community. The University has three residential campuses within New York City and an Academic Center in Oakdale, New York, as well as international campuses in Rome, Italy, and Paris, France. The university comprises six schools and colleges: St. John's College of Liberal Arts and Sciences; The School of Education; The Peter J. Tobin College of Business; College of Pharmacy and Health Sciences; College of Professional Studies; and the School of Law. There is a strong focus on online learning. Special academic programs include accelerated three-year bachelor's degrees, five-year bachelor's/master's degrees in the graduate schools, a six-year bachelor's/JD from the School of Law, and a six-year PharmD program.

In fall 2013, total student enrollment was 20,729, with 15,773 registered undergraduates and 1,364 international students. During the 2012–13 academic year, 97 percent of undergraduate students received financial aid in the form of scholarships, loans, grants, and college work/study initiatives. The student body was 56 percent female and 44 percent male, representing 47 states and 116 countries. The diversity of the student population is noted by the fact that 47 percent identified themselves as black, Hispanic, Asian, Native Hawaiian/Pacific Islander, American Indian, Alaska Native, or multiracial.

St. John's University has a library presence at four campuses: Queens, Staten Island, Manhattan, and Rome, Italy. In addition to traditional or in-person interaction, both online and distance learning are integral parts of the library-tutorial and instruction environment. Undergraduate students receive a laptop computer at no cost, and the entire campus is wireless accessible. Full-time faculty members receive laptop computers as well. The University Libraries has 24/7 access to electronic resources, both on and off campus. The Libraries' portal is located at http://www.stjohns.edu/libraries. An online catalog can be found at http://stjohns.waldo.kohalibrary.com.
Wireless computing and printing are available at the four campus library sites as well as in other areas across campus. Library reference and research assistance services are delivered in person or electronically. Library reserve services are accessible in either print or electronic formats. Interlibrary Loan has both domestic and international borrowing and lending via the ILLiad software platform. When the main Queens campus library is not open for service, a 24/7 quiet study area is available for current students within the library space. Library instructional services take place in formal classes that are requested by faculty, as well as library faculty-initiated workshops held in either the libraries' computerized classrooms or at other on-campus locations. There is no mandated information literacy session. During June 2012–May 2013, 333 instruction classes were offered to 4,435 students.

LITERATURE REVIEW

The library literature on online library tutorials might be divided into subcategories: early development of online instructional tutorials, library website usability testing, evaluation of online information-literacy instruction tutorials, best practices for the creation of library tutorials, and the best mediums for the creation of library tutorials.

Early Development of Online Instruction Tutorials

The need to evaluate and assess the usefulness of online instructional tutorials is not new. Although not explicitly related to today's environment, Tobin and Kesselman's work contains an early history detailing the design of internet-based information pages and their use in the library information environment.6 They also included the early guidelines of the Association of College and Research Libraries (ACRL), the International Federation of Library Associations (IFLA), and the American Library Association (ALA). A study by Dewald conducted around the same time evaluated twenty library tutorials according to the current best practices in library instruction, and concluded that "online tutorials cannot completely substitute for the human connection in learning"7 and should be designed specifically to support students' academic work. Further, it was noted that tutorials should teach concepts, rather than mechanics, and incorporate active learning where possible.8 In a separate article, Dewald argued that the web made possible new, creative ways of teaching library skills, through features such as linked tables of contents and the provision of immediate feedback through CGI scripts. Users also were able to open secondary windows to practice the skills they learned as they moved through tutorials. She further concluded that effective instructional content should not be text heavy, but rather include images and interactive features.9

Another early study of online tutorials discussed the development of a self-paced web tutorial at Seneca College in Toronto, called "Library Research Success," which was designed to teach subject-specific and general research skills to first-year business majors. The creation of the tutorial was first requested by Seneca College's School of Business Management, which collaborated with Seneca College Library, the school's Centre for New Technology, and Centre for Professional Development in completing the project.
The tutorial was a success, with overwhelmingly positive feedback from students and faculty members.10 Despite such successful examples, a common concern expressed in early studies was that online tutorials would not be as effective as face-to-face instruction. One article compared and evaluated library skills instruction methods for first-year students at Deakin University.11 Another tracked the difference between CAI (Computer Assisted Instruction) without personal librarian interaction and more traditional library instruction incorporated into an English classroom setting, and concluded that while useful, CAI was not a good substitute for face-to-face instruction.12

Library Website Usability Testing

As concern grew at the onset of the twenty-first century for the need to evaluate online library tutorials, articles on library website usability testing began to appear more frequently. In one study, the authors noted that they would not have identified problems with their website had they not done usability testing: "Testers' observations and the comments of the students participating in the test were invaluable in revealing where and why the site failed and helped evaluators to identify and prioritize the gross usability problems to be addressed."13 Librarians aiming to examine their patrons' ability to independently navigate their library's webpage to fulfill key research needs conducted similar studies. At Western Michigan University (WMU), librarians investigated how researchers navigated the WMU library website in order to find three things: the title of a magazine article on affirmative action, the title of a journal article on endangered species, and a recent newspaper article about the Senate race in New York State. They successfully used the data gathered to identify problems with their website and to establish goals and priorities in clarifying language and navigation on their site.14 More recently, researchers conducted a usability study with the aim of showing how librarians could build websites to better compete with nonlibrary search sites such as Google, which would allow greater personalization by the individual user and more seamless integration into learning management systems.15

Other researchers have studied the readability of content on academic library websites. In one such study, Lim used a combination of readability formulas and focus groups to evaluate twenty-one academic library websites that serve significant numbers of academically underprepared students and/or students who spoke English as a second language. She concluded that the majority of information literacy content on library pages had poor readability, and that the lack of well-designed and well-written information literacy content could undermine its effectiveness in serving users.16 Krueger, Ray, and Knight employed a usability study to evaluate student knowledge of their library's web resources. The study produced mixed results, with most students able to navigate to the library's website and the OPAC, but large numbers unable to perform basic research tasks such as finding a journal article. The authors noted that such information would allow them to modify library instruction accordingly.17 Another study focused on the use of language as it relates to awareness of relevant databases.
At Bowling Green University Library, staff members attempted to learn more about how users find and select databases through the library website's electronic resources management system (ERM). As a result of their study, the authors recommended that librarians should focus on promoting brand awareness of relevant databases among students in their subject disciplines by providing better database descriptions on the library webpages and by collaborating with subject faculty members.18

Evaluation of Online Information-Literacy Instruction Tutorials

Librarians at Wayne State University conducted an assessment of their revamped information literacy tutorial, known as "re:Search."19 They distributed a multiple-choice knowledge questionnaire to seventy-two students participating in their 2010 Wayne State Federal TRIO Student Support Service Summer Residential Program, which was based on Donald Kirkpatrick's Evaluating Training Programs: The Four Levels.20 They concluded that their study highlighted some flaws in their tutorials, including navigational problems. As a result, they would consider partnering with WSU faculty in the future to develop better modules. One curious comment by the authors in their introduction warrants further discussion about assumptions made by librarians regarding student research skills: "The internet has bolstered student confidence levels in their research abilities, increasing the demand for point-of-need instruction. Students are accustomed to online learning, not only because of the shift in higher education to online coursework, but also because they have been learning online through YouTube, social networking, and other websites."21

At Purdue University, librarians evaluated the success of their seven-module online tutorial through the distribution of a post-test survey. These researchers found that the feedback received was essential for planning future versions of online instruction at their institution.22 A report from Zayed University (United Arab Emirates) outlined an evaluation of Infoasis, the University's online information literacy tutorial, testing 4,000 female students with limited library proficiency and remedial English aptitudes.23

Best Practices for the Creation of Library Tutorials

Other researchers developed guidelines and best practices for future planning and implementation. Bowles-Terry, Hensley, and Hinchliffe at the University of Illinois conducted interviews to investigate the usability, findability, and instructional effectiveness of online video tutorials. Although the tutorials were shorter than three minutes, students found them to be too lengthy and would have preferred the option to skip ahead to pertinent sections. Other participants found the tutorials too slow, while some preferred to read rather than watch and listen. On the basis of their study, the authors recommended a set of best practices for creating library video tutorials, including pace, length, content, look and feel, video versus text, findability, and interest in using video tutorials.24

At Regis University Library, librarians created online interactive animated tutorials and incorporated Google Analytics for use statistics and tutorial assessment, from which they developed a list of tips and suggestions for tutorial development. These included suggestions regarding technical aspects such as screen resolution and accessibility.
Of some significance is that the data from the analytics suggest that the tutorials are being used both within and without the university. Most useful here is the "Best Practices for Creating and Managing Animated Tutorials" found in the article's appendix.25

Best Mediums for the Creation of Library Tutorials

Other authors have explored the need to accommodate different learning styles in library tutorials rather than relying too heavily on text to convey information.26 At the University of Leeds in the United Kingdom, an information literacy tutorial was planned and created to support online distance learners in the geography post-graduate program. Using Articulate Presenter, the authors created a tutorial that covered the same material that would be taught in a face-to-face session and that incorporated visual, auditory, and textual elements. These researchers concluded that the online tutorial is supplemental and did not alleviate the need for face-to-face instruction.27

To reach different types of learners, many librarians have begun to use Adobe Flash (formerly Macromedia Flash) to create multimodal online information literacy tutorials. Authors who use Flash note that learning how to use the software correctly represents a significant investment in time and effort.28 Another study, conducted via a SUNY Albany web design class, focused on the effect/outcome of teaching with web-based tutorials in addition to or instead of face-to-face interaction. The authors of this study pointed out that self-paced instruction, lab time, office hours, and email exchange were all factors affecting how the web-based multimedia (WBMM) Flash tutorials were incorporated into instruction.29

Rather than focusing purely on the content of online library instruction tutorials, some studies considered and evaluated the various tutorial-creating software tools. Blevins and Elton conducted a case study at the William E. Laupus Health Sciences Library at East Carolina University, which set out "to determine the best practices for creating and delivering online database instruction tutorials for optimal information accessibility."30 They produced "identical" tutorials using Microsoft's PowerPoint, Sonic Foundry's MediaSite, and TechSmith's Camtasia software. They chose to include PowerPoint because "previous research has shown that online students prefer PowerPoint presentations to video lectures."31 Their testing results indicated that participants found specific tutorial features to be most effective: video (33.3 percent), mouse movements (57.1 percent), instructor presence (28.6 percent), audio instruction only (28.6 percent), and interaction (28.6 percent). They concluded that Camtasia tutorials provided optimal results for short sessions such as database instruction and that MediaSite was more appropriate for instruction requiring video and audio of the instructor plus screenshots. However, they also determined that PowerPoint tutorials were an acceptable solution if cost were an important factor.32

In a separate study at Florida Atlantic University, researchers described the process of designing and creating library tutorials using the screencasting software Camtasia.
In addition to the creation of the tutorials themselves, the authors described how the project entailed the development of policies and guidelines for the creation of library tutorials, as well as training of librarians in using Camtasia software.33 This study provides another good example of the time investment involved in the creation of multimedia tutorials.

While the professional literature thus shows that Flash-based tutorial software is popular among librarians, and the desire to accommodate students with different learning styles is a laudable goal, at least one study suggests that the time and money involved in the creation of multimedia tutorials could be better spent in other ways. A University of Illinois Urbana-Champaign study found that students from different learning styles performed better after using tutorials made with a combination of text and screenshots than from tutorials created with Camtasia software.34

METHOD

Usability Testing

For the evaluation of dynamic audio/video tutorials compared with static text-and-image tutorials, the researchers employed usability testing, which is "watching people use something that you have created, with the intention of making it easier to use, or proving that it is easy to use."35 Usability testing requires relatively small numbers of participants to provide meaningful results, and it does not require selection from a representative sample population.36

Participants

Group                           Number of Participants
Control Group 1                 5
Text-and-Image Group 1          5
Dynamic Audio/Video Group 1     5
Group 1 Total                   15
Control Group 2                 5
Text-and-Image Group 2          5
Dynamic Audio/Video Group 2     5
Group 2 Total                   15
Total Participants              30

Table 1. Breakdown of Participants

Thirty freshmen at St. John's University participated in this study. While usability-testing experts do not place a great deal of importance on recruiting participants from a specific target audience, the researchers wanted to choose users who were less likely to have had significant experience with university library database searching, since prior knowledge could make it harder to determine the effectiveness of the tutorials. They therefore chose freshmen as the participants in the study. They did not seek any other variables such as age, gender, ethnicity/culture, or any other demographic information. Participants were recruited through the St. John's Central portal, which is the main channel of internal communication at St. John's University, and through which mass emails can be sent to a targeted population of students. The email to students provided a registration link to a Google form, which asked students to provide their name, year of study, time availability preference, and contact information. Freshmen were selected from the response list. As an incentive for participation, the student participants became eligible for a Kindle Fire tablet for each of the two rounds of the study. Prior to beginning the study, the authors consulted St. John's University's Office of Institutional Research, which oversees all research at the university and provides approval for the study of human subjects. Since this study focused on tutorials rather than the participants themselves, the authors were granted a waiver for the study.

Tests

Usability testing typically involves having participants complete a task or tasks in front of an observer.
For this study, the authors designed two tasks that required participants to find articles in the Academic Search Premier EBSCO database (ASP EBSCO). The first task, given to all participants in the first round of tests, was relatively simple, and consisted of three components: finding an article about climate change published in the journal Lancet and downloading a copy of the citation for that article in MLA format from the database. Participants who attempted the first task were labeled "Group 1" (see appendix I).

The second task was given to all participants in the second round of tests and was more complex, comprising five components. Participants were asked to find an article about the Deepwater Horizon spill from a peer-reviewed journal published after 2011 that included color photographs. As with the first task, these participants were also required to download a copy of the citation for the article in MLA format from the database. Participants who attempted the second task were labeled "Group 2" (see appendix II).

Group 1 and Group 2 were divided into three subgroups each. The first subgroup was the control group and received no instruction. The second subgroup was given access to the dynamic audio/visual tutorial (see appendix III). The third subgroup was given access to the static text-and-image tutorial instruction (see appendixes IV and V). Each subgroup consisted of five unique participants.

Each participant was scheduled for a specific fifteen-minute time slot. Tests were conducted in a small meeting room in the library, with one participant at a time working with the facilitator. As the participants entered the meeting room, the facilitator greeted them and confirmed their identities. Participants were provided with an information sheet (see appendix VI), which told participants that the session would be recorded, that the researchers were concerned with testing the library-instruction tutorials, not the participants themselves, and that the tests were confidential and anonymous. Participants were also told that they could end the test at any time for any reason. Additionally, the facilitator read aloud the points-of-information sheet. Participants were invited to ask questions or voice concerns.

For both rounds of tests, participants had use of a laptop computer with a browser window open to the ASP EBSCO home page. For those who received instruction, a second browser window was open to either the dynamic or the static tutorial. For members of the control group, no tutorial was available. Those who received instruction were allowed to return to the tutorial at any point they wished. Using Adobe Connect software, the testing activities, tutorials, participants' attempt(s) at task(s), participants' computer screen, and any conversation between the participants and the facilitators were simultaneously recorded and broadcast to a separate room, where the two other researchers observed, listened, and took notes. The participants were asked to verbally describe the steps they were taking, as per the "think aloud" protocol that is essential to usability testing. Recorded sessions were then available for later review by the research team.

On completing the task, participants who received either the text-and-image or dynamic audio/video tutorial were asked to complete a short questionnaire giving feedback on the instruction received (see appendix VII).
Participants who received no instruction were not asked to provide feedback.

Tutorials

The researchers created four tutorials for this study. Two were Flash-based dynamic audio/video tutorials created using TechSmith's Jing software. The static text-and-image tutorials were created using Microsoft Word and then converted into PDF documents. The dynamic and static tutorials mirrored each other in terms of content and were designed with the specific goal of helping participants complete the tasks successfully, though in both cases there was some variation between the tutorials and the tasks. The tutorials received by Group 1, for instance, showed participants how to find articles about the Occupy Wall Street movement, limiting the search to articles published in the New York Times, and how to download the citation in MLA format. The tutorials for Group 2 showed participants how to find articles about climate change that included color photographs, limiting the search to peer-reviewed journals published after 2011.

DISCUSSION

The results of the usability study revealed two things: participants benefited from library instruction, through which they evidently acquired new skills; and participants benefited more from static text-and-image tutorials than from the dynamic audio/video tutorials. In both rounds of tests, the participants who received the text-and-image tutorials performed the tasks more effectively than did members of the control group or those who viewed the dynamic tutorials.

Group 1

For the first round of tests, members of the control group spent longer on the task and made more mistakes than those who received either the dynamic or the static tutorial (see table 2). For example, one participant in the control group was unable to download the MLA citation, and another in the control group ventured outside the ASP EBSCO database platform to find the correct citation format. When members of the control group did succeed, they did so without a clear search strategy, evidenced by their use of natural language instead of Boolean connectors. (ASP EBSCO uses Boolean connectors by default, and natural language is usually ineffective.) Another participant reached several dead ends in the search before finally succeeding. While most of the control group participants were at least partially successful in completing the task, it is reasonable to suspect that they would have given up in frustration in a non-test situation, and would have benefited from point-of-need instruction.

                         Control 1   Control 2   Control 3   Control 4   Control 5
Relevant Article         Y           Y           Y           Y           Y
Lancet                   Y           Y           Y           Y           Y
MLA Citation             Y           N           Y           Y           Y
Time on Task (minutes)   8:28        2:49        6:30        2:41        1:42
Average time on task: 4:26 mins.

Table 2. Task Completion Success and Time, Control Group 1

The participants who received the static text-and-image tutorial performed the best, completing the task with the highest speed and with the greatest accuracy (see table 3). All five of the participants in this group managed to find appropriate articles and to download the citation in MLA format, though several had difficulty with the final task. All were able to navigate to the "cite" feature effectively, but all participants chose to click on the "MLA" link rather than simply copy the citation. Clearer directions in the tutorial might alleviate this problem.
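The average time-on-task values reported in these tables are simple means of the per-participant times. Assuming the times are recorded as minutes:seconds strings, a few lines of Python reproduce, for example, the 4:26 average for control group 1.

```python
# Compute mean time on task from mm:ss strings; for control group 1 below
# this yields the 4:26 average reported in table 2.
def to_seconds(mmss: str) -> int:
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def average_time(times: list[str]) -> str:
    total = sum(to_seconds(t) for t in times)
    mean = round(total / len(times))
    return f"{mean // 60}:{mean % 60:02d}"

control_group_1 = ["8:28", "2:49", "6:30", "2:41", "1:42"]
print(average_time(control_group_1))  # -> 4:26
```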
                         T&I 1   T&I 2   T&I 3   T&I 4   T&I 5
Relevant Article         Y       Y       Y       Y       Y
Lancet                   Y       Y       Y       Y       Y
MLA Citation             Y       N       Y       Y       Y
Time on Task (minutes)   2:01    3:00    2:21    2:40    3:15
Average time on task: 2:39 mins.

Table 3. Task Completion Success and Time, Text and Image Tutorial, Group 1

Participants who received the dynamic video tutorial were more successful than those in the control group, but spent significantly longer on task than did those who received the static tutorial (see table 4). Interestingly, two of the participants searched for "climate change" as the "subject term" in ASP EBSCO, even though the tutorial did not instruct them to do so. (SU - Subject Term is one of the options in the drop-down menu in ASP EBSCO, which otherwise searches citation and abstract by default.) While "climate change" is a commonly accepted scientific term, and the searches produced relevant search results, it is not generally advisable to begin a search with controlled vocabulary terms.

                         Video 1   Video 2   Video 3   Video 4   Video 5
Relevant Article         Y         Y         Y         Y         Y
Lancet                   Y         Y         Y         Y         Y
MLA Citation             Y         Y         Y         Y         Y
Time on Task (minutes)   4:34      3:17      3:17      3:07      3:28
Average time on task: 3:32 mins.

Table 4. Task Completion Success and Time, Dynamic A/V Tutorial, Group 1

Figure 1. Average Time on Task in Minutes, Group 1

Figure 2. Successful Task Completion: Group 1

Group 2

The advantages of text-and-image instruction were more pronounced in the second round of tests, which involved a more complex task (see figure 3). As in the first round of tests, the participants in the control group had the lowest number of satisfactory task completions and spent the greatest amount of time on task. Although most of the participants in control group 2 had at least partial success in completing the task, most did so through trial and error and showed a general lack of understanding of database terminology and functions. One participant, for example, attempted to use "peer-review" and "color photographs" as search terms. Another attempted to search for "deepwater horizon" as a journal title. Only two of the participants completed all components of the task successfully. Two others partially completed the task: one found a suitable article with color photographs, but it was published in The Nation, which is not peer-reviewed. One user failed to complete any part of the task and gave up in frustration (see table 5).

                         Control 1   Control 2   Control 3   Control 4   Control 5
Relevant Article         Y           N           Y           Y           Y
Peer-Reviewed            Y           N           Y           N           Y
Publication Date         Y           N           N           Y           Y
Color Photos             Y           N           Y           Y           Y
MLA Citation             Y           N           Y           N           Y
Time on Task (minutes)   1:51        7:39        2:54        9:16        7:55
Average time on task: 5:55 mins.

Table 5. Task Completion and Success and Time, Control Group 2

In contrast, participants who received the text-and-image tutorial enjoyed the most success in round 2. Three of the five participants who received the static tutorial completed all components of the task successfully. Errors committed by the two others were related to publication date. Participants in this group also completed the task more rapidly than those from the other two groups.
T&I 1 T&I 2 T&I 3 T&I 4 T&I 5 Relevant Article Y Y Y Y Y Peer-Reviewed Y Y Y Y Y Publication Date Y Y N N Y Color Photos Y Y Y Y Y MLA Citation Y Y Y N Y Time on Task (minutes) 6:33 2:46 3:00 4:50 3:24 Average time on task: 4:06 mins. Table 6. Task Completion and Success and Time, Text and Image Tutorial Group 2 As in group 1, however, all but one of the participants who received the text-and- image tutorial first attempted to download the MLA citation by clicking on the “MLA” link, rather than simply copying the text. Two of the participants referred back to the tutorials after they had begun the task, which was permissible according to the facilitator’s instructions. This suggests that the text-and image-tutorials are suitable for quick reference and allow users to access needed information at a glance. Video 1 Video 2 Video 3 Video 4 Video 5 Relevant Article Y Y Y Y Y Peer-Reviewed Y Y Y Y N Publication Date N N N Y N Color Photos Y Y N Y Y MLA Citation Y Y N Y Y Time on Task (minutes) 4:13 5:39 6:33 3:59 4:40 Average time on task: 4:57 mins. Table 7. Task Completion and Success and Time, A/V Tutorial Group 2 Among the five participants who received the dynamic audio/visual tutorial, only one completed all five components of the test successfully. One was unable to locate the citation feature, while another failed to limit to peer-reviewed articles. Four of the participants limited the publication date from 2011 to the present instead of 2012 to the present. All participants correctly used the publication limiter. Although given the option, none chose to return to the dynamic tutorial after starting the task. This might be because of the length of the tutorial (more than three minutes) and the difficulty in navigating to specific sections. INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2015 44 As noted above, participants in all groups tended to make errors related to publication date, which may have stemmed from the wording of the task itself rather than misunderstanding the functionality of the database. The task required participants to find articles published after 2011, but many found articles published from 2011 onward. Clearer wording of the task probably would have alleviated this problem. Figure 3. Average Time on Task in Minutes, Group 2 Figure 4. Successful Task Completion, Group 2 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Control Text and Image Video STATIC VS. DYNAMIC TUTORIALS | TURNER, FUCHS, AND TODMAN | doi: 10.6017/ital.v34i4.5831 45 Tutorial Feedback After completing the task, participants were asked to provide anonymous, written feedback on the instruction they received. (Members of the control groups were not asked to provide feedback because the purpose of the study was to compare different types of library tutorials.) Participants were asked ten questions, eight of which were on a Likert scale and two of which were open- ended. Although the feedback for both the static and dynamic tutorials was generally positive, the text-and-image tutorials also received higher combined scores than the audio/visual tutorials on the Likert Scale questions (see figures 5 and 6). Participants’ written feedback on the text-and-image tutorials was generally more positive than for the video tutorials. Commenting on the text and image tutorial, one participant remarked that it was a “great resource” while another said that it was “very easy to use. 
Will become really helpful when put into full effect." Another observed that the tutorial "was pretty precise." Not all the comments on the text-and-image tutorials were positive, however. More than one participant noted that the images used in the tutorials were blurry. One even suggested that "more animations to the text would make it much more open to people with different learning styles."

The feedback on the video tutorials was generally positive, with comments such as "very straightforward," "helpful," "easy to follow," and "I would use this for school assignments." However, a common complaint about the dynamic tutorials was that the audio was not very clear. (This may be because of the quality of the microphone used for the recordings.) Other participants seemed to criticize the layout of the database itself, saying that a bigger size of words would have made it easier to follow. Another complained that the dynamic tutorial was too simple and that it should cover more advanced and in-depth topics.

Figure 5. Tutorial Feedback Likert Score Averages, Group 1

Figure 6. Tutorial Feedback Likert Score Averages, Group 2

CONCLUSION

This study suggests that library users benefit from online library instruction at point of need, and that text-and-image tutorials are more effective than dynamic audio/visual tutorials for its provision. Librarians should not assume that instructional tutorials must use Flash or other video technology, especially given the learning curve, time, and financial commitment that video tutorial software involves. Although the researchers in this study used the free software Jing, learning to use it effectively was still a significant investment in time. More importantly, it is evident that the participants learned more and were more satisfied with text-and-image tutorials, which were more easily navigated than dynamic audio/video tutorials and which allowed users to more easily review tutorial content.

This study corroborates the findings of Mestre, who found that text-and-image tutorials were more effective than audio/video tutorials in teaching library skills.37 It also lends credence to the work of Bowles-Terry, Hinchliffe, and Hutchinson, who found that users preferred tutorials that allowed them to read quickly and navigate to pertinent sections rather than watch and listen.38 As Lim suggests, it is important to create instructional material that is clearly written.39 It further suggests that regardless of the technology used, librarians should focus on creating content that is relevant and helpful to our user population.

Again, it is worth noting that the control group, without the aid of point-of-need instructional materials, achieved some success in completing the tasks. It is possible that the members of the control group gained important knowledge simply by being told about ASP EBSCO and that there was enough implied information in the tasks themselves to provide basic information about the content and functionalities of the database. This suggests that databases like ASP EBSCO are intuitive enough that people can learn how to use them independently. The higher number of serious errors, and the greater length of time members of the control group spent on tasks,
however, show that efforts to raise student awareness of databases and library resources should be coupled with point-of-need instruction.

Although the usability tests generally went smoothly, the researchers did encounter occasional difficulties with the audio feed between the testing room and the observation room, which sometimes made it difficult to hear what the participant was saying as he or she completed the task. Fortunately, the researchers kept recordings of each test, which allowed them to review those where the audio quality was less than optimal. To save time and run the tests more efficiently, however, the researchers recommend purchasing a high-quality microphone like those used for teleconferences.

Furthermore, this study shows the broader value of usability testing of library instructional material. Although participants who received the text-and-image tutorials performed better than either of the other two groups, the tests helped researchers identify two problems with the tutorials: users found the images blurry and often misinterpreted how to download citations in MLA format. Such information, gleaned from the user's perspective, would be valuable in creating future library online point-of-need instructional tutorials.

REFERENCES

1. I. Elaine Allen and Jeff Seaman, "Grade Change: Tracking Online Education in the United States, 2013," Sloan Consortium, 2013, sloanconsortium.org/publications/survey/grade-change-2013.
2. M. Walter, "As Online Journals Advance, New Challenges Emerge," Seybold Report on Internet Publishing 3, no. 1 (1998).
3. Rebecca Miller, "Dramatic Growth," Library Journal 136, no. 17 (October 15, 2011): 32, www.thedigitalshift.com/2011/10/ebooks/dramatic-growth-ljs-second-annual-ebook-survey.
4. Megan Von Isenburg, "Undergraduate Student Use of the Physical and Virtual Library Varies According to Academic Discipline," Evidence Based Library & Information Practice 5, no. 1 (April 2010): 130.
5. Sharon Q. Yang and Min Chou, "Promoting and Teaching Information Literacy on the Internet: Surveying the Web Sites of 264 Academic Libraries in North America," Journal of Web Librarianship 8, no. 1 (2014): 88–104, doi:10.1080/19322909.2014.855586.
6. Tess Tobin and Martin Kesselman, "Evaluation of Web-Based Library Instruction Programs," www.eric.ed.gov/ERICWebPortal/contentdelivery/servlet/ERICServlet?accno=ED441454.
7. Nancy H. Dewald, "Transporting Good Library Instruction Practices into the Web Environment: An Analysis of Online Tutorials," Journal of Academic Librarianship 25, no. 1 (January 1999): 26–31.
8. Ibid.
9. Nancy H. Dewald, "Web-Based Library Instruction: What Is Good Pedagogy?," Information Technology & Libraries 18, no. 1 (March 1999): 26–31.
10. Kelly A. Donaldson, "Library Research Success: Designing an Online Tutorial to Teach Information Literacy Skills to First-year Students," Internet & Higher Education 2, no. 4 (January 2, 1999): 237–51, doi:10.1016/S1096-7516(00)00025-7.
11. Marion Churkovich and Christine Oughtred, "Can an Online Tutorial Pass the Test for Library Instruction? An Evaluation and Comparison of Library Skills Instruction Methods for First Year Students at Deakin University," Australian Academic Research Libraries 33, no. 1 (March 2002): 25–38.
12. Stephanie Michel, "What Do They Really Think? Assessing Student and Faculty Perspectives of a Web-Based Tutorial to Library Research," College & Research Libraries 62, no. 4 (July 2001): 317–32.
13. Brenda Battleson, Austin Booth, and Jane Weintrop, "Usability Testing of an Academic Library Web Site: A Case Study," Journal of Academic Librarianship 27, no. 3 (May 2001): 194.
14. Barbara J. Cockrell and Elaine Anderson Jayne, "How Do I Find an Article? Insights from a Web Usability Study," Journal of Academic Librarianship 28, no. 3 (May 2002): 122–32, doi:10.1016/S0099-1333(02)00279-3.
15. Brian Detlor and Vivian Lewis, "Academic Library Web Sites: Current Practice and Future Directions," Journal of Academic Librarianship 32, no. 3 (May 2006): 251–58, doi:10.1016/j.acalib.2006.02.007.
16. Adriene Lim, "The Readability of Information Literacy Content on Academic Library Web Sites," Journal of Academic Librarianship 36, no. 4 (July 2010): 296–303, doi:10.1016/j.acalib.2010.05.003.
17. Janice Krueger, Ron L. Ray, and Lorrie Knight, "Applying Web Usability Techniques to Assess Student Awareness of Library Web Resources," Journal of Academic Librarianship 30, no. 4 (July 2004): 285–93, doi:10.1016/j.acalib.2004.04.002.
18. Amy Fry and Linda Rich, "Usability Testing for e-Resource Discovery: How Students Find and Choose E-resources Using Library Web Sites," Journal of Academic Librarianship 37, no. 5 (September 2011): 386–401, doi:10.1016/j.acalib.2011.06.003.
19. Rebeca Befus and Katrina Byrne, "Redesigned with Them in Mind: Evaluating an Online Library Information Literacy Tutorial," Urban Library Journal 17, no. 1 (Spring 2011): 1–26.
20. Donald L. Kirkpatrick, Evaluating Training Programs: The Four Levels (San Francisco: Berrett-Koehler; Publishers Group West [distributor], 1994).
21. Rebeca Befus and Katrina Byrne, "Redesigned with Them in Mind: Evaluating an Online Library Information Literacy Tutorial," Urban Library Journal 17, no. 1 (Spring 2011): 1–26.
22. Sharon A. Weiner et al., "Biology and Nursing Students' Perceptions of a Web-Based Information Literacy Tutorial," Communications in Information Literacy 5, no. 2 (September 2011): 187–201.
23. Janet Martin, Jane Birks, and Fiona Hunt, "Designing for Users: Online Information Literacy in the Middle East," portal: Libraries & the Academy 10, no. 1 (January 2010): 57–73.
24. Melissa Bowles-Terry, Merinda Kaye Hensley, and Lisa Janicke Hinchliffe, "Best Practices for Online Video Tutorials in Academic Libraries: A Study of Student Preferences and Understanding," Communications in Information Literacy 4, no. 1 (March 2010): 17–28.
25. Paul Betty, "Creation, Management, and Assessment of Library Screencasts: The Regis Libraries Animated Tutorials Project," Part of a Special Issue on the Proceedings of the Thirteenth Off-Campus Library Services Conference, Part 1, 48, no. 3/4 (October 2008): 295–315, doi:10.1080/01930820802289342.
26. Lori S. Mestre, "Matching Up Learning Styles with Learning Objects: What's Effective?," Journal of Library Administration 50, no. 7/8 (December 2010): 808–29, doi:10.1080/01930826.2010.488975.
27. Sara L. Thornes, "Creating an Online Tutorial to Support Information Literacy and Academic Skills Development," Journal of Information Literacy 6, no. 1 (June 2012): 81–95.
28. Richard D. Jones and Simon Bains, "Using Macromedia Flash to Create Online Information Skills Materials at Edinburgh University Library," Electronic Library & Information Systems 37, no. 4 (December 2003): 242–50, www.era.lib.ed.ac.uk/handle/1842/248.
29. Thomas P. Mackey and Jinwon Ho, "Exploring the Relationships Between Web Usability and Students' Perceived Learning in Web-Based Multimedia (WBMM) Tutorials," Computers & Education 50, no. 1 (January 2008): 386–409.
30. Amy Blevins and C. W. Elton, "An Evaluation of Three Tutorial-creating Software Programs: Camtasia, PowerPoint, and MediaSite," Journal of Electronic Resources in Medical Libraries 6, no. 1 (March 2009): 1–7, doi:10.1080/15424060802705095.
31. Ibid., 2.
32. Ibid.
33. Alyse Ergood, Kristy Padron, and Lauri Rebar, "Making Library Screencast Tutorials: Factors and Processes," Internet Reference Services Quarterly 17, no. 2 (April 2012): 95–107, doi:10.1080/10875301.2012.725705.
34. Lori S. Mestre, "Student Preference for Tutorial Design: A Usability Study," Reference Services Review 40, no. 2 (May 2012): 258–76, doi:10.1108/00907321211228318.
35. Steve Krug, Rocket Surgery Made Easy: The Do-It-Yourself Guide to Finding and Fixing Usability Problems (Berkeley, CA: New Riders, 2010), 13.
36. Jakob Nielsen, "Why You Only Need to Test with 5 Users," Nielsen Norman Group, March 19, 2000, www.nngroup.com/articles/why-you-only-need-to-test-with-5-users.
37. Mestre, "Student Preference for Tutorial Design," 258.
38. Bowles-Terry, Hensley, and Hinchliffe, "Best Practices for Online Video Tutorials in Academic Libraries," 22.
39. Lim, "The Readability of Information Literacy Content on Academic Library Web Sites," 302.

APPENDIX I. TASK 1

In Academic Search Premier (EBSCO), find an article about climate change, published in Lancet. Then copy a citation to the article in MLA format.

APPENDIX II. TASK 2

Complete the following task using Academic Search Premier (EBSCO). Take as long as you need. Remember also to "think out loud" through the process.

a) Find an article about the Deepwater Horizon oil spill published in a peer-reviewed journal after 2011, which includes color photographs.
b) After you find an article, copy its citation in MLA format.

APPENDIX III. Dynamic Audio/Video Tutorials

Group 1 (Basic): http://screencast.com/t/5Uln4H8XR
Group 2 (Advanced): http://screencast.com/t/c9kZkgOfx6

APPENDIX IV. Text-and-Image Tutorial 1

APPENDIX V. Text-and-Image Tutorial 2
APPENDIX VI. Information Sheet

St. John's University Libraries Web Site Usability Study Information Sheet

Thank you for participating in the SJ Libraries' Usability Study! Before beginning the test, please read the following:

• The computer screen, your voice, and the voice of the facilitator will be recorded.
• The results of this study may be published in an article, but no identifying information will be included in the article.
• Your participation in this study is totally confidential.
• You may stop participating in the study at any time, and for any reason.

APPENDIX VII. Tutorial Questionnaire

Thank you for participating in the St. John's University Libraries' Tutorial Usability Study. Please take a few moments to answer this brief survey. Please refer to the following scale when answering the questionnaire, and circle the correct response.

1 = no, not at all   2 = not likely   3 = neutral (not sure, maybe)   4 = likely   5 = yes, absolutely

1. The tutorial was easy to follow. 1 2 3 4 5
2. I felt comfortable using the tutorial. 1 2 3 4 5
3. The graphics on the tutorial were easy to use. 1 2 3 4 5
4. The language/text on the tutorial was easy to understand. 1 2 3 4 5
5. I would use StJ Libraries' tutorials on my own in the future. 1 2 3 4 5
6. I would recommend the StJ Libraries' tutorials to my friends. 1 2 3 4 5
7. I was able to complete the tasks with ease. 1 2 3 4 5
8. I would be able to repeat the task now without the aid of the tutorial. 1 2 3 4 5
9. What changes would you make to the tutorial? Additional comments and suggestions?

5770 ----
User Authentication in the Public Area of Academic Libraries in North Carolina

Gillian (Jill) D. Ellern, Robin Hitch, and Mark A. Stoffan

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2015

ABSTRACT

The clash of principles between protecting privacy and protecting security can create an impasse between libraries, campus IT departments, and academic administration over authentication issues with the public area PCs in the library. This research takes an in-depth look at the state of authentication practices within a specific region (i.e., all the academic libraries in North Carolina) in an attempt to create a profile of those libraries that choose to authenticate or not. The researchers reviewed an extensive amount of data to identify the factors involved with this decision.

INTRODUCTION

Concerns surrounding usability, administration, and privacy with user authentication on public computers are not new issues for librarians. However, in recent years there has been increasing pressure on all types of libraries to require authentication of public computers for a variety of reasons. Since the 9/11 tragedy, there has been increasing legislation such as the Uniting and Strengthening America by Providing Appropriate Tools Required to Intercept and Obstruct Terrorism Act of 2001 (USA PATRIOT Act) and the Communications Assistance for Law Enforcement Act (CALEA).
In response, administrators and campus IT staff have become increasingly concerned about allowing open access anywhere on their campuses. Restrictive licensing agreements for specialized software and web resources are also making it necessary or attractive to limit access to particular academic subgroups and populations. Permitting access to secured campus storage from these computers can also push libraries to consider authentication. And finally, the general state of the economy has increased user traffic to libraries, sometimes making it necessary to control the use of limited computer resources. Authenticating can often make these changes easier to implement and can give the library more control over its IT environment.

That being said, authentication comes at a price for librarians. Authentication often creates ethical issues with regards to patron privacy and freedom of inquiry, increases the complexity of using public area machines, and restricts the open access needs of public or guest users. Requiring a patron to log into a computer can make it possible for organizations outside the library's control to collect, review, and use data about a patron's searching habits or online behaviors. Issues associated with managing patron logins can also create barriers to access as well as being time consuming and frustrating for both the patron and the library staff.1 While open, anonymous access does not completely protect against these issues, it can help to create an environment of free, private, and open access similar to the longstanding situation with the book collection in most libraries.

Gillian (Jill) D. Ellern (ellern@email.wcu.edu) is Systems Librarian, Robin Hitch (rhitch@email.wcu.edu) is Tech Support Analyst, and Mark A. Stoffan (mstoffan@email.wcu.edu) is Head, Digital, Access, and Technology Services, Western Carolina University, Cullowhee, North Carolina.

The Hunter Library Experience

While working on the implementation of a new campus-wide pay-for-print solution in 2009, librarians from the Hunter Library at Western Carolina University began to feel pressured by the campus IT department to change the library's practice of allowing anonymous logins on all the computers in its public areas. Concerns about authenticating users on library public area machines had been building between these two units for several years. The resulting clash of principles between protecting privacy and protecting security came to a head over this project. The Hunter Library employees perceived that there needed to be more time for research and debate before implementing the proposed mandate.
Initially, there was great resistance from campus IT staff to taking the library's concerns into account, but eventually a compromise was worked out that allowed the library to retain anonymous logins on its public computers. The confrontation led library staff to investigate the practices of other libraries, particularly within the University of North Carolina (UNC) System, of which it is a member. It seemed a logical development to extend the initial research into the authentication practices throughout the state of North Carolina.

The Problem

One of the first questions asked by Western Carolina's library administration of the systems department was what other libraries in the area were doing. In our case, the library director specifically asked how many of Western Carolina's sister universities were authenticating and why. Anecdotally, during this process, it seemed that many other University of North Carolina System libraries reported being pressured to authenticate their public computers by organizations outside the library, most often the campus IT department.

When the librarians at the Hunter Library began looking for research to support their position, such as hard data and practical arguments that could be used to argue effectively against this change, helpful literature seemed to be lacking. Some items were found, such as Carlson, writing in the Chronicle of Higher Education, who reported on the divide between access and security. He confirmed that other librarians also have ambivalent feelings about authentication issues but that there was also growing understanding in libraries about the potential vulnerability of networks or misuse of their resources.2

It seemed that the speed at which authenticating computers in the public areas of libraries was happening across the country had not really allowed the literature on the subject to catch up. Those studies that existed, such as SPEC Kits, seemed to address the issue from the perspective of larger research libraries or else did not systematically assess other specific groups of libraries.3,4 There were questions in our minds about whether the current research that was found would describe the trends and unique situations of libraries located in rural areas or in other types of academic libraries. There seemed to be no current statewide or geographically defined analysis of authentication practices across various types of academic libraries in a specific state or region, nor were there any available studies creating a profile of libraries more likely to authenticate computers in their public areas. We questioned if the rural nature of our settings, our mission, or our geographic area in the South might reinforce or hurt our position with IT. Authentication status is not something that is mentioned in the ALA directory, nor is this kind of information often given on a library's web site.
We  found  that  individuals  usually  need  to  call  or  visit  the  library   directly  if  they  want  to  know  about  a  library’s  authentication  practices.   During  the  initial  investigation,  the  need  for  this  kind  of  information  to  support  the  library’s   perspective  became  clear.    This  question  led  to  the  creation  of  this  survey  of  authentication   practices  in  a  larger  geographical  area  and  across  various  kinds  of  academic  libraries.    The  goals  of   this  research  were  to  determine  some  answers  to  the  following  questions:   • What  is  the  current  state  of  authentication  practices  in  the  public  area  of  academic  libraries   in  North  Carolina?       • What  factors  caused  these  libraries  to  make  the  decisions  that  they  did  in  regards  to   authentication?   • Could  you  predict  whether  an  academic  library  would  require  users  to  authenticate?   LITERATURE  REVIEW   A  number  of  studies  have  discussed  various  other  aspects  of  user  authentication  in  libraries,   including  privacy  and  academic  freedom  concerns,  guest  access  policies,  differing  views  of  privacy   and  access  between  library  and  campus  IT  departments,  and  legislation  impacting  library   operations.  All  are  potential  factors  impacting  decisions  on  authentication  of  patron  accessible   computers  located  in  the  public  areas  of  library.   Privacy  and  academic  freedom  about  the  use  of  a  library’s  collection  have  long  been  major   concerns  for  librarians  even  before  information  technology  was  introduced.  The  impact  of  9/11   and  the  PATRIOT  Act  made  the  discussion  of  computers  and  network  security,  especially  in  the   library  environment  much  more  entwined.    Oblinger  discussed  online  access  concerns  in  the   context  of  academic  values,  focusing  on  unique  aspects  of  the  academic  mission.  She  discussed  the   results  of  an  EDUCAUSE/Internet2  Computer  and  Network  Security  Task  Force  invitational   workshop  that  established  a  common  set  of  principles  as  a  starting  point  for  discussion:  civility   and  community,  academic  and  intellectual  freedom,  privacy  and  confidentiality,  equity  of  access  to   resources,  fairness,  and  ethics.  All  of  these  principles,  she  argues,  are  integral  to  the  environment     USER  AUTHENTICATION  IN  THE  PUBLIC  LIBRARY  AREA  OF  ACADEMIC  LIBRARIES  IN  NORTH  CAROLINA  |     106   ELLERN,  HITCH,  AND  STOFFAN   doi:  10.6017/ital.v34i2.5770   of  a  university  and  concluded  that  security  is  a  complex  topic  and  that  written,  top-­‐imposed   policies  alone  will  not  adequately  address  all  concerns.5  While  not  directly  addressing  the  issues  of   the  library’s  public  computer  access  in  particular,  she  established  a  framework  of  values  on  how   security  issues  relate  to  the  university  culture  of  freedom  and  openness.   Dixon  in  an  article  written  for  library  administrators  discussed  privacy  practices  for  libraries   within  the  context  of  the  library  profession’s  ethical  concerns.  
She highlights such documents as the Code of Ethics of the American Library Association,6 the Fair Information Practices adopted by the Organization for Economic Cooperation and Development,7 and the NISO Best Practices for Designing Web Services in the Library Context.8 She also reviews a variety of ways that patron data may be misused or compromised. She states that all the ways patron data can be stored or tracked by local networks, IT departments, or Internet service providers may not be fully understood by librarians. While most librarians ardently maintain the privacy of patron circulation records, she points out that similar usage data on online activities may be collected without the librarians or their patrons being aware. Dixon studied the current literature and maintained that libraries need to be closely involved in decisions about the collection and retention of patron usage data, especially when patron authentication and access is controlled by external agencies such as campus or city IT departments, because of a tendency for security to prevail over privacy and free inquiry.9 This theme was of major importance to us in preparing the present study, as it shows that we are not alone in these concerns.

Carter focused on the balance between security and privacy and suggested several possible scenarios for addressing both areas. He emphasized librarian values involving privacy and intellectual freedom, contrasting the librarian's focus on unrestricted access with the over-arching security concerns of computing professionals. He discussed several computer access policies in use at various institutions and possible approaches. These options include computer authentication (with associated privacy concerns), open access stations visually monitored from staffed desks, or routine purging of user logs at the end of each session. He also suggested librarians lobby state legislatures to have computer usage logs included in laws governing the confidentiality of library records.10

Still and Kassabian provided a good summary of Internet access issues as they affected academic libraries from legal and ethical perspectives. They suggested that librarians focus on public obligations, free speech and censorship, and the potential for illegal activities occurring on library workstations. The issues highlighted in the article have increased in the 15 years since the article was written, but it remains the best available overview.11 The arguments put forth in this article proved relevant for us in understanding the multitude of viewpoints regarding authentication even before 9/11.

In the post-9/11 era, Essex discussed the USA PATRIOT Act and its implications for libraries and patron privacy. Some of the 9/11 terrorists were reported to have made use of public library computers in the days before the attack. This has led to heightened concern about patron privacy among librarians.
Accurate assessment of its impact is difficult due to restrictions placed on libraries in even disclosing that they have been subjected to search.12 While not directly addressing authentication, the article highlights privacy issues surrounding library records of all types.

One of the arguments against requiring authentication in the public area is the use of academic libraries by unaffiliated users. This is especially true in rural areas, where an academic library might be one of the best-funded, most comprehensive, and most accessible resources in a geographical area. Even in urban areas, guest access by unaffiliated users is a growing issue for many academic libraries because of limited resources, software licensing problems, and public access to campus infrastructure. While most institutions have traditionally offered basic library services to unaffiliated patrons, the online environment has raised new problems. Weber and Lawrence provided one of the best studies of these issues. Their work surveyed Association of Research Libraries (ARL) member libraries to determine the extent of mandatory logins to computer workstations and document how online access was provided to non-affiliated guest users. They concentrated their study questions on Federal and Canadian Depository libraries that must provide some type of access to online government information, with or without authentication. Less than half of respondents reported having any written policies governing open access on computers or guest access policies. Of the 61 libraries responding to the survey, 32 required that affiliated users authenticate, and of these, 23 had a method for authenticating guest users.13 This article, which was published just as this study was testing and evaluating the survey instrument, proved to be very useful as we worked with our questions in Qualtrics™ and dealt with the IRB requirements.

Courtney explored a half-century of changes in access policies for unaffiliated library users. Viewing the situation from somewhat early in the shift from print to electronic resources, she foresaw the potential for significantly reduced access to library resources for non-affiliated patrons. These barriers would be created by access policy issues with computing infrastructure and licensing limitations by database vendors. This is especially true if a library's licenses or policies did not specifically address use by unaffiliated users. She concluded that decisions about guest access to online library resources should be made by librarians and not be handed over to vendors or campus computing staff.14 Our study began as a result of this very issue, i.e., an outside entity (campus IT) determining how access to library resources should be controlled, without input by librarians or library staff.

Courtney also surveyed 814 academic libraries to assess their policies for access by unaffiliated users.
She  focused  on  all  library  services  including  building  access,  reference  assistance,  and   borrowing  privileges  in  addition  to  online  access.  Many  libraries  were  also  cancelling  print   subscriptions  in  favor  of  online  access  and  she  questioned  the  impact  this  might  have  on  use  by   unaffiliated  users.    While  suggesting  little  correlation  between  decisions  to  cancel  paper   subscriptions  and  requiring  authentication  of  computer  workstations,  she  concluded  that  reduced     USER  AUTHENTICATION  IN  THE  PUBLIC  LIBRARY  AREA  OF  ACADEMIC  LIBRARIES  IN  NORTH  CAROLINA  |     108   ELLERN,  HITCH,  AND  STOFFAN   doi:  10.6017/ital.v34i2.5770   access  by  unaffiliated  users  would  be  an  unintended  consequence  of  this  change.15  This  article   proved  valuable  to  us  in  framing  our  study,  as  it  gave  us  some  idea  of  what  we  might  expect  to  find   and  provided  some  concepts  to  use  when  we  formulated  our  survey.     Best-­‐Nichols  surveyed  public  use  policies  in  11  NC  tax-­‐supported  academic  libraries  and  asked   similar  questions  to  our  own.  This  study  was  dated  and  didn’t  address  computer  resources,  but   some  of  the  same  issues  were  addressed.16  Public  use  and  authentication  policies  have  the   potential  to  impact  one  another  and  how  the  library  responds.       Courtney  called  on  librarians  to  conduct  a  carefully  thought  out  discussion  of  user  authentication   because  of  the  implications  for  public  access  and  freedom  of  inquiry.  While  librarians  are   traditionally  passionate  at  protecting  patron  privacy  involving  print  resources,  many  are  unaware   of  related  concerns  involving  online  authentication.  She  advocated  for  more  education  and  open   debate  of  the  issues  because  of  the  potential  gravity  of  leaving  decision-­‐making  in  the  hands  of   database  vendors  or  campus  IT  departments.  Decisions  regarding  authentication  and  privacy   impact  library  services  and  access,  and  therefore  need  to  include  input  from  librarians.17  As  this   study  included  a  summary  of  the  reasons  for  authentication  as  provided  by  surveyed  libraries,  it   also  gave  us  another  reference  point  to  use  when  comparing  our  results  and  highlighted  the   intellectual  freedom  issues  that  were  often  missing  or  glossed  over  in  other  studies.   Barsun  surveyed  the  Web  sites  of  the  100  Association  of  Research  Libraries  to  assess  services  to   unaffiliated  users  in  four  areas:  building  access,  circulation  policies,  interlibrary  loan  services,  and   access  to  online  databases.  61  member  libraries  responded  to  requests  for  data.  She  explored  the   question  of  whether  the  policies  governing  these  services  would  be  found  on  a  library’s  web  site.   She  perceived  a  possible  disparity  between  increasing  demand  for  services  generated  by  members   of  the  public  who  are  discovering  a  library’s  resources  via  online  searching  and  the  library’s  ability   or  willingness  to  serve  outside  users.  
While  she  did  not  address  computer  authentication  issues   directly,  she  did  find  that  a  significant  percentage  of  academic  library  web  sites  were  ambiguous   about  stating  the  availability  of  non-­‐authenticated  access  to  databases  from  onsite  computers.18   This  ambiguity  could  possibly  be  related  to  vague  usage  agreements  with  database  vendors  that   do  not  clearly  state  whether  non-­‐affiliated  users  may  obtain  onsite  access  to  these  resources.  In   “secret  shopper”  visits  done  as  part  of  our  own  research,  we  saw  a  disparity  between  what  was   stated  on  a  library’s  web  site  and  the  reality  of  access  offered.     METHOD   It  seemed  appropriate  to  start  this  project  with  a  regional  focus.      None  of  the  studies  available   looked  at  authentication  geographically.    Because  colleges  and  universities  within  a  state  are  all   subjected  to  the  same  economic,  political  and  environmental  factors,  looking  at  the  libraries  might   help  provide  some  continuity  for  creating  a  relevant  profile  of  current  practices.    North  Carolina   has  a  substantial  number  of  academic  libraries  (114)  with  a  wide  variety  of  demographics.     Historically,  the  state  supports  a  strong  educational  system  with  one  of  the  first  public  university     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     109           systems.    Together  with  the  17  universities  within  University  of  North  Carolina  system,  the  state   has  59  public  community  colleges,  36  private  colleges  and  universities,  and  3  religious  institutions.   Religious  colleges  are  identified  as  those  whose  primary  degree  is  in  divinity  or  theology.    (See   Chart  1.)     Chart  1.  Survey  participation  by  type  of  academic  library.   Work  had  been  started  to  identify  the  authentication  practices  of  other  UNC  System  libraries,  so   the  researchers  expanded  the  data  to  include  the  other  academic  libraries  within  the  state.  To   create  a  list  of  the  library’s  pertinent  information  for  this  investigation,  the  researchers  used  the   American  Library  Directory19,  the  NC  State  Library’s  online  directories  of  libraries20,  and  visited   each  library’s  web  page  to  create  a  database.  The  researchers  augmented  each  library’s  data  to   include  information  including  the  type  of  academic  library  (public,  private,  UNC  System  and   religious),  current  contact  information  on  personnel  who  might  be  able  to  answer  questions  on   authentication  policies  and  practices  in  that  library,  current  number  of  books,  institutional   enrollment  figures,  and  the  name  and  population  of  the  city  or  town  in  which  the  library  was   located.  The  library’s  responses  to  the  survey  were  also  tracked  in  the  database  with  SPSS  and   Excel  employed  in  evaluating  the  collected  data.   
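The matching step described above, in which each library's survey responses are combined with its demographic profile and then tabulated, can be sketched in a few lines of code. The snippet below is only an illustration of that kind of cross-tabulation, not the authors' actual SPSS/Excel workflow; the file name and the column names (library_type, authenticates, enrollment) are hypothetical.

```python
import pandas as pd

# Hypothetical profile database: one row per library, combining directory data
# (type, enrollment, town population) with the yes/no survey response on
# whether authentication is required in the public area.
profiles = pd.read_csv("nc_academic_libraries.csv")

# Share of libraries requiring authentication, broken out by library type
# (UNC System, community college, private, religious).
by_type = (
    profiles.groupby("library_type")["authenticates"]
    .agg(total="count", authenticating="sum")
)
by_type["percent"] = (100 * by_type["authenticating"] / by_type["total"]).round(1)
print(by_type)

# The same profile data can be cut by other variables, e.g. enrollment bands.
profiles["enrollment_band"] = pd.cut(profiles["enrollment"], bins=[0, 2500, 10000, 40000])
print(profiles.groupby("enrollment_band")["authenticates"].mean().round(2))
```

Any statistical package would serve equally well here; the point is simply that the profile database allows authentication status to be compared against institutional characteristics.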
A Western Carolina Institutional Review Board (IRB) "Request for Review of Human Subject Research" was submitted and approved using the following statement: "We want to know the authentication situation for all the college libraries in North Carolina." The researchers quickly discovered that the definition of "authentication" would have to be explained to the review board and to many of the responding librarians who filled out the survey. The research goal was further simplified with the explanation of authentication as "how do patrons identify themselves to get access to a computer in the public area of a library," because many librarians might not realize that what they do is "authentication."

During the approval phase, there was some question about whether the researchers needed formal approval, because much of the information could be collected by just visiting the libraries in person. The researchers saw no risk of potentially disclosing confidential data. However, it was decided that it was better to go through the approval process, since the survey asked the librarians whether they were being required to authenticate by outside entities. There might also be a need to do some follow-up calls, and there was a plan to do site visits to the local libraries in order to test the data for accuracy.

The Qualtrics™ online survey system was used to create the survey and collect the responses. Contact information from the database was uploaded to the survey system with the IRB-approved introductory letter to each library contact person, along with a link to the survey. The introductory letter described the goals of the project and included an invitation to participate as well as refusal language as required by the IRB request. The same language was used in the follow-up emails and phone calls.

The initial sixteen surveys were administered to the UNC System libraries in October–December 2010 as a test of the delivery and collection system on Qualtrics™, with the rest of the libraries being sent the survey in mid-December 2010.

In the spring of 2011, the researchers followed the initial survey with a second letter and then with phone calls and emails. During the follow-up calls, some librarians chose to answer the survey questions with the researcher filling it out over the phone. Most filled out the survey themselves. The final surveys were completed in April 2011. Because the status of authentication is volatile, this survey data and research represent a snapshot in time of authentication practices between October 2010 and April 2011. The researchers did see changes happening over the course of the surveying process and made changes to any data collected in follow-up contact in order to maintain the most current information about that library for the charts, graphs, and presentations made from the data.
In Fall 2011, the researchers did a "secret shopper" type expedition to the nearest academic libraries by visiting in person as guest users. The main purpose of these visits was to check the data, take pictures of the library public areas, get firsthand experience with the variety of authentication practices, and talk to and thank the librarians who participated.

The Survey

The survey asked 36 different questions using a variety of pull-down lists, check boxes, and fill-in-the-blank questions. The survey used Qualtrics™ skip logic, with seven branches that asked further questions depending upon the answer given (a brief sketch of how such branching can be represented appears below). These branches allowed the survey software to skip particular sections or ask for additional information depending on the answers supplied. Some libraries, especially those that didn't authenticate or didn't know specific details, might be asked as few as 14 questions while others received all 36. The setup of computers in the public area of libraries can be quite variable, especially if the library differentiates between student-only and guest/public-use-only workstations. The survey questions were grouped into seven basic areas: Descriptive, Authentication, Student-only PCs, Guest/Public PCs, Wireless Access, Incident Reports, and Computer Activity Logs.

The full survey is included as Appendix A.

Initial Hypothesis

Given the experience at the Hunter Library, we expected the following factors might influence a decision to authenticate. Some of these basic assumptions did influence our selection of questions in the seven areas of the survey.

We expected to find:

• When the workstations were under the control of campus IT, authentication would usually be required
• When the workstations were under the control of the library, authentication would probably not be required
• That factors such as population, enrollment, and book volume would play a role in decisions to authenticate
• That librarians would not be aware of what user information was being logged, whether or not authentication was required
• That a library that had experienced incidents involving the computers in its public area would be more likely to have authentication
• That authentication increased because of post-9/11 factors and legal interpretations that pushed libraries to authenticate

SURVEY QUESTIONS, RESPONSES, AND GENERAL FINDINGS

The data collected from this survey, especially from those libraries that did authenticate, produced over 200 data points for each library. Below are those that resulted in answers to the questions posed at the outset, particularly those that looked at overall authentication practices. Further articles are planned to look at other related practices in the public areas of academic libraries geographically.

There are 114 academic libraries in North Carolina.
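As a point of reference for the skip logic mentioned above, the branching can be thought of as a small routing table: each answer determines which question, if any, comes next, so libraries that answer "no" early on are never shown the follow-up blocks. The sketch below is hypothetical (the question wording and routing are simplified stand-ins, not the instrument in Appendix A).

```python
# A minimal sketch of survey skip logic: each answer routes the respondent
# to the next applicable question, letting whole sections be skipped.
SURVEY = {
    "Q1": {
        "text": "Is any type of authentication required for PCs in the public area?",
        "next": {"Yes": "Q2", "No": "Q20", "Unsure": "Q20"},
    },
    "Q2": {
        "text": "Were you required to use this authentication?",
        "next": {"Yes": "Q3", "No": "Q3"},
    },
    "Q20": {"text": "Does the library offer guest wireless access?", "next": {}},
}

def run_survey(answers, start="Q1"):
    """Walk the branching structure, returning the questions actually asked."""
    asked, current = [], start
    while current in SURVEY:
        asked.append(current)
        current = SURVEY[current]["next"].get(answers.get(current, ""), None)
    return asked

# A library that does not authenticate skips straight past the follow-up block.
print(run_survey({"Q1": "No", "Q20": "Yes"}))   # ['Q1', 'Q20']
```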
As a result of the follow-up emails and phone calls, this research survey got an exceptional 99.1 percent response rate (113 out of 114). Once the appropriate librarians were contacted and understood the scope and purpose of this study, they were very cooperative and willing to fill out the survey. Those who were contacted via phone mentioned that the original email had been overlooked or lost. Only one library refused to participate in the study.

Individual libraries' demographics were collected in a database by using directory and online information. The data was matched with the survey data provided by the respondents to produce more in-depth analysis and create a profile of each library.

How many libraries in North Carolina are authenticating? (Chart 2)

The survey asked: "Is any type of authentication required or mandated for using any of the PCs in the library's public area?" Sixty-six percent (or 75) of the libraries answered yes, they required authentication to use the PCs. (See Chart 2.)

Chart 2.

Are some types of libraries more likely to authenticate? (Chart 3)

While each type of library had a different overall total as compared to the other types, Chart 3 shows how the percentages of authentication hold for each type. Three out of the four types of libraries authenticate more often than not. Of the 58 community college libraries, 60% (or 35) require users to authenticate. Seventy-eight percent (78%) of the 36 private college libraries authenticate, and 11 of the 16 (or 69%) UNC System libraries authenticate. Only the religious college libraries more often don't require users to authenticate (1 of the 3, or 33%, do), although this is a very small population in the survey. However, percentage-wise, community colleges are more likely not to require users to authenticate than private college libraries (40% vs. 22%), and the UNC System libraries, which are public institutions, fall in the middle at 31%.

Chart 3.

How many academic libraries were required to authenticate PCs in their public areas? (Chart 4)

Of the 75 libraries that required patrons to authenticate, when asked if "they were required to use this authentication," 59 (52%) replied "yes." Putting these data points together shows that 16 (or 14%) of the libraries authenticate even though they were not required to do so. Some clues about why this was so were sought in the next question and during the follow-up phone calls.

Chart 4.

Why was Authentication Used?
Libraries  were  asked,  “Do  you  know  the  reasons  why  authentication  is  being  used?”  If  they   answered  “prevent  misuse  of  resources”  or  “control  the  public’s  use  of  these  PCs”  then  an   additional  question  was  asked,  “What  led  the  library  to  control  the  use  of  PCs?”      This  option  had   two  check  boxes  (“inability  of  students  to  use  the  resources  due  to  overuse  by  the  public”  and   “computer  abuse”)  and  a  third  box  to  allow  free  text  entry.      A  library  could  check  more  than  one   box.   Of  those  75  libraries  that  authenticated,  60%  (or  45)  checked  “prevent  misuse  of  resources”  and   48%  (or  36)  cited  “controlling  the  public’s  use  of  these  PCs”  as  the  reasons  for  authenticating.   In  normalizing  the  data  from  the  two  questions  and  the  free  text  field,  Table  1  combines  all   answers  to  illustrate  the  number  and  percentages  of  each.     Table  1.     In  the  course  of  the  follow  up  calls  with  those  libraries  that  answered  the  survey  over  the  phone,   further  insight  was  provided.  One  librarian  said  that  their  IT  department  told  them   “authentication  was  the  law  and  they  had  to  do  it”.  Another  answered  that  they  were  “on  the  bus   line  and  so  the  public  used  their  resources  more  than  they  expected  and  so  they  had  to”.   To  get  a  better  understanding  of  the  scope  and  variety  of  these  answers,  here  are  some  examples   of  the  reasons  cited  in  the  free  text  space:  “all  IT's  idea  to  do  this”  “Best  practices”,  “Caution”,   “Concerned  they  would  be  used  for  the  wrong  reasons”,  “Control”,  “We  found  them  misusing   computer  resources  (porn,  including  child  porn)”,  “Control  over  college  students  searching  of   inappropriate  websites,  such  as  porn/explicit  sites”,  “Disruption”,  “Ease  of  distributing     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     115           applications”,  “Fear  of  abuse  on  the  part  of  legal”,  “Legal  issues  regarding  internet  access”,  “Making   students  accountable”,  “Monitor  use”,  “Policy”,  “Security  of  campus  network”,  “Security  of   machines  after  issues  were  raised  at  a  conference”,  and  “Time”.   Who  required  that  the  libraries  authenticate?  (Chart  5)   The  survey  asked,  “What  organization  or  group  required  or  mandated  the  library  to  use   authentication?”    Respondents  were  allowed  to  choose  more  than  one  of  the  5  boxes.    These   choices  included  “the  library  itself,”  “IT  or  some  unit  within  IT,”  “college  or  university   administration,”  “other”  (with  a  text  box  to  explain),  and  “not  sure”.    The  results  of  this  question   are  shown  in  Chart  5.    The  survey  revealed  that  the  decision  was  solely  the  library’s  choice  25%  of   the  time,  (or  28  libraries)    22%  of  the  time  the  library  was  mandated  or  required  to  authenticate   by  IT  or  some  unit  within  IT  (or  25  libraries)  and  4%  of  the  time  a  library’s  college  or  university   administration  required  or  mandated  authentication  (or  4  libraries).      Collaborative  decisions  in   14  libraries  involved  more  than  one  organization.    
Of  the  39  libraries  that  were  involved  with  the   authentication  decision  (28  that  made  the  decision  by  themselves  and  11  that  were  part  of  a   collaborative  decision),  55%  (or  16)  authenticated  even  though  they  were  not  required  to  do  it.     Chart  5.   What  type  of  authentication  is  used?   Authentication  in  libraries  can  take  many  forms.    The  most  common  method  for  those  libraries   that  authenticate  was  by  using  centralized  or  networked  systems.    Almost  sixty  percent  of  the   libraries  used  some  form  of  this  identified  access  (Tables  2  and  3)  with  one  library  using  some   other  independent  system.    Twenty-­‐five  percent  (or  19)  of  libraries  that  authenticate  still  use   some  form  of  paper  sign-­‐in  sheets  and  21%  (or  16)  use  pre-­‐set  or  temporary  logins  or  guest  cards.     Fifteen  percent  (or  11)  use  PC  based  sign-­‐in  or  scheduling  software  and  8%  (or  6)  use  the  library     USER  AUTHENTICATION  IN  THE  PUBLIC  LIBRARY  AREA  OF  ACADEMIC  LIBRARIES  IN  NORTH  CAROLINA  |     116   ELLERN,  HITCH,  AND  STOFFAN   doi:  10.6017/ital.v34i2.5770   system  in  some  form  for  authentication.    A  few  libraries  indicated  that  they  bypass  their   authentication  systems  for  guests  by  either  having  staff  log  guests  in  or  disabling  the  system  on   selected  PCs.  We  saw  this  during  the  “secret  shopper”  visits  as  well.       Table  2.   Do  the  forms  of  authentication  used  in  libraries  allow  for  user  privacy?   When  asked  how  they  handle  user  privacy  in  authentication,  of  the  75  libraries  that  authenticate,   67%  (or  50)  use  a  form  of  authentication  that  can  identify  the  user.    In  other  words,  most  users  do   not  have  privacy  when  using  public  computers  in  an  academic  library  because  they  are  required  to   use  some  form  of  centralized  or  networked  authentication.  The  options  in  Table  3  were  presented   to  the  respondents  as  possible  forms  of  privacy  methods.  Thirty-­‐five  percent  (or  26)  libraries   indicated  that  they  provide  some  form  of  privacy  for  their  patrons.  Anonymous  access  accounted   for  28%  (or  21)  of  the  libraries.       Table  3.     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     117           Are  librarians  aware  of  the  computer  logging  activity  going  on  in  the  public  area?  (Table  4)   All  the  113  respondents  were  asked  two  questions  about  the  computer  logging  activities  of  their   libraries:  “Do  you  know  what  computer  activity  logs  are  kept”  and  “Do  you  know  how  long   computer  activity  logs  are  kept”.    The  second  question  was  only  asked  if  “unsure”  was  not  checked.   Besides  “unsure”,  responses  on  the  survey  included  “Authentication  logs  (who  logged  in)”,   “Browsing  history  (kept  on  PC  after  reboot)”,  “Browsing  history  (kept  in  centralized  log  files)”,   “Scheduling  logs  (manual  or  software)”,  “Software  use  logs”  and  “Other”.  The  respondents  could   select  more  than  one  answer.  However,  over  half  (52%)  of  the  respondents  were  unsure  if  the   library  kept  any  computer  logs  at  all.  Authentication  logs  of  who  logged  in  were  the  most  common,   but  those  were  kept  in  only  25%  of  the  total  libraries  surveyed.    
A high percentage of libraries kept some kind of logs, but most respondents were unsure how long those records were kept. Of the various types of logs, respondents that use scheduling software were the most familiar with the length of time software logs were kept. In one case, a respondent mentioned that the manual sign-in sheets were never thrown out and that they had been retained for years.

Table 4. Log retention: what kind of computer logs are kept and for how long (all 113 libraries)

Computer Activity Logs                              Number   Of total     Don't know how long
                                                             libraries    data is kept (unsure)
Unsure                                                59       52%         100%
Authentication logs (who logged in)                   28       25%          60%
None                                                  21       19%          --
Browsing history (kept in centralized log files)      14       12%          86%
Scheduling logs (manual or software)                  10        9%          70%
Browsing history (kept on PC after reboot)             7        6%          57%
Software use logs                                      6        5%          33%
Library system                                         4        4%          75%
Other                                                  2        2%          --

Are past incidents factors in authenticating?

Only three libraries reported breaches of privacy, and all three reported using authentication.

Of the 75 libraries that do authenticate (Chart 6, three bars on the right), 36 reported that they did have improper use of the PCs, while 29 reported that they did not and 10 did not know. Of the 38 libraries that do not authenticate (Chart 6, three bars on the left), 23 reported that they had no improper use of the PCs, while 13 stated that they did and 2 did not know. The overall known reports of improper use in the survey are higher when the library does authenticate and lower when the library doesn't authenticate.

Chart 6.

When did libraries begin authenticating in their public areas?

Of the 75 libraries that authenticate, only one implemented this more than ten years prior to the survey. Fifty-one (or 67%) of the responding libraries began authenticating between 3 and 10 years ago. Ten libraries implemented authentication in the year before the survey. This is consistent with the growth of security concerns in the post-9/11 decade. (Chart 7)

Chart 7.

DISCUSSION

Since the introduction of computer technology to libraries, library staff and patrons have used different levels of authentication depending upon the application. While remote access to commercial services such as OCLC cataloging subsystems or vendor databases has always used some form of authorization, usually a username and password, it has never been necessary or desirable for public access to the library's catalog system to have any kind of authorization requirements. Most of the collections within an academic library have traditionally been housed in open-access stacks where anyone can freely access material on the shelves. Printed indexes and other tools that provide in-depth access to these collections have traditionally been open as well.
Today, most libraries still make their library catalog and even some bibliographic discovery tools open access and available over the web. This practice naturally extended to computer technology and other electronic reference tools until libraries began connecting them to the campus and public networks.

The principle of free and open access to the materials and resources of the library, within the library walls, has been a fundamental characteristic of most public and academic libraries. There is an ethical commitment of librarians to a user's privacy and confidentiality that has deep roots in the First and Fourth Amendments of the US Constitution, state laws, and the Code of Ethics of the ALA. Article II of the ALA Code states, "We protect each library user's right to privacy and confidentiality with respect to information sought or received and resources consulted, borrowed, acquired or transmitted." Traditionally, library staff do not identify patrons who walk through the door; they don't ask for identification when answering questions at the reference desk, nor do they identify patrons reading a book or magazine in the public areas of a library. Schneider has emphasized that librarians have always valued user privacy and have been instrumental in the passing of many states' library privacy laws.23 Usually, it is only when materials are checked out to a patron that a user's affiliation or authorization even gets questioned directly. Frequently, patrons can make use of materials within the library building with no record of what was accessed. We are now seeing these traditional principles of open access challenged as materials transition to electronic formats. It is becoming more common for patrons to have to authenticate before they can use what was once openly available. The data collected from this survey confirms this trend, with 66% of the libraries using some form of authentication in their public area.

The widespread use of personally identifiable information is making it more difficult for librarians to protect the privacy and confidentiality of library users. Although the writing was on the wall that some choices would have to be made with regard to privacy before 9/11, no easy answer to the problem had yet been identified. Librarians themselves are often uncertain about what information is collected and stored, as evidenced by our data (Chart 6). As more information becomes available only electronically, and because computers in the public areas are now used for much more than just accessing library catalog functions, it is becoming difficult to uphold the code of ethics and protect the privacy of users.

Using authentication can also make it more difficult to use technology in the library. In order to authenticate, users may be required to start or restart a computer and/or log into or out of the computer.
This takes time and requires the user to remember to log off the computer when finished. Users often have difficulty keeping track of their user information and may require increased assistance (Table 5).

Table 5.

Library staff or scheduling software can be required to help library guests obtain access to computer equipment. North Carolina, like other states, does have laws governing the confidentiality of library records. Librarians have long dealt with this situation by keeping as little data as possible. For example, many library circulation systems do not store data beyond the current checkout. Access logs that detail what resources a particular user has accessed would seem to fall under this legislation, although the wording in the law is vague.

Information technology departments, legal counsel, and administrators, on the other hand, are often less concerned about privacy and intellectual freedom issues. More often their focus is on security, limiting access to those users affiliated with the institution, and monitoring use. Being ready and able to provide data in response to subpoenas and court orders is often a priority. At Western Carolina University, illicit use of an unauthenticated computer in the student center led to an investigation by campus and county law enforcement. This case is still used as justification for needing to authenticate and monitor campus computer use even though the incident occurred many years ago. Being able to track an individual's online activity is believed to increase security by ensuring adherence to institutional policies. Authentication with individually assigned login credentials permits online activity to be traced to a specific account whose owner can then be held accountable for the activity performed. Librarians' responses to the survey indicate that these issues play a role in a library's decision to authenticate, as seen in the free-text responses in Table 6.

Tracking use through IP address, individual login, and transaction logs allows scrutiny of users in case of illegal or illicit use of computer resources. In many cases, this action is justified as being required by auditors or law enforcement agencies, though information regarding this is scarce. The authors of this article are not aware of any laws or auditing requirements in North Carolina that require detailed tracking of library computer use.

Some libraries indicated that IT departments were concerned about the security of networks and/or computers. Security can be undermined when generic accounts are used or when no authentication is required. By using individual logins, users can be restricted to specific network resources and can be monitored. When multiple computers use the same account for logging in, or when the login credentials are posted on each computer, security can be compromised because use cannot be tracked to a specific user.
In some libraries, these security issues have trumped librarians' concerns about intellectual freedom and privacy.

Creating a profile as a result of these findings

Given the number of characteristics collected about each library, it was assumed that some of the factors gathered might influence a decision to authenticate and allow for the possibility of creating a profile for prediction. The data was collected from libraries within a fixed geographic region. The externally collected and survey data was coded, put into SPSS™, and a number of statistical tests were performed to find which factors might be statistically significant. To further the geographical analysis, the data was also put into ArcView™ to produce a map of North Carolina with different colored pins for those academic libraries that authenticated versus those that did not, to see if there was any pattern to the choice. (Map 1)

To more completely explore the possible role that geographic information might play in the decision to authenticate, the population of the city or town in which the institution was located, enrollment, book volume, number of PCs, and total number of library IT staff (scaled variables), as well as ordinal variables such as "who controlled the setup of the PCs," "do you differentiate between student and public PCs," and "known incidents of privacy and misuse," were also integrated into the analysis. The data collected could not predict whether an academic library would authenticate or not using logistic regression techniques, although those that differentiate between student and public PCs did have a higher probability. Based on all our collected data and mapping, it is impossible to predict with any significance whether or not an academic library would authenticate.

So the short answer, statistically, is no. Using all of the data collected, a statistically significant profile could not be created; however, the data did suggest some general tendencies.

Map 1.

For those libraries that do authenticate, the average book volume is almost 400,000, the enrollment around 5,600, the city population where the institution is located is 94,000, the total number of PCs in the public area is 54, and the average number of library IT staff is 1.8.

For those libraries that do not authenticate, the average book volume is about 163,000, enrollment around 3,000, the population is 53,000, the average number of PCs in the public area is about 39, and the average number of library IT staff is 0.8.

Libraries that authenticate tend to have statistically significant differences in book volume and the number of PCs in the public area, with a t-test value of P<1. Student enrollment was the most statistically significant factor in those that authenticated, with a t-test value of P<0.5. Libraries that authenticate had many more students, more books, and a larger number of PCs in their public areas than libraries that didn't authenticate.
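The original analysis was carried out in SPSS and ArcView; as a rough illustration of the kind of tests described above, the following minimal Python sketch (with a hypothetical file name and column names, not the authors' actual dataset) runs two-sample t-tests on the scaled variables and fits a logistic regression to ask whether the decision to authenticate can be predicted.

import pandas as pd
import statsmodels.api as sm
from scipy import stats

# Hypothetical CSV of survey and externally collected data, one row per library
df = pd.read_csv("nc_academic_libraries.csv")
predictors = ["enrollment", "book_volume", "public_area_pcs", "library_it_staff", "city_population"]

# Compare authenticating and non-authenticating libraries on each scaled variable
auth = df[df["authenticates"] == 1]
no_auth = df[df["authenticates"] == 0]
for col in predictors:
    t, p = stats.ttest_ind(auth[col], no_auth[col], equal_var=False)
    print(f"{col}: t = {t:.2f}, p = {p:.3f}")

# Logistic regression: can these factors predict the decision to authenticate?
X = sm.add_constant(df[predictors])
model = sm.Logit(df["authenticates"], X).fit()
print(model.summary())

Consistent with the study's findings, an analysis like this can show individually significant group differences (for example, in enrollment) while the regression still fails to predict reliably which libraries will authenticate.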
Those libraries that didn't authenticate tended to be in smaller towns, more often had their public-area PCs set up by non-library IT staff, and had fewer library IT staff. Sixty percent (60%) of the libraries that don't authenticate had zero library IT staff.

While it was assumed at the outset of this research that the campus department responsible for the setup of the workstations in the public area (the library or IT) would be a factor in whether authentication was used in the library, the data does not support this assumption statistically.

Ethical questions about authentication as a result of these findings

There are a variety of reasons why a library might choose to authenticate despite the ethical issues associated with it. The protection and management of IT resources or the mission of the institution are two likely scenarios. A library, especially one with lots of use by unaffiliated users or guests, might choose to authenticate regardless of concerns in order to make sure its own users have priority in using the PCs in the public area of their library. A private institution may choose to authenticate in order to limit access by any members of the general public. Of those 75 libraries that authenticate, 81% cited concerns about controlling use, overuse, and misuse. This study also found that in 25% of the total academic libraries, the library itself decided to authenticate without influence from external groups. This was a higher percentage than was expected. Given librarians' professional concerns about intellectual freedom and privacy, we were very surprised that so many libraries chose to authenticate on their own.

We suspected that many librarians might not have a full understanding of the privacy issues created when requiring individual logins. Based on this assumption, we expected that many of the librarians would not be fully aware of what user tracking data was being kept. Examples include network authentication, tracking cookies, web browser history, and user sign-in sheets. The study found that librarians are often unsure of what data is being logged, with 51 (or 45%) of 113 libraries reporting this. Only 19% reported knowing with certainty that no tracking data was kept. Of those that did know that tracking data was being kept, most had no idea how long this data was retained.

CONCLUSION

This study found that 66% (or 75) of the 113 surveyed North Carolina academic libraries required some form of user authentication on their public computers. The researchers reviewed an extensive amount of data to identify the factors involved with this decision. These factors included individual demographics, such as city population, book volume, type of academic library, and enrollment.
It was anticipated that by looking at a large pool of academic libraries within a specific region, a profile might emerge that would predict which libraries would choose to authenticate. Even with comprehensive data about the 75 libraries that authenticated, a profile of a "typical" authenticated library could not be developed. The data did show two factors of statistical significance (enrollment and book volume) in determining a library's decision to authenticate. However, the decision to authenticate could not be predicted. Each library's decision to authenticate seems to be based on the unique situation of that library.

We expected to find that most libraries would authenticate due to pressure from external sources, such as campus IT departments or administrators, or in response to incidents involving the computers in the public area. This study found that only 39% (or 44) of the libraries surveyed authenticated due to these factors, so our assumption was incorrect. Surprisingly, we found that 25% (or 28) of the libraries chose to authenticate on their own. The need to control the use of their limited resources seemed to take precedence over any other factors, including user privacy. We did expect to see a rise in the number of libraries that authenticated in the aftermath of 9/11. This we found to be true. Looking at the prior research that defines an actual percentage of authentication in academic libraries, no matter how limited in scope (for example, just the ARL libraries, responding libraries, etc.), there does seem to be a strong trend for academic libraries to authenticate.

Our results, with 66% of the surveyed academic libraries using authentication, support the conclusion that there is a continued trend of authentication that has steadily expanded over the past decade. This has happened in spite of librarians' traditional philosophy on access and academic freedom. Libraries are seemingly relinquishing their ethical stance or have other priorities that make authentication an attractive solution for controlling use of limited or licensed resources. Our survey results show that many librarians may not fully understand the privacy risks inherent in authentication. Slightly over half (52%) of the libraries reported that they did not know if any computer or network log files were being kept or for how long they were kept.

The issues surrounding academic freedom, access to information, and privacy in the face of security concerns continue to affect library users. Academic libraries in smaller communities are often the only nearby source of scholarly materials. Traditionally, these resources have been made available to community members, high school students, and others who require materials beyond the scope of the resources of the public or school library.
As pointed out, restrictive authentication policies may hamper the ability of these groups to access the information they need. However, the data showed very little consistency to support this idea with respect to authentication in small towns and communities throughout the state.

Some of the surveyed academic libraries made a strong statement that they do not authenticate on their public area computers and have every intention of continuing this practice. These libraries are now in a distinct minority, and we expect their position will continue to be challenged. For example, at Western Carolina University, we continue to employ open computers in the public areas of the library but are regularly pressed by our campus IT department to implement authentication. We have so far been successful in resisting this pressure because of the commitment of our dean and librarians to preserving the privacy of our patrons.

FURTHER STUDIES

As a follow-up to this study, we plan to contact the 35 libraries that did not authenticate to determine if they now require authentication or have plans to do so. Based on responses to this survey, we expect that many librarians are unaware of the degree to which authentication can undermine patron privacy. We suggest an in-depth study be conducted to determine the degree of understanding among librarians about potential privacy issues with authentication in the context of their longstanding professional position on academic freedom and patron confidentiality.

APPENDIX A.

Survey questions

1. Select the library you represent:
2. Which library or library building are you reporting on?
• Main Library or the only library on campus
• Medical library
• Special library
• Other
3. How many total PCs do you have in your library public area for the building you are reporting on?
4. How many Library IT or Library Systems staff does the library have?
5. Does the Library's IT/Systems staff control the setup of these PCs in the library public area?
• Yes
• Shared with IT (Campus Computing Center)
• IT (Campus Computing Center)
• No (please specify who does control the setup of these PCs)

Authentication

6. Is any type of authentication required or mandated to use any of the PCs in the library's public area?
7. Were you required to use this authentication on any of the PCs in the library's public area?
8. What organization or group required or mandated the library to use authentication on PCs in the library public area?
• The library itself
• IT or some unit within IT
• Other (please explain)
• Not sure
• College/University administration

9. Do you know the reasons authentication is being used?
• Mandated by parent institution or group
• Prevent misuse of resources
• Control the public's use of these PCs
• Other (please specify)
10. What led the library to control the use of PCs?
• Inability of students to use the resource due to overuse by the public
• Computer abuse
• Other (please specify)
11. How are the users informed about the authentication policy?
• Screen saver
• Web page
• Login or sign-on screen
• Training session or other presentation
• Other (please specify)
12. What form of authentication do you use?
• Manual paper sign-in sheets
• Individual PC-based sign-in or scheduling software
• Centralized or networked authentication such as Active Directory, Novell, or an ERP (Enterprise Resource Planning) system with a college/university-wide identifier
• Pre-set or temporary authorization logins or guest cards handed out (please specify the length of time this is good for)
• Other (please specify)
13. How does the library handle user privacy of authentication?
• Anonymous access (each session is anonymous with repeat users not identified)
• Identified access
• Pseudonymous access with demographic identification (characteristics of users determined but users not actually identified)
• Pseudonymous access (repeat users identified but not the identity of a particular user)
14. When did you implement authentication of the PCs in the library public area?
• This year
• Last year
• 3-5 years ago
• 5-10 years ago
• Don't know

Student only PCs

15. Do you differentiate between Student Only PCs and Guest/Public Use PCs in the library public area?
17. How many PCs are designated as Student Only PCs in the library's public area?
18. Do you require authentication to access Student Only PCs in the library's public area?
19. What does authentication provide on a Student Only PC once an affiliated person logs in?
• Access to specialized software
• Access to storage space
• Printing
• Internet access
• Other (please specify)
20. Once done with an authenticated session on a Student Only PC, how is authentication on a PC removed?
• User is required to log out
• User is timed out
• Other (please specify)
21. What authentication issues have you seen in your library with Student Only PCs?
• ID management issues from the user (e.g., forgetting passwords)
• ID management issues from the network (e.g., updating changes in a timely fashion)
• Timing out issues
• Authentication system becomes unavailable
• Other (please specify)

Guest/Public PCs

22. How many PCs are designated for guest or public use in the library's public area?
23. Describe the location of these Guest/Public Use PCs.
• Line-of-sight to library service desk
• All in one general area
• Scattered throughout the library
• In several groups around the library
• Other (please specify)
24. Do you require authentication to access Guest/Public Use PCs in the library's public area?
25. What does authentication allow for guests or the public that log in?
• Limited software
• Control, limit, or block web sites that can be accessed
• Limited or different charge for printing
• Timed or scheduled access
• Internet access
• Control, limit, or block access to library resources (such as databases or other subscription-based services)
• Other (please specify)
26. Are there different types of PCs in your library area? Check those that apply.
• All PCs are the same
• Some have different types of software (like Browser Only)
• Some have time or scheduling limitations
• Some have printing limitations
• Some have specialized equipment attached (like scanners, microfiche readers, etc.)
• Some control, limit, or block web sites that can be accessed
• Some control, limit, or block access to library resources (such as databases or other subscription-based services)
• Other (please specify)

Wireless access

27. Do you have wireless access in your library public area?
28. Do you require authentication to use your wireless access in the library public area?
29. Does the library have its own wireless policies different from the campus's policy?
30. What methods are used to give guests or the public access to your wireless access? Check those that apply.
• No access for guests or the general public
• Paperwork and/or signature required before access is given
• Limited access by time
• Limited access by resource (such as Internet access only)
• Open access
• Other

Incident Reports

31. Has your library had any incidents of breach of privacy that you know about?
32. Has your library had any incidents of improper use of public PCs (such as cyberstalking, child pornography, terrorism, etc.)?
33. Have these incidents required investigation or digital forensics work to be done?
34. Who handled the work of investigation?
• Library IT or Library Systems staff
• IT or Campus Computing Center
• Campus Police
• Other Law Enforcement
• Unsure
• Other (please specify)

Computer Activity Logs

35. Do you know what computer activity logs are kept? (if unsure, end; if not, ask)
• Authentication logs (who logged in)
• Browsing history (kept on PC after reboot)
• Browsing history (kept in centralized log files)
• Scheduling logs (manual or software)
• Software use logs
• None
• Unsure
• Other (please specify)
36. Do you know how long computer activity logs are kept?
• 24 hours or less
• Week
• Month
• Year
• Unknown

REFERENCES
1. Pam Dixon, "Ethical Risks and Best Practices," Journal of Library Administration 47, no. 3/4 (May 2008): 157.
2. Scott Carlson, "To Use That Library Computer, Please Identify Yourself," Chronicle of Higher Education, June 25, 2004, A39.
3. Lori Driscoll, Library Public Access Workstation Authentication, SPEC Kit 277 (Washington, D.C.: Association of Research Libraries, 2003).
4. Martin Cook and Mark Shelton, Managing Public Computing, SPEC Kit 302 (Washington, D.C.: Association of Research Libraries, 2007).
5. Diana Oblinger, "IT Security and Academic Values," in Computer and Network Security in Higher Education, ed. Mark Luker and Rodney Petersen (Jossey-Bass, 2003): 1-13.
6. Code of Ethics of the American Library Association, http://www.ala.org/advocacy/proethics/codeofethics/codeethics
7. Fair Information Practices adopted by the Organization for Economic Cooperation and Development, http://www.oecd.org/sti/security-privacy
8. "NISO Best Practices for Designing Web Services in the Library Context," NISO RP-2006-01 (Bethesda, MD: National Information Standards Organization, 2006).
9. Dixon, "Ethical Issues Implicit in Library Authentication and Access Management."
10. Howard Carter, "Misuse of Library Public Access Computers: Balancing Privacy, Accountability, and Security," Journal of Library Administration 36, no. 4 (April 2002): 29-48.
11. Julie Still and Vibiana Kassabian, "The Mole's Dilemma: Ethical Aspects of Public Internet Access in Academic Libraries," Internet Reference Services Quarterly 4, no. 3 (January 1, 1999): 7-22.
12. Don Essex, "Opposing the USA Patriot Act: The Best Alternative for American Librarians," Public Libraries 43, no. 6 (November 2004): 331-340.
13. Lynne Weber and Peg Lawrence, "Authentication and Access: Accommodating Public Users in an Academic World," Information Technology & Libraries 29, no. 3 (September 2010): 128-140.
14. Nancy Courtney, "Barbarians at the Gates: A Half-Century of Unaffiliated Users in Academic Libraries," Journal of Academic Librarianship 27, no. 6 (November 2001): 473.
15. Nancy Courtney, "Unaffiliated Users' Access to Academic Libraries: A Survey," The Journal of Academic Librarianship 29, no. 1 (2003): 3-7.
16. Barbara Best-Nichols, "Community Use of Tax-Supported Academic Libraries in North Carolina: Is Unlimited Access a Right?" North Carolina Libraries 51 (Fall 1993): 120-125.
17. Nancy Courtney, "Authentication and Library Public Access Computers: A Call for Discussion," College & Research Libraries News 65, no. 5 (May 2004): 269-277.
18. Rita Barsun, "Library Web Pages and Policies Toward 'Outsiders': Is the Information There?" Public Services Quarterly 1, no. 4 (October 2003): 11-27.
19. American Library Directory: A Classified List of Libraries in the United States and Canada, with Personnel and Statistical Data, 62nd ed.
(New York: Information Today, 2009).
20. http://statelibrary.ncdcr.gov/ld/aboutlibraries/NCLibraryDirectory2011.pdf
21. Karen Schneider, "So They Won't Hate the Wait: Time Control for Workstations," American Libraries 29, no. 11 (1998): 64.
22. Code of Ethics of the American Library Association.
23. Karen Schneider, "Privacy: The Next Challenge," American Libraries 30, no. 7 (1999): 98.

President's Message
Twitter Nodes to Networks: Thoughts on the #litaforum
Rachel Vacek

Rachel Vacek (revacek@uh.edu) is LITA President 2014-15 and Head of Web Services, University Libraries, University of Houston, Houston, Texas.

One thing that never ceases to amaze me is the technological talent and creativity of my library colleagues. The LITA Forum is a gathering of intelligent, fun, and passionate people who want to talk about technology and learn from one another. I suppose many conferences have lots of opportunities to network, but the size and friendliness of the Forum makes it feel more like a comfortable place among friends. The use of technology at the Forum always inspires me, and the networking and reconnecting with friends are rejuvenating.

So many more people are sharing their research and their presentations through Twitter, and it's fantastic in so many ways. No matter what concurrent session you were in, or even if you couldn't make it to Albuquerque this year, you can still view most of the presentations, listen to the keynotes, see pictures of attendees, follow the backchannel, and engage with everyone on Twitter.

With libraries having tighter budgets, it's extremely important that we continue to learn virtually. There are plenty of online workshops and webinars, but they often still cost money and don't usually encourage much communication between attendees. "Attending" the LITA Forum through Twitter, by contrast, is not only free, but the learning and sharing is more organic. You have the opportunity to engage with attendees, observers, and even the presenters themselves. Structured workshops have their place for focused, more in-depth learning on a particular topic, and they are definitely still needed and very popular. I enjoy our LITA educational programs and highly recommend them. However, interacting with Twitter throughout the Forum was like a giant social playground for me, and I could engage as much or as little as I liked. It's a different user experience than so many other, more traditional learning environments.

Twitter was born in mid-2006 and the paradigm shift started happening a few years later, but the ways people are socially engaging with one another through Twitter have changed drastically since then.1 People aren't just regurgitating what the presenters are saying, but are responding to speakers and others in the physical and virtual audience. People are talking more in depth about what they are learning and supplementing talks with links to sites, videos, images, and reports that might have been mentioned. They are coding and sharing their code while at the conference.
They are blogging about their experiences and sharing those links. They are extending their networks.

The conference theme this year was "From Node to Network," and reflecting on my own conference experience and reviewing all the Twitter data, I don't think the 2014 LITA Forum Planning Committee, led by Ken Varnum from the University of Michigan, could have chosen a better theme.

As previously mentioned, the ways in which we are using Twitter have been significantly changing the way we learn and interact. When combing through the #litaforum tweets for the gems, I found many links to tools that analyze and visually display unique information about tweets from the Forum. The love of data is not uncommon in libraries, and neither is the analysis of that data.

The TAGS archive2 contains lots of Twitter data from the Forum. As you can see in Image 1, between November 1, 2013, and November 17, 2014 (the same tag was used for the 2013 Forum), there were 5,454 tweets, 4,390 of which were unique, not just retweets. There were 1,394 links within those tweets, demonstrating that we aren't just repeating what the speakers are saying; we are enriching our networks with more easily accessible information.

Image 1. Archive of #litaforum tweets through TAGS

The data also tells stories. For example, @cm_harlow by far tweeted more than everyone else with 881 tweets, @TheStacksCat had the highest retweet rate at 90%, and @varnum had the lowest retweet rate at 1%. I was able to look at every single tweet in a Google spreadsheet, complete with timestamps and links to user profiles. All this is rich data and quite informative, but TAGSExplorer, developed by @mhawksey, is also quite an impressive data visualization tool that shows connections between the Twitter handles. (See Image 2.)

Image 2. TAGSExplorer data visualization and top conversationalists

Additionally, you can see whom you retweeted and who retweeted you,3 again demonstrating the power of rich, structured data. (See Image 3.) All of these tools improve our ability to share, reflect, archive, and network within LITA and beyond our typical, often comfortable library boundaries. Tweets also don't last forever on the web, but they do when they are archived.4 One conference attendee, @kayiwa, used a tool called twarc (https://github.com/edsu/twarc), a command-line tool for archiving JSON Twitter search results before they disappear.
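For readers who want to try something similar, here is a rough sketch of how such an archive might be collected and summarized. The twarc commands reflect the version 1 command-line tool mentioned above (run twarc configure first with your own API keys), and the Python snippet assumes a line-delimited JSON file of tweets in the Twitter API v1.1 format; the file name and the per-user retweet-rate calculation are illustrative assumptions rather than anything described in this column.

# Collect the archive with twarc (v1):
#   pip install twarc
#   twarc configure
#   twarc search "#litaforum" > litaforum.jsonl

import json
from collections import Counter

tweets_per_user = Counter()
retweets_per_user = Counter()

with open("litaforum.jsonl") as f:        # hypothetical file name
    for line in f:
        tweet = json.loads(line)
        user = tweet["user"]["screen_name"]
        tweets_per_user[user] += 1
        if "retweeted_status" in tweet:   # present only on retweets in the v1.1 format
            retweets_per_user[user] += 1

print(f"Total tweets: {sum(tweets_per_user.values())}")
for user, total in tweets_per_user.most_common(10):
    rate = 100 * retweets_per_user[user] / total
    print(f"@{user}: {total} tweets, {rate:.0f}% retweets")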
Looking through the tweets, you will learn that a great number of attendees experienced altitude sickness due to Albuquerque's elevation, which is around 5,000 feet above sea level. The most popular and desired food was enchiladas with green chili. Many were impressed with the scenery, mountains, and endless blue skies of the city, as evidenced by the number of images of outdoor landscapes and sky shots.

Image 3. Connections between @vacekrae's retweets and who she was retweeted by

There were two packed pre-conferences at the LITA Forum. Dean Krafft and Jon Corson-Rikert from Cornell University Library taught attendees about a very hot topic: linked data and "how libraries can make use of Linked Open Data to share information about library resources and to improve discovery, access, and understanding for library users." The hashtag #linkeddata was used 382 times across all the Forum's tweets; clearly the conversation went beyond the workshop. Also, Francis Kayiwa of Kayiwa Consulting and Eric Phetteplace from the California College of the Arts helped attendees "Learn Python by Playing with Library Data" in the second, equally popular pre-conference. (See Image 4.)

Image 4

The Forum this year also had three exceptional keynote speakers. AnnMarie Thomas, @amptMN, an engineering professor from the University of St. Thomas in Minnesota, kicked off the Forum and shared her enthusiasm and passion for makerspaces, squishy circuits, and how to engage kids in engineering and science in incredibly creative ways. I was truly inspired by her passion for making and sharing with others. She reminded us that all children are makers, and as adults we need to remember to be curious, explore, and play. There are 129 tweets that capture not only her fun presentation but also her vision for making in the future. (See Image 5.)

Image 5

The second keynote speaker was Lorcan Dempsey, @lorcand, Vice President, OCLC Research, and Chief Strategist. He's known primarily for the research he presents through his weblog, http://orweblog.oclc.org, where he makes observations on the way users interact with technology and the discoverability of all that libraries have to offer, from collections to services to expertise. He wants to make library data more usable. In his talk, he explained how some technologies, such as mobile devices and institutional repositories, are having huge effects on user behaviors. "The network reshapes society and society reshapes the network." Lorcan's talk also complemented AnnMarie's talk about making and sharing. Users are going from consumption to creation, and we, as libraries, need to be offering our services and content in the users' workflows. We need to share our resources and make them more discoverable. Why? "Discovery often happens elsewhere." Check out the 123 posts in the Twitter archive, which include links to his presentation. (See Image 6.)

Image 6

Kortney Ryan Ziegler, @fakerapper, the founder of Trans*h4ck, was the closing keynote speaker. His work focuses on supporting trans-created technology, trans entrepreneurs, and trans-led startups.
He's led hackathons and helped create safe spaces for the trans community. His work is so important, and many of the apps help to address the social inequalities that the trans community still faces. For example, he mentioned that it's still legal in 36 states to be fired for being trans. There are 174 tweets captured at the Forum that give examples of the web tools created and ideas about how libraries can be inclusive and more supportive of the trans community. (See Image 7.)

Image 7

The sessions themselves were excellent, and many sparked conversations long after the presentation. Lightning talks were engaging, fast, and fun. Posters were both beautiful and informative. Overarching terms that I heard repeatedly and saw among the tweets were: Open Graph, OpenRefine, social media, makerspaces, BIBFRAME, library labs, leadership, support, community, analytics, assessment, engagement, inclusivity, diversity, agile development, open access, linked data, VIVO, DataONE, discovery systems, discoverability, LibraryBox, Islandora, and institutional repositories. Below are some highlights:

There were so many opportunities to network at sessions, on breaks, at the networking dinners, and even at game night. I see networking as a huge benefit of a small conference, and networking can lead to some pretty amazing things. For example, Whitni Watkins, @NimbleLibrarian and one of LITA's invaluable volunteers for the Forum, was so inspired by a conversation on OpenRefine that she created a list where people could sign up to learn more and get some hands-on playing time with the tool. On her blog,5 Whitni says, "…most if not all of those who came left with a bit more knowledge of the program than before and we opened a door of possibility for those who hadn't any clue as to what OpenRefine could do."

Another example of great networking is how Tabby Farney, @sharebrarian, and Cody Behles, @cbehles, came to create a LITA Metrics Interest Group. At one of the networking dinners, they discussed their passion for altmetrics and web analytics, noticed that there wasn't an existing group, and felt spurred to create one.

The technology and information sharing, the networking, the collaborating, and the strategizing: these are all components that make up the LITA Forum. Twitter is just another technology platform to help us connect with one another. We are all just nodes, and technology enables us both to become the network and to network more effectively.

Finally, I want to acknowledge and thank our sponsors, many of which are also LITA members. We could not have run the Forum without the generous funds from EBSCO, Springshare, @mire, Innovative, and OCLC. On behalf of LITA, I truly appreciate their support.
I want to leave you with one more image that was created by @kayiwa using the most tweeted words from all the posts.6 Next year's Forum is in Minneapolis, and I hope to see you there.

REFERENCES

1. http://consumercentric.biz/wordpress/?p=106
2. https://docs.google.com/spreadsheet/pub?key=0AsyivMoYhk87dFNFX196V1E2M2ZQTVlhQ2JVS2FsdEE&output=html
3. http://msk0.org/lita2014/litaforum-directed-retweets.html
4. http://msk0.org/lita2014/lita2014.html
5. http://nimblelibrarian.wordpress.com/2014/11/14/lita-forum-2014-a-recap/
6. http://msk0.org/lita2014/litaforum-wordcloud.html

Evaluating Web-Scale Discovery: A Step-by-Step Guide
Joseph Deodato

Joseph Deodato (jdeodato@rutgers.edu) is Digital User Services Librarian at Rutgers University, New Brunswick, New Jersey.

ABSTRACT

Selecting a web-scale discovery service is a large and important undertaking that involves a significant investment of time, staff, and resources. Finding the right match begins with a thorough and carefully planned evaluation process. To be successful, this process should be inclusive, goal-oriented, data-driven, user-centered, and transparent. The following article offers a step-by-step guide for developing a web-scale discovery evaluation plan rooted in these five key principles based on best practices synthesized from the literature as well as the author's own experiences coordinating the evaluation process at Rutgers University. The goal is to offer academic libraries that are considering acquiring a web-scale discovery service a blueprint for planning a structured and comprehensive evaluation process.

INTRODUCTION

As the volume and variety of information resources continue to multiply, the library search environment has become increasingly fragmented. Instead of providing a unified, central point of access to its collections, the library offers an assortment of pathways to disparate silos of information. To the seasoned researcher familiar with these resources and experienced with a variety of search tools and strategies, this maze of options may be easy to navigate. But for the novice user who is less accustomed to these tools and even less attuned to the idiosyncrasies of each one's own unique interface, the sheer amount of choice can be overwhelming. Even if the user manages to find their way to the appropriate resource, figuring out how to use it effectively becomes yet another challenge. This is at least partly due to the fact that the expectations and behaviors of today's library users have been profoundly shaped by their experiences on the web. Popular sites like Google and Amazon offer simple, intuitive interfaces that search across a wide range of content to deliver immediate, relevant, and useful results. In comparison, library search interfaces often appear antiquated, confusing, and cumbersome. As a result, users are increasingly relying on information sources that they know to be of inferior quality, but are simply easier to find.
As Luther and Kelly note, the biggest challenge academic libraries face in today's abundant but fragmented information landscape is "to offer an experience that has the simplicity of Google—which users expect—while searching the library's rich digital and print collections—which users need."1 In an effort to better serve the needs of these users and improve access to library content, libraries have begun turning to new technologies capable of providing deep discovery of their vast scholarly collections from a single, easy-to-use interface. These technologies are known as web-scale discovery services.

To paraphrase Hoeppner, a web-scale discovery service is a large central index paired with a richly featured user interface providing a single point of access to the library's local, open access, and subscription collections.2 Unlike federated search, which broadcasts queries in real time to multiple indexes and merges the retrieved results into a single set, web-scale discovery relies on a central index of preharvested data. Discovery vendors contract with content providers to index their metadata and full-text content, which is combined with the library's own local collections and made accessible via a unified index. This approach allows for rapid search, retrieval, and ranking of a broad range of content within a single interface, including materials from the library's catalog, licensed databases, institutional repository, and digital collections.
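To make the architectural contrast concrete, here is a purely illustrative toy sketch in Python; the records, silo names, and matching logic are invented for illustration and do not represent any vendor's actual index or API.

# Purely illustrative: three tiny "silos" standing in for a catalog, a
# repository, and a licensed database.
SILOS = {
    "catalog":    [{"title": "Climate Change Primer", "year": 2012}],
    "repository": [{"title": "Climate Data Thesis", "year": 2014}],
    "database_a": [{"title": "Climate Policy Review", "year": 2013}],
}

def federated_search(query):
    """Federated search: query every silo at search time and merge the results.
    In practice each iteration is a live call to a remote system, so the
    response is only as fast as the slowest source."""
    merged = []
    for name, records in SILOS.items():
        merged.extend({"source": name, **rec} for rec in records
                      if query.lower() in rec["title"].lower())
    return merged

def build_central_index():
    """Web-scale discovery: harvest and normalize all records ahead of time
    into one unified index (done on a schedule, not at query time)."""
    return [{"source": name, **rec} for name, records in SILOS.items() for rec in records]

def discovery_search(index, query):
    """Search only the pre-built central index; no remote calls at query time."""
    return [rec for rec in index if query.lower() in rec["title"].lower()]

index = build_central_index()
print(federated_search("climate"))          # waits on every remote source in real systems
print(discovery_search(index, "climate"))   # a single lookup against the unified index

The pre-harvested central index is what allows the relevance ranking and faceting described next to operate across all of this content at once.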
Web-scale discovery services also offer a variety of features and functionality that users have come to expect from modern search tools. Features such as autocorrect, relevance ranking, and faceted browsing make it easier for users to locate library materials more efficiently, while enhanced content such as cover images, ratings, and reviews offers an enriched user experience and provides useful contextual information for evaluating results.

Commercial discovery products entered the market in 2007 at a time when academic libraries were feeling pressure to compete with newer and more efficient search tools like Google Scholar. To improve the library search experience and stem the seemingly rising tide of defecting users, academic libraries were quick to adopt discovery solutions that promised improved access and increased usage of their collections. Yet despite the significant impact these technologies have on staff and users, libraries have not always undertaken a formal evaluation process when selecting a discovery product. Some were early adopters that selected a product at a time when few other options existed on the market. Others served as beta sites for particular vendors or simply chose the product offered by their existing ILS or federated search provider. Still others had a selection decision made for them by their library director or consortium. However, despite rapid adoption, the web-scale discovery market has only just begun to mature. As products emerge from their initial release and more information about them becomes available, the library community has gained a better understanding of how web-scale discovery services work and their particular strengths and weaknesses. In fact, some libraries that have already implemented a discovery service are currently considering switching products. Whether your library is new to the discovery marketplace or poised for reentry, this article is intended to help you navigate to the best product to meet the needs of your institution. It covers the entire process from soup to nuts, from conducting product research and drafting organizational requirements to setting up local trials and coordinating user testing. By combining guiding principles with practical examples, this article aims to offer an evaluation model rooted in best practices that can be adapted by other academic libraries.

LITERATURE REVIEW

As the adoption of web-scale discovery services continues to rise, a growing body of literature has emerged to help librarians evaluate and select the right product. Moore and Greene provide a useful review of this literature summarizing key trends such as the timeframe for evaluation, the type of staff involved, the products being evaluated, and the methods and criteria used by evaluators.3 Much of the early literature on this subject focuses on comparisons of product features and functionality. Rowe, for example, offers comparative reviews of leading commercial services on the basis of criteria such as content, user interface, pricing, and contract options.4 Yang and Wagner compare commercial and open source discovery tools using a checklist of user interface features that includes search options, faceted navigation, result ranking, and Web 2.0 features.5 Vaughan provides an in-depth look at discovery services that includes an introduction to key concepts, detailed profiles on each major service provider, and a list of questions to consider when selecting a product.6 A number of authors have provided useful lists of criteria to help guide product evaluations. Hoeppner, for example, offers a list of key factors such as breadth and depth of indexing, search and refinement options, branding and customization, and tools for saving, organizing, and exporting results.7 Luther and Kelly and Hoseth provide a similar list of end-user features but also include institutional considerations such as library goals, cost, vendor support, and compatibility with existing technologies.8

While these works are helpful for getting a better sense of what to look for when shopping for a web-scale discovery service, they do not offer guidance on how to design a structured evaluation plan.
Indeed, many library evaluations have tended to rely on what can be described as the checklist method of evaluation. This typically involves creating a checklist of desirable features and then evaluating products on the basis of whether they provide these features. For example, in developing an evaluation process for Rider University, Chickering and Yang compiled a list of sixteen user interface features, examined live product installations, and ranked each product according to the number of features offered.9 Brubaker, Leach-Murray, and Parker employed a similar process to select a discovery service for the twenty-three members of the Private Academic Library Network of Indiana (PALNI).10 These types of evaluations suffer from a number of limitations. First, they tend to rely on vendor marketing materials or reviews of implementations at other institutions rather than local trials and testing. Second, product requirements are typically given equal weight rather than prioritized according to importance. Third, these requirements tend to focus predominantly on user interface features while neglecting equally important back-end functionality and institutional considerations. Finally, these evaluations do not always include input or participation from library staff, users, and stakeholders.

The first published work to offer a structured model for evaluating web-scale discovery services was Vaughan's "Investigations into Library Web-Scale Discovery Services."11 Vaughan outlines the evaluation process employed at the University of Nevada, Las Vegas (UNLV), which, in addition to developing a checklist of product requirements, also included staff surveys, interviews with early adopters, vendor demonstrations, and coverage analysis. The author also provides several useful appendixes with templates and documents that librarians can use to guide their own evaluation. Vaughan's work also appears in Popp and Dallis' must-read compendium Planning and Implementing Resource Discovery Tools in Academic Libraries.12 This substantial volume presents forty chapters on planning, implementing, and maintaining web-scale discovery services, including an entire section devoted to evaluation and selection. In it, Vaughan elaborates on the UNLV model and offers useful recommendations for creating an evaluation team, educating library staff, and communicating with vendors.13 Metz-Wiseman et al. offer an overview of best practices for selecting a web-scale discovery service on the basis of interviews with librarians from fifteen academic institutions.14 Freivalds and Lush of Penn State University explain how to select a web-scale discovery service through a Request for Proposal (RFP) process.15 Bietila and Olson describe a series of tests that were done at the University of Chicago to evaluate the coverage and functionality of different discovery tools.16 Chapman et al.
explain  how  personas,  surveys,  and   usability  testing  were  used  to  develop  a  user-­‐centered  evaluation  process  at  University  of   Michigan.17     The  following  article  attempts  to  build  on  this  existing  literature,  combining  the  best  elements   from  evaluation  methods  employed  at  other  institutions  as  well  as  the  author’s  own,  with  the  aim   of  providing  a  comprehensive,  step-­‐by-­‐step  guide  to  evaluating  web-­‐scale  discovery  services   rooted  in  best  practices.   BACKGROUND   Rutgers,  The  State  University  of  New  Jersey,  is  a  public  research  university  consisting  of  thirty-­‐two   schools  and  colleges  offering  degrees  in  the  liberal  arts  and  sciences  as  well  as  programs  in   professional  and  continuing  education.  The  university  is  distributed  across  three  regional   campuses  serving  more  than  65,000  students  and  24,000  faculty  and  staff.  The  Rutgers  University   Libraries  comprise  twenty-­‐six  libraries  and  centers  with  a  combined  collection  of  more  than  10.5   million  print  and  electronic  holdings.  The  Libraries’  collections  and  services  support  the   curriculum  of  the  university’s  many  degree  programs  as  well  as  advanced  research  in  all  major   academic  disciplines.   In  January  2013,  the  Libraries  appointed  a  cross-­‐departmental  team  to  research,  evaluate,  and   recommend  the  selection  of  a  web-­‐scale  discovery  service.  The  impetus  for  this  initiative  derived   from  a  demonstrated  need  to  improve  the  user  search  experience  on  the  basis  of  data  collected   over  the  last  several  years  through  ethnographic  studies,  user  surveys,  and  informal  interactions   at  the  reference  desk  and  in  the  classroom.  Users  reported  high  levels  of  dissatisfaction  with   existing  library  search  tools  such  as  the  catalog  and  electronic  databases,  which  they  found   confusing  and  difficult  to  navigate.  Above  all,  users  demanded  a  simple,  intuitive  starting  point   from  which  to  search  and  access  the  library’s  collections.  Accordingly,  the  Libraries  began   investigating  ways  to  improve  access  with  web-­‐scale  discovery.  The  evaluation  team  examined   offerings  from  four  leading  web-­‐scale  discovery  providers,  including  EBSCO  Discovery  Service,   ProQuest’s  Summon,  Ex  Libris’  Primo,  and  OCLC’s  WorldCat  Local.  The  process  lasted   approximately  nine  months  and  included  extensive  product  and  user  research,  vendor   demonstrations,  an  RFP,  reference  interviews,  trials,  surveys,  and  product  testing.  See  appendix  A   for  an  overview  of  the  evaluation  plan.     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     23   By  the  time  it  began  its  evaluation,  Rutgers  was  already  a  latecomer  to  the  discovery  game.  Most  of   our  peers  had  already  been  using  web-­‐scale  discovery  services  for  many  years.  However,  Rutgers’   less-­‐than-­‐stellar  experience  with  federated  search  had  led  it  to  adopt  a  more  cautious  attitude   toward  the  latest  and  greatest  of  library  “holy  grails.”  This  wait-­‐and-­‐see  approach  proved  highly   beneficial  in  the  end  as  it  allowed  time  for  the  discovery  market  to  mature  and  gave  the  evaluation   team  an  opportunity  to  learn  from  the  successes  and  failures  of  early  adopters.  
In planning its evaluation, the Rutgers team was able to draw on the experiences of earlier pioneers such as UNLV, Penn State, the University of Chicago, and the University of Michigan. It was on the metaphorical shoulders of these library giants that Rutgers built its own successful evaluation process. What follows is a step-by-step guide for evaluating and selecting a web-scale discovery service on the basis of best practices synthesized from the literature as well as the author's own experiences coordinating the evaluation process at Rutgers. Given the rapidly changing nature of the discovery market, the focus of this article is on the process rather than the results of Rutgers' evaluation. While the results will undoubtedly be outdated by the time this article goes to press, the process is likely to remain relevant and useful for years to come.

Form an Evaluation Team

The first step in selecting a web-scale discovery service is appointing a team that will be responsible for conducting the evaluation. Composition of the team will vary depending on local practice and staffing, but should include representatives from a broad cross section of library units, including collections, public services, technical services, and systems. Institutions with multiple campuses, schools, or library branches will want to make sure the interests of these constituencies are also represented. If feasible, the library should consider including actual users on the evaluation team. These may be members of an existing user advisory board or recruits from among the library's student employees and faculty liaisons. Including users on your evaluation team will keep the process focused on user needs and ensure that the library selects the best product to meet them.

There are many reasons for establishing an inclusive evaluation team. First, discovery tools have broad implications for a wide range of library services and functions. Therefore a diversity of library expertise is required for an informed and comprehensive evaluation. Reference and instruction librarians will need to evaluate the functionality of the tool, the quality of results, and its role in the research process. Collections staff will need to assess scope of coverage and congruency with the library's existing subscriptions. Access services will need to assess how the tool handles local holdings information and integrates with borrowing and delivery services like interlibrary loan. Catalogers will need to evaluate metadata requirements and procedures for harvesting local records. IT staff will need to assess technical requirements and compatibility with existing infrastructure and systems.

Second, depending on the size and goals of the institution, the product may be expected to serve a wide community of users with different needs, skill levels, and academic backgrounds.
Large     EVALUATING  WEB-­‐SCALE  DISCOVERY  SERVICES:  A  STEP-­‐BY-­‐STEP  GUIDE  |  DEODATO   doi:  10.6017/ital.v34i2.5745   24   universities  that  include  multiple  schools,  offer  various  degree  programs,  or  have  specialized   programs  like  law  or  medicine  will  need  to  determine  if  and  how  a  new  discovery  tool  will  address   the  needs  of  all  these  users.  It  is  important  that  the  composition  of  the  evaluation  team  adequately   represents  the  interests  of  the  different  user  groups  the  tool  is  intended  to  serve.  The  evaluation  at   Rutgers  was  conducted  by  a  cross-­‐departmental  team  of  fifteen  members  and  included  experts   from  a  variety  of  library  units  and  representatives  from  all  campuses.   Finally,  because  web-­‐scale  discovery  brings  such  profound  changes  to  staff  and  user  workflows,   decisions  regarding  selection  and  implementation  are  often  fraught  with  controversy.  As  noted,   discovery  tools  impact  a  wide  range  of  library  services  and  therefore  require  careful  evaluation   from  the  perspectives  of  multiple  stakeholders.  Furthermore,  these  tools  dramatically  change  the   nature  of  library  research,  and  not  everyone  in  your  organization  may  view  this  change  as  being   for  the  better.  Despite  growing  rates  of  adoption,  debates  over  the  value  and  utility  of  web-­‐scale   discovery  continue  to  divide  librarians.18  According  to  one  survey,  securing  staff  buy-­‐in  is  the   biggest  challenge  academic  libraries  face  when  implementing  a  web-­‐scale  discovery  service.19   Ensuring  broad  involvement  early  in  the  process  will  help  to  secure  organizational  buy-­‐in  and   support  for  the  selected  product.   While  broad  representation  is  important,  having  a  large  and  diverse  team  can  sometimes  slow   down  the  process;  schedules  can  be  difficult  to  coordinate,  members  may  have  competing  views  or   demands  on  their  time,  meetings  can  lose  focus  or  wander  off  topic,  etc.  The  more  members  on   your  evaluation  team,  the  more  difficult  the  team  may  be  to  manage.  One  strategy  for  managing  a   large  group  might  be  to  create  a  smaller,  core  team  with  all  other  members  serving  on  an  ad  hoc   basis.  The  core  team  functions  as  a  steering  committee  to  manage  the  project  and  calls  on  the  ad   hoc  members  at  different  stages  in  the  evaluation  process  where  their  input  and  expertise  is   needed.  Another  strategy  would  be  to  break  the  larger  group  into  several  functional  teams,  each   responsible  for  evaluating  specific  aspects  of  the  discovery  tool.  For  example,  one  team  might   focus  on  functionality,  another  on  technology,  a  third  on  administration,  etc.  This  method  also  has   the  advantage  of  distributing  the  workload  among  team  members  and  breaking  down  a  complex   evaluation  process  into  discrete,  more  manageable  parts.   Like  any  other  committee  or  taskforce,  your  evaluation  team  should  have  a  charge  outlining  its   responsibilities,  timetable  of  deliverables,  reporting  structure,  and  membership.  
The charge should also include a vision or goals statement that explicitly states the underlying assumptions and premises of the discovery tool, its purpose, and how it supports the library's larger mission of connecting users with information.20 Although frequently highlighted in the literature, the importance of defining institutional goals for discovery is often overlooked or taken for granted.21 Having a vision statement is crucial to the success of the project for multiple reasons. First, it frames the evaluation process by establishing mutually agreed-upon goals and priorities for the product. Before the evaluation can begin, the team must have a clear understanding of what problems the discovery service is expected to solve, who it is intended to serve, and how it supports the library's strategic goals. Is the service primarily intended for undergraduates, or is it also expected to serve graduate students and faculty? Is it a one-stop shop for all information needs, a starting point in a multi-step research process, or merely a useful tool for general and interdisciplinary research? Second, having a clear vision for the product will help guide implementation and assessment. It will not only help the library decide how to configure the product and what features to prioritize, but also offer explicit benchmarks by which to evaluate performance. Finally, aligning web-scale discovery with the library's strategic plan will help put the project in wider context and secure buy-in across all units in the organization. Having a clear understanding of how the product will be integrated with and support other library services will help minimize common misunderstandings and ensure wider adoption.

Educate Library Stakeholders

Despite the quick maturation and adoption of web-scale discovery services, these technologies are still relatively new. Many librarians in your organization, including those on the evaluation team, may possess only a cursory understanding of what these tools are and how they function. Creating an inclusive evaluation process requires having an informed staff that can participate in the discussions and decision-making processes leading to product selection. Therefore the first task of your evaluation team should be to educate themselves and their colleagues on the ins and outs of web-scale discovery services. This should include performing a literature review, collecting information about products currently on the market, and reviewing live implementations at other institutions.

At Rutgers, the evaluation team conducted an extensive literature review that resulted in an annotated bibliography covering all aspects of web-scale discovery, including general introductions, product reviews, and methodologies for evaluation, implementation, and assessment. All team members were encouraged to read this literature to familiarize themselves with relevant terminology, products, and best practices.
The  team  also  collected  product   information  from  vendor  websites  and  reviewed  live  implementations  at  other  institutions.  In  this   way,  members  were  able  to  familiarize  themselves  with  the  different  features  and  functionality   offered  by  each  vendor.   Once  the  team  has  done  its  research,  it  can  begin  sharing  its  findings  with  the  rest  of  the  library   community.  Vaughan  recommends  establishing  a  quick  and  easy  means  of  disseminating   information  such  as  an  internal  staff  website,  blog,  or  wiki  that  staff  can  visit  on  their  own  time.22   The  Rutgers  team  created  a  private  LibGuide  that  served  as  a  central  repository  for  all  information   related  to  the  evaluation  process,  including  a  brief  introduction  to  web-­‐scale  discovery,   information  about  each  product,  recorded  vendor  demonstrations,  links  to  live  implementations,   and  an  annotated  bibliography.  Also  included  was  information  about  the  team’s  ongoing  work,   including  the  group’s  charge,  timeline,  meeting  minutes,  and  reports.  In  addition  to  maintaining  an   online  presence,  the  team  also  held  a  series  of  public  forums  and  workshops  to  educate  staff  about   the  nature  of  web-­‐scale  discovery  as  well  as  provide  updates  on  the  evaluation  process  and     EVALUATING  WEB-­‐SCALE  DISCOVERY  SERVICES:  A  STEP-­‐BY-­‐STEP  GUIDE  |  DEODATO   doi:  10.6017/ital.v34i2.5745   26   respond  to  questions  and  concerns.  By  providing  staff  with  a  foundation  for  understanding  web-­‐ scale  discovery  and  the  process  by  which  these  products  were  to  be  evaluated,  the  team  sought  to   maximize  the  engagement  and  participation  of  the  larger  library  community.   Schedule  Vendor  Demonstrations   Once  everyone  has  a  conceptual  understanding  of  what  web-­‐scale  discovery  services  do  and  how   they  work,  it  is  time  to  begin  inviting  onsite  vendor  demonstrations.  These  presentations  give   library  staff  an  opportunity  to  see  these  products  in  action  and  ask  vendors  in-­‐depth  questions.   Sessions  are  usually  led  by  a  sales  representative  and  product  manager  and  typically  include  a   brief  history  of  the  product’s  development,  a  demonstration  of  key  features  and  functionality,  and   an  audience  question-­‐and-­‐answer  period.  To  provide  a  level  playing  field  for  comparison,  the   evaluation  team  may  wish  to  submit  a  list  of  topics  or  questions  for  each  vendor  to  address  in   their  presentation.  This  could  be  a  general  outline  of  key  areas  of  interest  identified  by  the   evaluation  team  or  a  list  of  specific  questions  solicited  from  the  wider  library  community.  Vaughan   offers  a  useful  list  of  questions  that  librarians  may  wish  to  consider  to  structure  vendor   demonstrations.23  One  tactic  used  by  the  evaluation  team  at  Auburn  University  involved  requiring   vendors  to  use  their  products  to  answer  a  series  of  actual  reference  questions.24  This  not  only   precluded  them  from  using  canned  searches  that  might  only  showcase  the  strengths  of  their   products,  but  also  gave  librarians  a  better  sense  of  how  these  products  would  perform  out  in  the   wild  against  real  user  queries.  
Another  approach  might  be  to  invite  actual  users  to  the   demonstrations.  Whether  you  are  fortunate  enough  to  have  users  on  your  evaluation  team  or  able   to  encourage  a  few  library  student  workers  to  attend,  your  users  may  raise  important  questions   that  your  staff  has  overlooked.   Vendor  demonstrations  should  only  be  scheduled  after  the  evaluation  team  has  had  an   opportunity  to  educate  the  wider  library  community.  An  informed  staff  will  get  more  out  of  the   demos  and  be  better  equipped  to  ask  focused  questions.  As  Vaughan  suggests,  demonstrations   should  be  scheduled  in  close  proximity  (preferably  within  the  same  month)  to  sustain  staff   engagement,  facilitate  retention  of  details,  and  make  it  easier  to  compare  services.25  With  the   vendor’s  permission,  libraries  should  also  consider  recording  these  sessions  and  making  them   available  to  staff  members  who  are  unable  to  attend.  At  the  conclusion  of  each  demonstration,   staff  should  be  invited  to  offer  their  feedback  on  the  presentation  or  ask  any  follow-­‐up  questions.   This  can  be  accomplished  by  distributing  a  brief  paper  or  online  survey  to  the  attendees.   Create  an  Evaluation  Rubric   Perhaps  the  most  important  part  of  the  evaluation  process  is  developing  a  list  of  key  criteria  that   will  be  used  to  evaluate  and  compare  vendor  offerings.  Once  the  evaluation  team  has  a  better   understanding  of  what  these  products  can  do  and  the  different  features  and  functionality  offered   by  each  vendor,  it  can  begin  defining  the  ideal  discovery  environment  for  its  institution.  This  often   takes  the  form  of  a  list  of  desirable  features  or  product  requirements.  The  process  for  generating     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     27   these  criteria  tends  to  vary  by  institution.  In  some  cases,  they  are  defined  by  the  team  leader  or   based  on  criteria  used  for  past  technology  purchases.26  In  other  cases,  criteria  are  compiled   through  a  review  of  the  literature.27  In  yet  other  cases,  they  are  developed  and  refined  with  input   from  library  staff  through  staff  surveys  and  meetings.28   One  important  element  missing  from  all  of  these  approaches  is  the  user.  To  ensure  the  evaluation   team  selects  the  best  tool  for  library  users,  product  requirements  should  be  firmly  rooted  in  an   assessment  of  user  needs.  The  University  of  Michigan,  for  example,  used  persona  analysis  to   identify  common  user  needs  and  distilled  these  into  a  list  of  tangible  features  that  could  be  used   for  product  evaluation.29  Other  tactics  for  assessing  user  needs  and  expectations  might  include   user  surveys,  interviews,  or  focus  groups.  These  tools  can  be  useful  for  gathering  information   about  what  users  want  from  your  web-­‐scale  discovery  system.  However,  these  methods  should  be   used  with  caution,  as  users  themselves  don’t  always  know  what  they  want,  particularly  from  a   product  they  have  never  used.  
Furthermore, as usability experts have pointed out, what users say they want may not be what they actually need.30 Therefore it is important to validate data collected from surveys and focus groups with usability testing. To reliably determine whether a product meets the needs of your users, it is best to observe what users actually do rather than what they say they do.

If the evaluation team has a short timeframe or is unable to undertake extensive user research, it may be able to develop product requirements on the basis of existing research. At Rutgers, for example, the Libraries' department of planning and assessment conducts a standing survey to collect information about users' opinions of and satisfaction with library services. The evaluation team was able to use this data to learn more about what users like and don't like about the library's current search environment. The team analyzed more than 700 user comments collected from 2009 to 2012 related to the library's catalog and electronic resources. Comments were mapped to specific types of features and functionality that users want or expect from a library search tool. Since most users don't typically articulate their needs in terms of concrete technical requirements, some interpretation was required on the part of the evaluation team. For example, the average user may not necessarily know what faceted browsing is, but a suggestion that there be "a way to browse through books by category instead of always having to use the search box" could reasonably be interpreted as a request for this feature. Features were ranked in order of importance by the number of comments made about each. Some of the most "requested" features included single point of access, "smart" search functionality such as autocorrect and autocomplete, and improved relevance ranking.

Of course, user needs are not the only criteria to be considered when choosing a discovery service. Organizational and staff needs must also be taken into account. User input is important for defining the functionality of the public interface, but staff input is necessary for determining back-end functionality and organizational fit. To the list of user requirements, the evaluation team added institutional requirements related to factors such as cost, coverage, customizability, and support. The team then conducted a library-wide survey inviting all staff to rank these requirements in order of importance and offer any additional requirements that should be factored into the evaluation.

Combining the input from library staff and users, the evaluation team drafted a list of fifty-five product requirements (see appendix B), which became the basis for a comprehensive evaluation rubric that would be used to evaluate and ultimately select a web-scale discovery service. The design of the rubric was largely modeled after the one developed at Penn State.31 Requirements were arranged into five categories: content, functionality, usability, administration, and technology. Each category was assigned to a subteam with the relevant expertise, which was responsible for that portion of the evaluation. Each requirement was assigned a weight according to its degree of importance: 3 = mandatory, 2 = desired, 1 = optional. Each product was given a score based on how well it met each requirement: 3 = fully meets, 2 = partially meets, 1 = barely meets, 0 = does not meet. The total number of points awarded for each requirement was calculated by multiplying weight by score. The final score for each product was calculated by summing the total number of points awarded (see appendix C).
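To make the arithmetic concrete, the short Python sketch below multiplies each requirement's weight by a product's score and sums the results. The requirement names, weights, and scores are hypothetical placeholders for illustration only, not values from the Rutgers rubric.

```python
# Illustrative sketch of the weighted scoring described above.
# All requirement names, weights, and scores below are hypothetical.

# Weight: 3 = mandatory, 2 = desired, 1 = optional.
requirements = {
    "Single search across formats": 3,
    "Faceted browsing": 2,
    "Autocomplete suggestions": 1,
}

# Score: 3 = fully meets, 2 = partially meets, 1 = barely meets, 0 = does not meet.
product_scores = {
    "Product A": {"Single search across formats": 3,
                  "Faceted browsing": 2,
                  "Autocomplete suggestions": 3},
    "Product B": {"Single search across formats": 2,
                  "Faceted browsing": 3,
                  "Autocomplete suggestions": 0},
}

def total_points(scores):
    """Sum of weight x score over all requirements."""
    return sum(weight * scores[req] for req, weight in requirements.items())

for product, scores in product_scores.items():
    print(product, total_points(scores))
# Product A: 3*3 + 2*2 + 1*3 = 16; Product B: 3*2 + 2*3 + 1*0 = 12
```

In a spreadsheet, the equivalent calculation is simply a SUMPRODUCT of the weight column and each product's score column.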
This scoring method was particularly helpful in minimizing the influence of bias on the evaluation process. Keep in mind that some stakeholders may possess personal preferences for or against a particular product because of current or past relations with the vendor, their experiences with the product while at another institution, or their perception of how the product might impact their own work. By establishing a set of predefined criteria, rooted in local needs and measured according to clear and consistent standards, the team adopted an evaluation model that was not only user-centered but also allowed for a fair, unbiased, and systematic evaluation of vendor offerings. This is particularly important for libraries that must go through a formal procurement process to purchase a web-scale discovery service.

Draft the RFP

Once the evaluation team has defined its product requirements and established a method for evaluating the products in the marketplace, it can set to work drafting a formal RFP. Some institutions may be able to forgo the RFP process. Others, like Rutgers, are required to go through a competitive bidding process for any goods and services purchased over a certain dollar amount. The only published model on selecting a discovery service through the RFP process is offered by Freivalds and Lush.32 The authors provide a brief overview of the pros and cons of using an RFP, describe the process developed at Penn State, and offer several useful templates to help guide the evaluation.

The RFP lets vendors know that the organization is interested in their product, outlines the organization's requirements for said product, and gives the vendors an opportunity to explain in detail how their product meets these requirements. RFPs are usually written in collaboration with your university's purchasing department, which typically provides a template for this purpose.
At  a   minimum,  your  RFP  should  include  the  following:   • background  information  about  the  library,  including  size,  user  population,  holdings,  and   existing  technical  infrastructure   • a  description  of  the  product  being  sought,  including  product  requirements,  services  and   support  expected  from  the  vendor,  and  the  anticipated  timeline  for  implementation   • a  summary  of  the  criteria  that  will  be  used  to  evaluate  proposals,  the  deadline  for   submission,  and  the  preferred  format  of  responses   • any  additional  terms  or  conditions  such  as  requiring  vendors  to  provide  references,  onsite   demonstrations,  trial  subscriptions,  or  access  to  support  and  technical  documentation   • information  about  who  to  contact  regarding  questions  related  to  the  RFP   RFPs  are  useful  not  only  because  they  force  the  library  to  clearly  articulate  its  needs  for  web-­‐scale   discovery,  but  also  because  they  produce  a  detailed,  written  record  of  product  information  that   can  be  referenced  throughout  the  evaluation  process.  The  key  component  of  Rutgers’  RFP  was  a   comprehensive,  135-­‐item  questionnaire  that  asked  vendors  to  spell  out  in  painstaking  detail  the   design,  technical,  and  functional  specifications  of  their  products  (see  appendix  D).  Many  of  the   questions  were  either  borrowed  from  the  existing  literature  or  submitted  by  members  of  the   evaluation  team.  All  questions  were  directly  mapped  to  criteria  from  the  team’s  evaluation  rubric.   The  responses  were  used  to  determine  how  well  each  product  met  these  criteria  and  factored  into   product  scoring.  Vendors  were  given  one  month  to  respond  to  the  RFP.   Interview  Current  Customers   While  vendor  marketing  materials,  demonstrations,  and  questionnaires  are  important  sources  of   product  information,  vendor  claims  should  not  simply  be  taken  at  face  value.  To  obtain  an   impartial  assessment  of  the  products  under  consideration,  the  evaluation  team  should  reach  out  to   current  customers.  There  are  several  ways  to  identify  current  discovery  service  subscribers.  Many   published  overviews  of  web-­‐scale  discovery  services  offer  lists  of  example  implementations  for   each  major  discovery  provider.33  Most  vendors  also  provide  a  list  of  subscribers  on  their  website   or  community  wiki  (or  will  provide  one  on  request).  And,  of  course,  there  is  also  Marshall   Breeding’s  invaluable  website,  Library  Technology  Guides,  which  provides  up-­‐to-­‐date  information   about  technology  products  used  by  libraries  around  the  world.34  The  advanced  search  allows  you   to  filter  libraries  by  criteria  such  as  type,  collection  size,  geographic  area,  and  ILS,  thereby  making   it  easier  to  identify  institutions  similar  to  your  own.   As  part  of  the  RFP  process,  all  four  vendors  were  required  to  provide  references  for  three  current   academic  library  customers  of  equivalent  size  and  classification  to  Rutgers.  These  twelve   references  were  then  invited  to  take  an  online  survey  asking  them  to  share  their  opinions  of  and   experiences  with  the  product  (see  appendix  E).  
The survey consisted of a series of Likert-scale questions asking each reference to rate their satisfaction with various functions and features of their discovery service. This was followed by several in-depth written-response questions regarding topics such as coverage, quality of results, interface usability, customization, and support. Follow-up phone interviews were conducted in cases where additional information or clarification was needed.

The surveys permitted the evaluation team to collect feedback from current customers in a way that was minimally obtrusive while allowing for easy analysis and comparison of responses. They also provided a necessary counterbalance to vendor claims by giving the team a much more candid view of each product's strengths and weaknesses. The reference interviews helped highlight issues and areas of concern that were frequently minimized or glossed over in communications with vendors, such as gaps in coverage, inconsistent metadata, duplicate results, discoverability of local collections, and problems with known-item searching.

Configure and Test Local Trials

Although the evaluation team should strive to collect as much product information from as many sources as possible, no amount of research can effectively substitute for a good old-fashioned trial evaluation. Conducting trials using the library's own collections and local settings is the best way to gain first-hand insight into how a discovery service works. For some libraries, the expenditure of time and effort involved in configuring a web-scale discovery service can make the prospect of conducting trials prohibitive. As a result, many discovery evaluations tend to rely on testing existing implementations at other institutions. However, this method of evaluation only scratches the surface. For one thing, the evaluation team is only able to observe the front-end functionality of the public interface. Setting up a local trial, by contrast, gives the library an opportunity to peek under the hood and learn about back-end administration, explore configuration and customization options, attain a deeper understanding of the composition of the central index, and get a better feel for what it is like working with the vendor. Second, discovery services are highly customizable, and the availability of certain features, functionality, and types of content varies by institution. As Hoeppner points out, no individual site is capable of demonstrating the "full range of possibilities" available from any vendor.35 The presence or absence of certain features has as much to do with local library decisions as it does with any inherent limitations of the product. Finally, establishing trials gives the evaluation team an opportunity to see how a particular discovery service performs within its own local environment.
The ability to see how the product works with the library's own records, ILS, link resolver, and authentication system allows the team to evaluate the compatibility of the discovery service with the library's existing technical infrastructure.

At Rutgers, one of the goals of the RFP was to help narrow the pool of potential candidates from four to two. The evaluation team was asked to review vendor responses and apply the evaluation rubric to assign each a preliminary score on the basis of how well it met the library's requirements. The two top-scoring candidates would then be selected for a trial evaluation that would allow the team to conduct further testing and make a final recommendation. However, after the proposals were reviewed, the scores for three of the products were so close that the team decided to trial all three. The remaining product scored notably lower than its competitors and was dropped from further consideration.

Configuring trials for three different web-scale discovery services was no easy task, to be sure. An implementation team was formed to work with the vendors to get the trials up and running. The team received basic training for each product and was given full access to support and technical documentation. Working with the vendors, the implementation team set to work loading the library's records and configuring local settings. For the most part, the trials were basic out-of-the-box implementations with minimal customization. The vendors were willing to do much of the configuration work for us, but it was important that the team learn and understand the administrative functionality of each product, as this was an integral part of the evaluation process. All vendors agreed to a three-month trial period during which the evaluation team ran their products through a series of tests assessing three key areas: coverage, usability, and relevance ranking.

The importance of product testing cannot be overstated. As previously mentioned, web-scale discovery affects a wide variety of library services and, in most cases, will likely serve as the central point of access to the library's collections. Before committing to a product, the library should have an opportunity to conduct independent testing to validate vendor claims and ensure that their products function according to the library's expectations. To ensure that critical issues are uncovered, testing should strive to simulate as much as possible the environment and behavior of your users by employing sample searches and strategies that they themselves would use. In fact, wherever possible, users should be invited to participate in testing and offer their feedback about the products under consideration. Testing checklists and scripts must also be created to guide testers and ensure consistency throughout the process.
As Mandernach and Condit Fagan point out, although product testing is time-consuming and labor-intensive, it will ultimately save the time of your users and staff, who would otherwise be the first to encounter any bugs, and help avoid early unfavorable impressions of the product.36

The first test the evaluation team conducted aimed at evaluating the coverage and quality of indexing of each discovery product (see appendix F). Loosely borrowing from methods employed at the University of Chicago, twelve library subject specialists were recruited to help assess coverage within their discipline.37 Each subject specialist was asked to perform three search queries representing popular research topics in their discipline and compare the results from each discovery service with respect to breadth of coverage and quality of indexing. In scoring each product, subject specialists were asked to consider the following questions:

• Do the search results demonstrate broad coverage of the variety of subjects, formats, and content types represented in the library's collection?
• Do any particular types of content seem to dominate the results (books, journal articles, newspapers, book reviews, reference materials, etc.)?
• Are the library's local collections adequately represented in the results?
• Do any relevant resources appear to be missing from the search results (e.g., results from an especially relevant database or journal)?
• Do item records contain complete and accurate source information?
• Do item records contain sufficient metadata (citation, subject headings, abstracts, etc.) to help users identify and evaluate results?

Participants were asked to rate the performance of each discovery service in terms of coverage and indexing on a scale of 1 to 3 (1 = poor, 2 = average, 3 = good). Although results varied by discipline, one product received the highest average scores in both areas. In their observations, participants frequently noted that it appeared to have better coverage and produce a greater variety of sources, while results from the other two products tended to be dominated by specific source types like newspapers or reference books. The same product was also noted to have more complete metadata, while the other two frequently produced results that lacked additional information like abstracts and subject terms.

The second test aimed to evaluate the usability of each discovery service. Five undergraduate students of varying grade levels and areas of study were invited to participate in a task-based usability test (see appendix G). The purpose of the test was to assess users' ability to use these products to complete common research tasks and determine which product best meets their needs. Students were asked to use all three products to complete five tasks while sharing their thoughts aloud. For the purposes of testing, products were referred to by letters (A, B, C) rather than name. Because participants were asked to complete the same tasks using each product, it was assumed that their ability to complete tasks might improve as the test progressed. Accordingly, product order was randomized to minimize potential bias. Each session lasted approximately forty-five minutes and included a pre-test questionnaire to collect background information about the participant as well as a post-test questionnaire to ascertain their opinions on the products being tested. Because users were being asked to test three different products, the number of tasks was kept to a minimum and focused only on basic product functionality. More comprehensive usability testing would be conducted after selection to help guide implementation and improve the selected product.
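The order randomization mentioned above can be done in a few lines of code. The sketch below assigns each tester a random permutation of the three anonymized products; the participant identifiers and the use of a fixed seed are illustrative assumptions rather than details of the Rutgers protocol.

```python
# Illustrative sketch: counterbalance product order across participants
# so that learning effects are spread evenly over products A, B, and C.
import random

products = ["A", "B", "C"]
participants = ["P1", "P2", "P3", "P4", "P5"]  # hypothetical identifiers

random.seed(42)  # fixed seed so the assignment sheet can be reproduced
for participant in participants:
    order = random.sample(products, k=len(products))  # random permutation
    print(f"{participant}: {' -> '.join(order)}")
```

With only five participants and three products, a rotating Latin square (A-B-C, B-C-A, C-A-B, and so on) would be an equally reasonable way to balance order effects.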
Using each product, participants were asked to find three relevant sources on a topic, email the results to themselves, and attempt to obtain full text for at least one item. Although the team noted potential problems in users' interaction with all of the products, participants had slightly higher success rates with one product than with the others. Furthermore, in the post-test questionnaire, four out of five users stated that they preferred this product to the other two, noting that they found it easier to navigate, obtained more relevant results, and had notably less difficulty accessing full text. A follow-up question asked participants how these products compared with the search tools currently offered by the library. Almost all participants cited disappointing previous experiences with library databases and the catalog and suggested that a discovery tool might make finding materials easier. However, several users also suggested that none of these tools were "perfect," and that while these discovery services may have the "potential" to improve their library experience, all could use a good deal of improvement, particularly in returning relevant results.

Therefore the evaluation team embarked on a third and final test of its top three discovery candidates, the goal of which was to evaluate relevance ranking. While usability testing is helpful for highlighting problems with the design of an interface, it is not always the best method for assessing the quality of results. In user testing, students frequently retrieved or selected results that were not relevant to the topic. It was not always clear whether this outcome was attributable to a flaw in product design or to the users' own ability to construct effective search queries and evaluate results. Determining relevance is a subjective process and one that requires a certain level of expertise in the relevant subject area. Therefore, to assess relevance ranking among the competing discovery services, the evaluation team turned once again to its library subject specialists.

Echoing countless other user studies, our testing indicated that most users do not often scroll beyond the first page of results.
Therefore a discovery service that harvests content from a wide variety of different sources must have an effective ranking algorithm capable of surfacing the most useful and relevant results. To evaluate relevance ranking, subject specialists were asked to construct a search query related to their area of expertise, perform this search in each discovery tool, and rate the relevancy of the first ten results. Results were recorded in the exact order retrieved and rated on a scale of 0–3 (0 = not relevant, 1 = somewhat relevant, 2 = relevant, 3 = very relevant).

Two values were used to evaluate the relevance-ranking algorithm of each discovery service. Relevance was assessed by calculating cumulative gain, or the sum of all relevance scores. For example, if the first ten results returned by a discovery product each received a score of 3 because they were all deemed to be "very relevant," the product would receive a cumulative gain score of 30. Ranking was assessed by calculating discounted cumulative gain, which discounts the relevance score of results on the basis of where they appear in the rankings. Assuming that the relevance of results should decrease with rank, each result after the first was associated with a discount factor of 1/log₂(i), where i is the result's rank. The relevance score for each result was multiplied by the discount factor to produce the discounted gain. For example, a result with a relevance score of 3 but a rank of 4 is discounted through this process to a relevance score of 1.5. Discounted cumulative gain is the sum of all discounted gain scores.38

Eighteen librarians conducted a total of twenty-six searches. Using a Microsoft Excel worksheet, participants were asked to record their search query, the titles of the first ten results, and the relevance score of each result (see appendix H). Formulas for cumulative gain and discounted cumulative gain were embedded in the worksheet so these values were calculated automatically. After all the values were calculated, one product had once again outperformed the others. In the majority of searches conducted, librarians rated its results as being more relevant than those of its competitors. However, librarians were quick to point out that they were not entirely satisfied with the results from any of the three products. In their observations, they noted many of the same issues that were raised in previous rounds of testing, such as incomplete metadata, duplicate results, and overrepresentation of certain types of content.
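The cumulative gain and discounted cumulative gain calculations described above can also be expressed in a few lines of Python, shown here with a hypothetical set of relevance ratings; this is a sketch of the formula rather than the team's actual Excel worksheet.

```python
# Illustrative CG/DCG calculation for one search; the ratings are hypothetical.
import math

# Relevance ratings of the first ten results, in the exact order retrieved
# (0 = not relevant, 1 = somewhat relevant, 2 = relevant, 3 = very relevant).
ratings = [3, 2, 3, 3, 1, 0, 2, 1, 0, 1]

# Cumulative gain: the simple sum of all relevance scores.
cg = sum(ratings)

# Discounted cumulative gain: the first result keeps its full score; each
# result at rank i > 1 is multiplied by the discount factor 1 / log2(i).
dcg = ratings[0] + sum(score / math.log2(rank)
                       for rank, score in enumerate(ratings[1:], start=2))

print(f"CG = {cg}, DCG = {dcg:.2f}")
# A score of 3 at rank 4 contributes 3 / log2(4) = 1.5, matching the example above.
```

Because the discount grows with position, comparing discounted cumulative gain across products rewards the service that places its most relevant items nearest the top, which is exactly what the first-page-only behavior observed in testing demands.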
At the end of the trial period, the evaluation team once again invited feedback from the library staff. An online library-wide survey was distributed in which staff members were asked to rank each discovery product according to several key requirements drawn from the team's evaluation rubric. Each requirement was accompanied by one or more questions for participants to consider in their evaluation. The final question asked participants to rank the three candidates in order of preference. Links to the trial implementations of all three products were included in the survey. The email announcement also included a link to the team's website, where participants could find more information about web-scale discovery. Because participating in the survey required staff to review and interact with all three products, the team estimated that it would take forty-five minutes to an hour to complete (depending on the staff member's familiarity with the products). Given the amount of time and effort required for participation, relevant committees were also encouraged to review the trials and submit their evaluation as a group. The response rate for the survey was much lower than expected, possibly because of the amount of effort involved or because a large number of staff did not feel qualified to comment on certain aspects of the evaluation. However, among the staff members who did respond, one product was rated more highly than all others. Notably, it was also the same product that had received the highest scores in all three rounds of testing.

Make Final Recommendation

At this stage in the process, your evaluation team should have collected enough data to make an informed selection decision. Your decision should take into consideration all of the information gathered throughout the evaluation process, including user and product research, vendor demonstrations, RFP responses, customer references, staff and user feedback, trials, and product testing. In preparation for the evaluation team's final meeting, each subteam was asked to revisit the evaluation rubric. Using all of the information that had been collected and made available on the team's website, each subteam was asked to score the remaining three candidates based on how well they met the requirements in their assigned category and to submit a report explaining the rationale for their scores. At the final meeting, a representative from each subteam presented their report to the larger group. The entire team reviewed the scores awarded to each product. Once a consensus was reached on the scoring, the final results were tabulated and the product that received the highest total score was selected.

Once the evaluation team has reached a conclusion, its decision needs to be communicated to library stakeholders. The team's findings should be compiled in a final report that includes a brief introduction to the subject of web-scale discovery, the factors motivating the library's decision to acquire a discovery service, an overview of the methods that were used to evaluate these services, and a summary of the team's final recommendation. Of course, considering that few people in your organization may ever actually read the report, the team should seek out additional opportunities to present its findings to the community.
The Rutgers evaluation team presented its recommendation report on three different occasions. The first was a joint meeting of the library's two major governing councils. After securing the support of the councils, the group's recommendation was presented at a meeting of library administrators for final approval. Once approved, a third and final presentation was given at an all-staff meeting and included a demonstration of the selected product. By taking special care to communicate the team's decision openly and to make transparent the process used to reach it, the evaluation team not only demonstrated the depth of its research but also was able to secure organizational buy-in and support for its recommendation.

CONCLUSION

Selecting a web-scale discovery service is a large and important undertaking that involves a significant investment of time, staff, and resources. Finding the right match begins with a thorough and carefully planned evaluation process. The evaluation process outlined here is intended as a blueprint that similar institutions may wish to follow. However, every library has different needs, means, and goals. While this process served Rutgers well, certain elements may not be applicable to your institution. Regardless of what method your library chooses, it should strive to create an evaluation process that is inclusive, goal-oriented, data-driven, user-centered, and transparent.

Inclusive

Web-scale discovery impacts a wide variety of library services and functions. Therefore a complete and informed evaluation requires the participation and expertise of a broad cross section of library units. Furthermore, as with the adoption of any new technology, the implementation of a web-scale discovery service can be potentially disruptive. These products introduce significant and sometimes controversial changes to staff workflows, user behavior, and library usage. Ensuring broad involvement in the evaluation process can help allay potential concerns, reduce tensions, and ensure wider adoption.

Goal-Oriented

It can be easy to be seduced by new technologies simply because they are new. But merely adopting these technologies without taking the time to reflect on and communicate their purpose and goals can be a recipe for disaster. To select the best discovery tool for your library, evaluators must have a clear understanding of the problems it is trying to solve, the audience it seeks to serve, and the role it plays within the library's larger mission. Articulating the library's vision and goals for web-scale discovery is crucial for establishing an evaluation plan, developing a prioritized list of product requirements, understanding what questions to ask vendors, and setting benchmarks by which to evaluate performance.

Data-Driven

To ensure an informed, fair, and impartial evaluation, evaluators should strive to incorporate data-driven practices into all of their decision-making.
 Many  library  stakeholders,  including   members  of  the  evaluation  team,  may  enter  the  evaluation  process  with  preexisting  views  on  web-­‐ scale  discovery,  untested  assumptions  about  user  behavior,  or  strong  opinions  about  specific   products  and  vendors.  To  minimize  the  influence  of  these  potential  biases  on  the  selection  process,   it  is  important  that  the  team  be  able  to  demonstrate  the  rationale  for  its  decisions  through   verifiable  data.  Evaluating  web-­‐scale  discovery  services  requires  extensive  research  and  should   include  data  collected  through  user  research,  staff  surveys,  collections  analysis,  and  product   testing.  All  of  this  data  should  be  carefully  collected,  analyzed,  and  used  to  inform  the  team’s  final   recommendation.     User-­‐Centered   If  the  purpose  of  adopting  a  web-­‐scale  discovery  service  is  to  better  serve  your  users,  then  you   should  try  as  much  as  possible  to  involve  users  in  the  evaluation  and  selection  process.  This   means  including  users  on  the  evaluation  team,  grounding  product  requirements  in  user  research,   and  gathering  user  feedback  through  surveys,  focus  groups,  and  product  testing.  This  last  step  is   especially  important.  No  other  piece  of  information  gathered  throughout  the  evaluation  process   will  be  as  helpful  or  revealing  as  actually  watching  users  use  these  products  to  complete  real-­‐life   research  tasks.  User  testing  is  the  best  and,  frankly,  only  way  to  validate  claims  from  both  vendors   and  librarians  about  what  your  users  want  and  need  from  your  library’s  search  environment.     Transparent   Because  web-­‐scale  discovery  impacts  library  staff  and  users  in  significant  ways,  its  reception   within  academic  libraries  has  been  somewhat  mixed.  As  previously  mentioned,  securing  staff  buy-­‐ in  is  often  one  of  the  most  difficult  obstacles  libraries  face  when  introducing  a  new  web-­‐scale   discovery  service.  While  encouraging  broad  participation  in  the  evaluation  process  helps  facilitate   buy-­‐in,  not  every  library  stakeholder  will  be  able  to  participate.  Therefore  it  is  important  that  the   evaluation  team  make  special  effort  to  communicate  its  work  and  keep  the  library  community   updated  on  its  progress.  This  can  be  done  by  creating  a  staff  website  or  blog  devoted  to  the   evaluation  process,  sending  periodic  updates  via  the  library’s  electronic  discussion  list,  holding   public  forums  and  demonstrations,  regularly  soliciting  staff  feedback  through  surveys  and  polls,   and  widely  distributing  the  team’s  findings  and  final  report.  These  communications  should  help   secure  organizational  support  by  making  clear  that  the  team  recommendations  are  based  on  a   thorough  evaluation  that  is  inclusive,  goal-­‐oriented,  data-­‐driven,  user-­‐centered,  and  transparent.         INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     37   Appendix  A.  Overview  of  Web-­‐Scale  Discovery  Evaluation  Plan                                                     Form an evaluation team Create an evaluation team representing a broad cross section of library units. 
Draft a charge outlining the library's goals for web-scale discovery and the team's responsibilities, timetable, reporting structure, and membership.

2. Educate library stakeholders. Create a staff website or blog to disseminate information about web-scale discovery and the evaluation process. Host workshops and public forums to educate staff, share information, and maximize community participation.

3. Schedule vendor demonstrations. Invite vendors for onsite product demonstrations. Schedule visits in close proximity and provide vendors with an outline or list of questions in advance. Invite all members of the library community to attend and offer feedback.

4. Create an evaluation rubric. Create a comprehensive, prioritized list of product requirements rooted in staff and user needs. Develop a fair and consistent scoring method for determining how each product meets these requirements.

5. Draft the RFP. If required, draft an RFP to solicit bids from vendors. Include information about your library, a summary of your product requirements and evaluation criteria, and any terms or conditions of the bidding process.

6. Interview current customers. Obtain candid assessments of each product by interviewing current customers. Ask customers to share their experiences and offer assessments on factors such as coverage, design, functionality, customizability, and vendor support.

7. Configure and test local trials. After narrowing down the options, select the top candidates for a trial evaluation. Test the products with users and staff to evaluate and compare coverage, functionality, and result quality.

8. Make final recommendation. Make an informed recommendation based on all of the information collected. Compile the results of your research in a final report and communicate the team's findings to the library community.

Appendix B. Product Requirements for a Web-Scale Discovery Service

1. Content

1.1 Scope
Description: Provides access to the broadest possible spectrum of library content, including books, periodicals, audiovisual materials, institutional repository items, digital collections, and open access content.
Questions to consider: With how many publishers and aggregators does the vendor have license agreements? Are there any notable exclusions? How many total unique items are included in the central index? How many open access resources are included? What percentage of content is mutually licensed? What is the approximate disciplinary, format, and date breakdown of the central index? What types of local content can be ingested into the index (ILS records, institutional repository items, digital collections, research guides, webpages, etc.)? Can the library customize what content is exposed to its users?

1.2 Depth
Description: Provides the richest possible metadata for all indexed items, including citations, descriptors, abstracts, and full text.
Questions to consider: What level of indexing is provided? What percentage of items contains only citations? What percentage includes abstracts? What percentage includes full text?
1.3 Currency
Description: Provides regular and timely updates of licensed content as well as on-demand updates of local content.
Questions to consider: How frequently is the central index updated? How frequently are local records ingested? Can the library initiate a manual harvest of local records? Can the library initiate a manual harvest of a specific subset of local records?

1.4 Data quality
Description: Provides clear and consistent indexing of records from a variety of different sources and in a variety of different formats.
Questions to consider: What record formats are supported? What metadata fields are required for indexing? How is metadata from different sources normalized into a universal metadata schema? How are controlled vocabularies created? To what degree can collections from different sources have their own unique field information displayed and/or calculated into the relevancy-ranking algorithm for retrieval purposes?

1.5 Language
Description: Supports indexing and searching of foreign-language materials using non-Roman characters.
Questions to consider: Does the product support indexing and searching of foreign-language materials using non-Roman characters? What languages and character sets are supported?

1.6 Federated searching
Description: Supports incorporation of content not included in the central index via federated searching.
Questions to consider: Does the vendor offer federated searching of sources not included in the central index? How are these sources integrated into search results? Is there an additional cost for adding connectors to these sources?

1.7 Unlicensed content
Description: Includes and makes discoverable additional content not owned or licensed by the library.
Questions to consider: Are local collections from other libraries using the discovery service exposed to all customers? Are users able to search content that is included in the central index but not licensed or owned by the host library?

2. Functionality

2.1 Smart searching
Description: Provides "smart" search features such as autocomplete, autocorrect, autostemming, thesaurus matching, stop-word filtering, keyword highlighting, etc.
Questions to consider: What "smart" features are included in the search engine? Are these features customizable? Can they be enabled or disabled by the library?

2.2 Advanced searching
Description: Provides advanced search options such as field searching, Boolean operators, proximity searching, nesting, wildcard/truncation, etc.
Questions to consider: What types of advanced search options are available? Are these options customizable? Can they be enabled or disabled by the library?

2.3 Search limits
Description: Provides limits for refining search results according to specified criteria such as peer-review status, full-text availability, or location.
Questions to consider: Does the product include appropriate limits for filtering search results?

2.4 Faceted browsing
Description: Allows users to browse the index by facets such as format, author, subject, region, era, etc.
Questions to consider: What types of facets are available for browsing?
Can users select multiple facets in different categories? Are facets easy to add or remove from a search? Are facet categories, labels, and ordering customizable? Can facets be customized by format or material type (e.g., music, film, etc.)?

2.5 Scoped searching
Description: Provides discipline-, format-, or location-specific search options that allow searches to be limited to a set of predefined resources or criteria.
Questions to consider: Can the library construct scoped search portals for specific campus libraries, disciplines, or formats? Can these portals be customized with different search options, facets, relevancy ranking, or record displays?

2.6 Visual searching
Description: Provides visual search and browse options such as tag clouds, cluster maps, virtual shelf browsing, geo-browsing, etc.
Questions to consider: Does the product provide any options for visualizing search results beyond text-based lists? Can data visualization tools be integrated into search result display with additional programming?

2.7 Relevancy ranking
Description: Provides useful results using an effective and locally customizable relevancy-ranking algorithm.
Questions to consider: What criteria are used to determine relevancy (term frequency and placement, format, document length, publication date, user behavior, scholarly value, etc.)? How does it rank items with varying levels of metadata (e.g., citation only vs. citation + full text)? Is relevancy ranking customizable by the library? By the user?

2.8 Deduplication
Description: Has an effective method for identifying and managing duplicate records within results.
Questions to consider: Does the product employ an effective method of deduplication?

2.9 Record grouping
Description: Groups different manifestations of the same work together in a single record or cluster.
Questions to consider: Does the product employ FRBR or some similar method to group multiple manifestations of the same work?

2.10 Result sorting
Description: Provides alternative options for sorting results by criteria such as date, title, author, call number, etc.
Questions to consider: What options does the product offer for sorting results?

2.11 Item holdings
Description: Provides real-time local holdings and availability information within search results.
Questions to consider: How does the product provide local holdings and availability information? Is this information displayed in real time? Is this information displayed on the results screen or only within the item record?

2.12 OpenURL
Description: Supports OpenURL linking to facilitate seamless access from search results to electronic full text and related services.
Questions to consider: How does the product provide access to the library's licensed full-text content? Are OpenURL links displayed on the results screen or only in the item record?

2.13 Native record linking
Description: Provides direct links to original records in their native source.
Questions to consider: Does the product offer direct links to original records allowing users to easily navigate from the discovery service to the record source, whether it is a subscription database, the library catalog, or the institutional repository?
2.14 Output options
Description: Provides useful output options such as print, email, text, cite, export, etc.
Questions to consider: What output options does the product offer? What citation formats are supported? Which citation managers are supported? Are export options customizable?

2.15 Personalization
Description: Provides personalization features that allow users to customize preferences, save results, bookmark items, create lists, etc.
Questions to consider: What personalization features does the product offer? Are these features linked to a personal account or only session-based? Must users create their own accounts or can accounts be automatically linked to their institutional ID?

2.16 Recommendations
Description: Provides recommendations to help users locate similar items or related resources.
Questions to consider: Does the product provide item recommendations to help users locate similar items? Does the product provide database recommendations to help users identify specialized databases related to their topic?

2.17 Account management
Description: Allows users to access their library account for activities such as renewing loans, placing holds and requests, paying fines, viewing borrowing history, etc.
Questions to consider: Can the product be integrated with the library's ILS to provide seamless access to user account management functions? Does the vendor provide any drivers or technical support for this purpose?

2.18 Guest access
Description: Allows users to search and retrieve records without requiring authentication.
Questions to consider: Does the vendor allow for "guest access" to the service? Are users required to authenticate to search or only when requesting access to licensed content?

2.19 Context-sensitive services
Description: Interacts with university identity and course-management systems to deliver customized services on the basis of user status and affiliation.
Questions to consider: Can the product be configured to interact with university identity and course-management systems to deliver customized services on the basis of user status and affiliation? Does the vendor provide any drivers or technical support for this purpose?

2.20 Context-sensitive delivery options
Description: Displays context-sensitive delivery options based on the item's format, status, and availability.
Questions to consider: Can the product be configured to interact with the library's ILL and consortium borrowing services to display context-sensitive delivery options for unavailable local holdings? Does the vendor provide any drivers or technical support for this purpose?

2.21 Location mapping
Description: Supports dynamic library mapping to help users physically locate items on the shelf.
Questions to consider: Can the product be configured to support location mapping by linking the call numbers of physical items to online library maps? What additional programming is required?

2.22 Custom widgets
Description: Supports the integration of custom library widgets such as live chat.
Questions to consider: Can the library's chat service be embedded into the interface to provide live user support?
Where can it be embedded? Search page? Result screen?

2.23 Featured items
Description: Highlights new, featured, or popular items such as recent acquisitions, recreational reading, or heavily borrowed or downloaded items.
Questions to consider: Can the product be configured to dynamically highlight specific items or collections in the library?

2.24 Alerts
Description: Provides customizable alerts or RSS feeds to inform users about new items related to their research or area of study.
Questions to consider: Does the product offer customizable alerts or RSS feeds?

2.25 User-submitted content
Description: Supports user-submitted content such as tags, ratings, comments, and reviews.
Questions to consider: What types of user-submitted content does the product support? Is this content only available to the host library or is it shared among all subscribers of the service? Can these features be optionally enabled or disabled?

2.26 Social media integration
Description: Allows users to seamlessly share items via social media such as Facebook, Twitter, Delicious, etc.
Questions to consider: What types of social media sharing does the product support? Can these features be enabled or disabled?

3. Usability

3.1 Design
Description: Provides a modern, aesthetically appealing design that is locally customizable.
Questions to consider: Does the product have a modern, aesthetically pleasing design? Is it easy to locate all important elements of the interface? Are colors, graphics, and spacing used effectively to organize content? What aspects of the interface are locally customizable (color scheme, branding, navigation menus, result display, item records, etc.)? Can the library apply its own custom stylesheets or is customization limited to a set of predefined options?

3.2 Navigation
Description: Provides an interface that is easy to use and navigate with little or no specialized knowledge.
Questions to consider: Is the interface intuitive and easy to navigate? Does it use familiar navigational elements and intuitive icons and labels? Are links clearly and consistently labeled? Do they allow the user to easily move from page to page (forward and back)? Do they take the user where he or she expects to go?

3.3 Accessibility
Description: Meets ADA and Section 508 accessibility requirements.
Questions to consider: Does the product meet ADA and Section 508 accessibility requirements?

3.4 Internationalization
Description: Provides translations of the user interface in multiple languages.
Questions to consider: Does the vendor offer translations of the interface in multiple languages? Which languages are supported? Does this include translations of customized text?

3.5 Help
Description: Provides user help screens that are thorough, easy to understand, context-sensitive, and customizable.
Questions to consider: Are product help screens thorough, easy to navigate, and easy to understand? Are help screens general or context-sensitive (i.e., relevant to the user's current location within the system)? Are help screens customizable?
3.6 Record display
Description: Provides multiple record displays with varying levels of information (e.g., preview, brief view, full view, staff view, etc.).
Questions to consider: Are record displays well organized and easily scannable? Does the product offer multiple record displays with varying levels of information? What types of record displays are available? Can record displays be customized by item type or search portal?

3.7 Enriched content
Description: Supports integration of enriched content from third-party providers such as cover images, table of contents, author biographies, reviews, excerpts, journal rankings, citation counts, etc.
Questions to consider: What types of enriched content does the vendor provide or support? Is there an additional cost for this content?

3.8 Format icons
Description: Provides intuitive icons to indicate the format of items within search results.
Questions to consider: Does the product provide any icons or visual cues to help users easily recognize the formats of the variety of items displayed in search results? Is this information displayed on the results screen or only within the item record? How does the product define formats? Are these definitions customizable?

3.9 Persistent URLs
Description: Provides short, persistent links to item records, search queries, and browse categories.
Questions to consider: Does the product offer persistent links to item records? What about persistent links to canned searches and browse categories? Are these links sufficiently short and user-friendly?

4. Administration

4.1 Cost
Description: Is offered at a price that is within the library's budget and proportional to the value of the service.
Questions to consider: How is product pricing calculated? What is the total cost of the service including initial upfront costs and ongoing costs for subscription and technical support? What additional costs would be incurred for add-on services (e.g., federated search, recommender services, enriched content, customer support, etc.)?

4.2 Implementation
Description: Is capable of being implemented within the library's designated timeframe.
Questions to consider: What is the estimated timeframe for implementation, including loading of local records and configuration and customization of the platform?

4.3 User community
Description: Is widely used and respected among the library's peer institutions.
Questions to consider: How many subscribers does the product have? What percentage of subscribers are college or university libraries? How do current subscribers view the service?

4.4 Support
Description: Is supported by high-quality customer service, training, and product documentation.
Questions to consider: Does the vendor provide adequate support, training, and help documentation? What forms of customer support are offered? How adequate is the vendor's documentation regarding content agreements, metadata schema, ranking algorithms, APIs, etc.? Does the vendor provide on-site and online training? Is there any additional cost associated with training?
4.5 Administrative tools
Description: Is supported by a robust, easy-to-use administrative interface and customization tools.
Questions to consider: Does the product have an easy-to-use administrative interface? Does it support multiple administrator logins and roles? What tools are provided for product customization and administering access control?

4.6 Statistics reporting
Description: Includes a robust statistical reporting module for monitoring and analyzing product usage.
Questions to consider: Does the vendor offer a means of capturing and reporting system and usage statistics? What kinds of data are included in such reports? In what formats are these reports available? Is the data exportable?

5. Technology

5.1 Development
Description: Is a sufficiently mature product supported by a stable codebase and progressive development cycle.
Questions to consider: Is the product sufficiently mature and supported by a stable codebase? Is development informed by a dedicated users' advisory group? How frequently are improvements and enhancements made to the service? Is there a formal mechanism by which customers can suggest, rank, and monitor the status of enhancement requests? What major enhancements are planned for the next 3–5 years?

5.2 Authentication
Description: Is compatible with the library's authentication protocols.
Questions to consider: Does the product allow for IP authentication for on-site users and proxy access for remote users? What authentication methods are supported (e.g., LDAP, CAS, Shibboleth, etc.)?

5.3 Browser compatibility
Description: Is compatible with all major web browsers.
Questions to consider: What browsers does the vendor currently support?

5.4 Mobile access
Description: Is accessible on mobile devices.
Questions to consider: Is the product accessible on mobile devices via a mobile-optimized web interface or app? Does the mobile version include the same features and functionality as the desktop version?

5.5 Portability
Description: Can be embedded in external platforms such as library research guides, course management systems, or university portals.
Questions to consider: Can custom search boxes be created and embedded in external platforms such as library research guides, course management systems, or university portals?

5.6 Interoperability
Description: Includes a robust API and is interoperable with other major library systems such as the ILS, ILL, proxy server, link resolver, institutional repository, etc.
Questions to consider: Is the product interoperable with other major library systems such as the ILS, ILL, proxy server, link resolver, institutional repository, etc.? Does the vendor offer a robust API that can be used to extract data from the central index or pair it with a different interface? What types of data can be extracted with the API?

5.7 Consortia support
Description: Supports multiple product instances or configurations for a multilibrary environment.
Questions to consider: Can the technology support multiple institutions on the same installation, each with its own unique instance and configuration of the product?
Is there an additional cost for this service?

Appendix C. Sample Web-Scale Discovery Evaluation Rubric

Category: Functionality
Product: Product A

For each requirement, the evaluator records a weight, a score, the resulting points, and notes explaining the rationale for the score.

Requirements: 2.1 Smart searching; 2.2 Advanced searching; 2.3 Search limits; 2.4 Faceted browsing; 2.5 Scoped searching; 2.6 Visual searching; 2.7 Relevancy ranking; 2.8 Deduplication; 2.9 Record grouping; 2.10 Result sorting; 2.11 Item holdings; 2.12 OpenURL; 2.13 Native record linking; 2.14 Output options

Weight scale: 1 = Optional; 2 = Desired; 3 = Mandatory
Scoring scale: 0 = Does not meet; 1 = Barely meets; 2 = Partially meets; 3 = Fully meets
Points = Weight × Score
Notes: Explanation and rationale for score

Appendix D. Web-Scale Discovery Vendor Questionnaire

1. Content

1.1 Scope

With how many content publishers and aggregators have you forged content agreements?

Are there any publishers or aggregators with whom you have exclusive agreements that prohibit or limit them from making their content available to competing discovery vendors? If so, which ones?

Does your central index exclude any of the publishers and aggregators listed in appendix Y [not reproduced here]? If so, which ones?

How many total unique items are included in your central index?

What is the approximate disciplinary breakdown of the central index? What percentage of content pertains to subjects in the humanities? What percentage in the sciences? What percentage in the social sciences?

What is the approximate format breakdown of the central index? What percentage of content derives from scholarly journals? What percentage derives from magazines, newspapers, and trade publications? What percentage derives from conference proceedings? What percentage derives from monographs? What percentage derives from other publications?

What is the publication date range of the central index? What is the bulk publication date range (i.e., the date range in which the majority of content was published)?

Does your index include content from open access repositories such as DOAJ, HathiTrust, and arXiv? If so, which ones?

Does your index include OCLC WorldCat catalog records? If so, do these records include holdings information?

What types of local content can be ingested into the index (e.g., library catalog records, institutional repository items, digital collections, research guides, library web pages, etc.)?

Can your service host or provide access to items within a consortial or shared catalog like the Pennsylvania Academic Library Consortium (PALCI) or Committee on Institutional Cooperation (CIC)?

Are local collections (ILS records, digital collections, institutional repositories, etc.) from libraries that use your discovery service exposed to all customers?
INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     51   Can  the  library  customize  its  holdings  within  the  central  index?  Can  the  library  choose  what   content  to  expose  to  its  users?   1.2  Depth   What  level  of  indexing  do  you  typically  provide  in  your  central  index?    What  percentage  of  items   contains  only  citations?  What  percentage  includes  abstracts?  What  percentage  includes  full  text?     1.3  Currency   How  frequently  is  the  central  index  updated?   How  often  do  you  harvest  and  ingest  metadata  for  the  library’s  local  content?  How  long  does  it   typically  take  for  such  updates  to  appear  in  the  central  index?   Can  the  library  initiate  a  manual  harvest  of  local  records?  Can  the  library  initiate  a  manual  harvest   of  a  specific  subset  of  local  records?   1.4  Data  quality   With  what  metadata  schemas  (MARC,  METS,  MODS,  EAD,  etc.)  does  your  discovery  platform  work?     Do  you  currently  support  RDA  records?  If  not,  do  you  have  any  plans  to  do  so  in  the  near  future?   What  metadata  is  required  for  a  local  resource  to  be  indexed  and  discoverable  within  your   platform?   How  is  metadata  from  different  sources  normalized  into  a  universal  metadata  schema?     To  what  degree  can  collections  from  different  sources  have  their  own  unique  field  information   displayed  and/or  calculated  into  the  relevancy-­‐ranking  algorithm  for  retrieval  purposes?   Do  you  provide  authority  control?  How  are  controlled  vocabularies  for  subjects,  names,  and  titles   established?     1.5  Language   Does  your  product  support  indexing  and  searching  of  foreign  language  materials  using  non-­‐ Roman  characters?  What  languages  and  character  sets  are  supported?   1.6  Federated  searching     How  does  your  product  make  provisions  for  sources  not  included  in  your  central  index?  Is  it   possible  to  incorporate  these  sources  via  federated  search?  How  are  federated  search  results     EVALUATING  WEB-­‐SCALE  DISCOVERY  SERVICES:  A  STEP-­‐BY-­‐STEP  GUIDE  |  DEODATO   doi:  10.6017/ital.v34i2.5745   52   displayed  with  the  results  from  the  central  index?  Is  there  an  additional  cost  for  implementing   federated  search  connectors  to  these  resources?     1.7  Unlicensed  content   Are  end  users  able  to  search  content  that  is  included  in  your  central  index  but  not  licensed  or   owned  by  the  library?  If  so,  does  your  system  provide  a  locally  customizable  message  to  the  user   or  does  the  user  just  receive  the  publisher/aggregator  message  encouraging  them  to  purchase  the   article?  Can  the  library  opt  not  to  expose  content  it  does  not  license  to  its  users?     2.  Functionality     2.1  “Smart”  searching   Does  your  product  include  autocomplete  or  predictive  search  functionality?  How  are   autocomplete  predictions  populated?   Does  your  product  include  autocorrect  or  “did  you  mean  .  .  .  ”  suggestions  to  correct  misspelled   queries?  How  are  autocorrect  suggestions  populated?     Does  your  product  support  search  query  stemming  to  automatically  retrieve  search  terms  with   variant  endings  (e.g.,  car/cars)?   Does  your  product  support  thesaurus  matching  to  retrieve  synonyms  and  related  words  (e.g.,   car/automobile)?         
Does  your  product  support  stop  word  filtering  to  automatically  remove  common  stop  words  (e.g.,   a,  an,  on,  from,  the,  etc.)  from  search  queries?   Does  your  product  support  search  term  highlighting  to  automatically  highlight  search  terms  found   within  results?     How  does  your  product  handle  zero  result  or  “dead  end”  searches?  Please  describe  what  happens   when  a  user  searches  for  an  item  that  is  not  included  in  the  central  index  or  the  library’s  local   holdings  but  may  be  available  through  interlibrary  loan.   Does  your  product  include  any  other  “smart”  search  features  that  you  think  enhance  the  usability   of  your  product?   Are  all  of  the  above  mentioned  search  features  customizable  by  the  library?  Can  they  be  optionally   enabled  or  disabled?     2.2  Advanced  searching     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     53   Does  your  product  support  Boolean  searching  that  allows  users  to  combine  search  terms  using   operators  such  as  AND,  OR,  and  NOT?     Does  your  product  support  fielded  searching  that  allows  users  to  search  for  terms  within  specific   metadata  fields  (e.g.,  title,  author,  subject,  etc.)?   Does  your  product  support  phrase  searching  that  allows  users  to  search  for  exact  phrases?   Does  your  product  support  proximity  searching  that  allows  users  to  search  for  terms  within  a   specified  distance  from  one  another?   Does  your  product  support  nested  searching  to  allow  users  to  specify  relationships  between   search  terms  and  determine  the  order  in  which  they  will  be  searched?   Does  your  product  support  wildcard  and  truncation  searching  that  allow  users  to  retrieve   variations  of  their  search  terms?   Does  your  product  include  any  other  advanced  search  features  that  you  think  enhance  the   usability  of  your  product?   Are  all  of  the  above  mentioned  search  features  customizable  by  the  library?  Can  they  be  optionally   enabled  or  disabled?   2.3  Search  limits   Does  your  product  offer  search  limits  for  limiting  results  according  to  predetermined  criteria  such   as  peer-­‐review  status  or  full  text  availability?   2.4  Faceted  browsing   Does  your  product  support  faceted  browsing  of  results  by  attributes  such  as  format,  author,   subject,  region,  era,  etc.?  If  so,  what  types  of  facets  are  available  for  browsing?     Is  faceted  browsing  possible  before  as  well  after  the  execution  of  a  search?     Can  users  select  multiple  facets  in  different  categories?     Are  facet  categories,  labels,  and  ordering  customizable  by  the  library?     Can  specialized  materials  be  assigned  different  facets  in  accordance  with  their  unique  attributes   (e.g.,  allowing  users  to  browse  music  materials  by  unique  attributes  such  as  medium  of   performance,  musical  key/range,  recording  format,  etc.)?     2.5  Scoped  searching     EVALUATING  WEB-­‐SCALE  DISCOVERY  SERVICES:  A  STEP-­‐BY-­‐STEP  GUIDE  |  DEODATO   doi:  10.6017/ital.v34i2.5745   54   Does  your  product  support  the  construction  of  multiple  scoped  search  portals  for  specific  campus   libraries,  disciplines  (medicine),  or  formats  (music/video)?     
If  so,  what  aspects  of  these  search  portals  are  customizable  (branding,  search  options,  facets,   relevancy  ranking,  record  displays,  etc.)?   2.6  Visual  searching   Does  your  product  provide  any  options  for  visualizing  search  results  beyond  text-­‐based  lists,  such   as  cluster  maps,  tag  clouds,  image  carousels,  etc.?     2.7  Relevancy  ranking   Please  describe  your  relevancy  ranking  algorithm.  In  particular,  please  describe  what  criteria  are   used  to  determine  relevancy  (term  frequency/placement,  item  format/length,  publication  date,   user  behavior,  scholarly  value,  etc.)  and  how  is  each  weighted?   How  does  your  product  rank  items  with  varying  levels  of  metadata  (e.g.,  citation  only  vs.  citation,   abstract,  and  full  text)?     Is  relevancy  ranking  customizable  by  the  library?     Can  relevancy  ranking  be  customized  by  end  users?   2.8  Deduplication   How  does  your  product  identify  and  manage  duplicate  records?   2.9  Record  grouping   Does  your  product  employ  a  FRBR-­‐ized  method  to  group  different  manifestations  of  the  same   work?   2.10  Result  sorting   What  options  does  your  product  offer  for  sorting  results?   2.11  Item  holdings   How  does  your  product  retrieve  and  display  availability  data  for  local  physical  holdings?  Is  there  a   delay  in  harvesting  this  data  or  is  it  presented  in  real  time?  Is  item  location  and  availability   displayed  in  the  results  list  or  only  in  the  item  record?       2.12  OpenURL     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     55   How  does  your  product  provide  access  to  the  library’s  licensed  full  text  content?   Are  openURL  links  displayed  on  the  results  screen  or  only  in  the  item  record?   2.13  Native  record  linking   Does  your  product  offer  direct  links  to  original  records  in  their  native  source  (e.g.,  library  catalog,   institutional  repository,  third-­‐party  databases,  etc.)?   2.14  Output  options   What  output  options  does  your  product  offer  (e.g.,  print,  save,  email,  SMS,  cite,  export)?     If  you  offer  a  citation  function,  what  citation  formats  does  your  product  support  (MLA,  APA,   Chicago,  etc.)?   If  you  offer  an  export  function,  which  citation  managers  does  your  product  support  (e.g.,  RefWorks,   EndNote,  Zotero,  Mendeley,  EasyBib,  etc.)?     Are  citation  and  export  options  locally  customizable?  Can  they  be  customized  by  search  portal?   2.15  Personalization   Does  your  product  offer  any  personalization  features  that  allow  users  to  customize  preferences,   save  results,  create  lists,  bookmark  items,  etc.?  Are  these  features  linked  to  a  personal  account  or   are  they  session-­‐based?   If  personal  accounts  are  supported,  must  users  create  their  own  accounts  or  can  account  creation   be  based  on  the  university’s  CAS/LDAP  identity  management  system?   2.16  Recommendations   Does  your  product  provide  item  recommendations  to  help  users  locate  similar  items?  On  what   criteria  are  these  recommendations  based?   Is  your  product  capable  of  referring  users  to  specialized  databases  based  on  their  search  query?   (For  example,  can  a  search  for  “autism”  trigger  database  recommendations  suggesting  that  the   user  try  their  search  in  PsycINFO  or  PubMed?)  
If  so,  does  your  product  just  provide  links  to  these   resources  or  does  it  allow  the  user  to  launch  a  new  search  by  passing  their  query  to  the   recommended  database?     2.17  Account  management   Can  your  product  be  integrated  with  the  library’s  ILS  (SirsiDynix  Symphony)  to  provide  users   access  to  its  account  management  functions  (e.g.,  renewing  loans,  placing  holds/requests,  viewing   borrowing  history,  etc.)?  If  so,  do  you  provide  any  drivers  or  technical  support  for  this  purpose?     EVALUATING  WEB-­‐SCALE  DISCOVERY  SERVICES:  A  STEP-­‐BY-­‐STEP  GUIDE  |  DEODATO   doi:  10.6017/ital.v34i2.5745   56   2.18  Guest  access   Are  users  permitted  “guest  access”  to  the  service?  Are  users  required  to  authenticate  in  order  to   search  or  only  when  requesting  access  to  licensed  content?   2.19  Context-­‐sensitive  services   Could  your  product  be  configured  to  interact  with  our  university  course  management  systems   (Sakai,  Blackboard,  and  eCollege)  to  deliver  customized  services  based  on  user  status  and   affiliation?  If  so,  do  you  provide  any  drivers  or  technical  support  for  this  purpose?   2.20  Context-­‐sensitive  delivery  options   Could  your  product  be  configured  to  interact  with  the  library’s  interlibrary  loan  (ILLiad)  and   consortium  borrowing  services  (EZBorrow  and  UBorrow)  to  display  context-­‐sensitive  delivery   options  for  unavailable  local  holdings?  If  so,  do  you  provide  any  drivers  or  technical  support  for   this  purpose?   2.21  Location  mapping   Could  your  product  be  configured  to  support  location  mapping  by  linking  the  call  numbers  of   physical  items  to  library  maps?   2.22  Custom  widgets   Does  your  product  support  the  integration  of  custom  library  widgets  such  as  live  chat?  Where  can   these  widgets  be  embedded?   2.23  Featured  items   Could  your  product  be  configured  to  highlight  specific  library  items  such  as  recent  acquisitions,   popular  items,  or  featured  collections?     2.24  Alerts   Does  your  product  offer  customizable  alerts  or  RSS  feeds  to  inform  users  about  new  items  related   to  their  research  or  area  of  study?   2.25  User-­‐submitted  content   Does  your  product  support  user-­‐generated  content  such  as  tags,  ratings,  comments,  and  reviews?     Is  user-­‐generated  content  only  available  to  the  host  library  or  is  it  shared  among  all  subscribers  of   your  service?   Can  these  features  be  optionally  enabled  or  disabled?       INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     57   2.26  Social  media  integration   Does  your  product  allow  users  to  seamlessly  share  items  via  social  media  such  as  Facebook,   Google+,  and  Twitter?     Can  these  features  be  optionally  enabled  or  disabled?   3.  Usability       3.1  Design   Describe  how  your  product  incorporates  established  best  practices  in  usability.  What  usability   testing  have  you  performed  and/or  do  you  conduct  on  an  ongoing  basis?   What  aspects  of  the  interface’s  design  are  locally  customizable  (e.g.,  color  scheme,  branding,   display,  etc.)?     Can  the  library  apply  its  own  custom  stylesheets  or  is  customization  limited  to  a  set  or  predefined   options?   
3.2  Navigation   What  aspects  of  the  interface’s  navigation  are  locally  customizable  (e.g.,  menus,  pagination,  facets,   etc.)?     3.3  Accessibility   Does  your  product  meet  ADA  and  Section  508  accessibility  requirements?  What  steps  have  you   taken  beyond  Section  508  requirements  to  make  your  product  more  accessible  to  people  with   disabilities?     3.4  Internationalization   Do  you  offer  translations  of  the  interface  in  multiple  languages?  Which  languages  are  supported?   Does  this  include  translation  of  any  locally  customized  text?   3.5  Help   Does  your  product  include  help  screens  to  assist  users  in  using  and  navigating  the  system?     Are  help  screens  general  or  context-­‐sensitive  (i.e.,  relevant  to  the  user’s  current  location  within   the  system)?     Are  help  screens  locally  customizable?   3.6  Record  display     EVALUATING  WEB-­‐SCALE  DISCOVERY  SERVICES:  A  STEP-­‐BY-­‐STEP  GUIDE  |  DEODATO   doi:  10.6017/ital.v34i2.5745   58   Does  your  product  offer  multiple  record  displays  with  varying  levels  of  information?  What  types   of  record  displays  are  available  (e.g.,  preview,  brief  view,  full  view,  staff  view,  etc.)?   Can  record  displays  be  customizable  by  item  type  or  metadata  (e.g.,  MARC-­‐based  book  record  vs.   MODS-­‐based  repository  record)?   Can  record  displays  be  customizable  by  search  portal  (e.g.,  a  biosciences  search  portal  that   displays  medical  rather  than  LC  subject  headings  and  call  numbers)?   3.7  Enriched  content   Does  your  product  provide  or  support  the  integration  of  enriched  content  such  as  cover  images,   tables  of  contents,  author  biographies,  reviews,  excerpts,  journal  rankings,  citation  counts,  etc.?  If   so,  what  types  of  content  does  this  include?  Is  there  an  additional  cost  for  this  content?   3.8  Format  icons   Does  your  product  provide  any  icons  or  visual  cues  to  help  users  easily  recognize  the  formats  of   the  variety  of  items  displayed  in  search  results?   How  does  your  product  define  formats?  Are  these  definitions  readily  available  to  end  users?  Are   these  definitions  customizable?   3.10  Persistent  URLs   Does  your  product  offer  persistent  links  to  item  records?   Does  your  product  offer  persistent  links  to  search  queries  and  browse  categories?   4.  Administration     4.1  Cost   Briefly  describe  your  product  pricing  model  for  academic  library  customers.   4.2  Implementation   Can  you  meet  the  timetable  defined  in  appendix  Z  [not  reproduced  here]?    If  not,  which  milestones   cannot  be  met  or  which  conditions  must  the  Libraries  address  in  order  to  meet  the  milestones?   Are  you  currently  working  on  web-­‐scale  discovery  implementations  at  any  other  large  institutions?     4.3  User  community   How  many  live,  active  installations  (i.e.,  where  the  product  is  currently  available  to  end-­‐users)  do   you  currently  have?     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     59   How  many  additional  customers  have  committed  to  the  product?   How  many  of  your  total  customers  are  college  or  university  libraries?   4.4  Support   What  customer  support  services  and  hours  of  availability  do  you  provide  for  reporting  and/or   troubleshooting  technical  problems?   
Do  you  have  a  help  ticket  tracking  system  for  monitoring  and  notifying  clients  of  the  status  of   outstanding  support  issues?     Do  you  offer  a  support  website  with  up-­‐to-­‐date  product  documentation,  manuals,  tutorials,  and   FAQs?     Do  you  provide  on-­‐site  and  online  training  for  library  staff?   Do  you  provide  on-­‐site  and  online  training  for  end  users?   Briefly  describe  any  consulting  services  you  may  provide  above  and  beyond  support  services   included  with  subscription  (e.g.,  consulting  services  related  to  harvesting  of  a  unique  library   resource  for  which  an  ingest/transform/normalize  routine  does  not  already  exist).     Do  you  have  regular  public  meetings  for  users  to  share  experiences  and  provide  feedback  on  the   product?    If  so,  where  and  how  often  are  these  meetings  held?   What  other  communication  avenues  do  you  provide  for  users  to  communicate  with  your  company   and  also  with  each  other  (e.g.,  listserv,  blog,  social  media)?   4.5  Administration     What  kinds  of  tools  are  provided  for  local  administration  and  customization  of  the  product?   Does  your  product  support  multiple  administrator  logins  and  roles?       4.6  Statistics  reporting   What  statistics  reporting  capabilities  are  included  with  your  product?  What  kinds  of  data  are   available  to  track  and  assess  collection  management  and  product  usage?  In  what  formats  are  these   reports  available?  Is  the  data  exportable?   Is  it  possible  to  integrate  third-­‐party  analytic  tools  such  as  Google  Analytics  in  order  to  collect   usage  data?           EVALUATING  WEB-­‐SCALE  DISCOVERY  SERVICES:  A  STEP-­‐BY-­‐STEP  GUIDE  |  DEODATO   doi:  10.6017/ital.v34i2.5745   60   5.  Technology     5.1  Development   In  what  month  and  year  did  product  development  begin?   What  key  features  differentiate  your  product  from  those  of  your  competitors?   How  frequently  are  enhancements  and  upgrades  made  to  the  service?   Please  describe  the  major  enhancements  you  expect  to  implement  in  the  next  year.   Please  describe  the  future  direction  or  major  enhancements  you  envision  for  the  product  in  the   next  3–5  years.   Is  there  a  formal  mechanism  by  which  customers  may  make,  rank,  and  monitor  the  status  of   enhancement  requests?   Do  you  have  a  dedicated  user’s  advisory  group  to  test  and  provide  feedback  on  product   development?   5.2  Authentication     What  authentication  methods  does  your  product  support  (e.g.,  LDAP,  CAS,  Shibboleth,  etc.)?   5.3  Browser  compatibility   Please  provide  a  list  of  currently  supported  web  browsers.   5.4  Mobile  access   Is  the  product  accessible  on  mobile  devices  via  a  mobile  optimized  web  interface  or  app?     Does  the  mobile  version  include  the  same  features  and  functionality  of  the  desktop  version?   5.5  Portability   Can  custom  search  boxes  be  created  and  embedded  in  external  platforms  such  as  the  library’s   research  guides,  course  management  systems,  or  university  portals?   5.6  Interoperability   Does  your  product  include  an  API  that  can  be  used  extract  data  from  the  central  index  or  pair  it   with  a  different  interface?  What  types  of  data  can  be  extracted  with  the  API?  
Do you provide documentation and instruction on the functionality and use of your API?

Are there any known compatibility issues with your product and any of the following systems or platforms?
• Drupal
• VuFind
• SirsiDynix Symphony
• Fedora Commons
• EZProxy
• ILLiad

5.7 Consortia support

Can your product support multiple institutions on the same installation, each with its own unique instance and configuration of the product? Is there any additional cost for this service?

Appendix E. Web-Scale Discovery Customer Questionnaire

Institutional Background

Please tell us a little bit about your library.

What is the name of your college or university?

Which web-scale discovery service is currently in use at your library?
[ ] EBSCO Discovery Service (EDS)
[ ] Primo Central (Ex Libris)
[ ] Summon (ProQuest)
[ ] WorldCat Local (OCLC)
[ ] Other ________________

When was your current web-scale discovery service selected (month, year)?

How long did it take to implement (even in beta form) your current web-scale discovery service?

Which of the following types of content are included in your web-scale discovery service? (Check all that apply)
[ ] Library catalog records
[ ] Periodical indexes and databases
[ ] Open access content
[ ] Institutional repository records
[ ] Local digital collections (other than your institutional repository)
[ ] Library research guides
[ ] Library web pages
[ ] Other ________________

Rate Your Satisfaction

On a scale of 1 (low) to 5 (high), please rate your satisfaction with the following aspects of your web-scale discovery service.

Content
How satisfied are you with the scope, depth, and currency of coverage provided by your web-scale discovery service?
◌ 1   ◌ 2   ◌ 3   ◌ 4   ◌ 5

Functionality
How satisfied are you with the search functionality, performance, and result quality of your web-scale discovery service?
◌ 1   ◌ 2   ◌ 3   ◌ 4   ◌ 5

Usability
How satisfied are you with the design, layout, navigability, and overall ease of use of your web-scale discovery interface?
◌ 1   ◌ 2   ◌ 3   ◌ 4   ◌ 5

Administration
How satisfied are you with the administrative, customization, and reporting tools offered by your web-scale discovery service?
◌ 1   ◌ 2   ◌ 3   ◌ 4   ◌ 5

Technology
How satisfied are you with the level of interoperability between your web-scale discovery service and other library systems such as your ILS, knowledge base, link resolver, and institutional repository?
◌ 1   ◌ 2   ◌ 3   ◌ 4   ◌ 5

Overall
Overall, how satisfied are you with your institution's web-scale discovery service?
◌ 1   ◌ 2   ◌ 3   ◌ 4   ◌ 5

Questions

Please share your experiences with your web-scale discovery service by responding to the following questions.

Briefly describe your reasons for implementing a web-scale discovery service. What role does this service play at your library? How is it intended to benefit your users? What types of users is it intended to serve?

Does your web-scale discovery service have any notable gaps in coverage? If so, how do you compensate for those gaps or make users aware of resources that are not included in the service?

Are you satisfied with the relevance of the results returned by your web-scale discovery service? Have you noticed any particular anomalies within search results?

Does your web-scale discovery service lack any specific features or functions that you wish were available?

Are there any particular aspects of your web-scale discovery service that you wish were customizable but are not?

Did you face any particular challenges integrating your web-scale discovery service with other library systems such as your ILS, knowledge base, and link resolver?

How responsive has the vendor been in providing technical support, resolving problems, and responding to enhancement requests? Have they provided adequate training and documentation to support your implementation?

In general, how have users responded to the introduction of this service? Has their response been positive, negative, or mixed?

In general, how have librarians responded to the introduction of this service? Has their response been positive, negative, or mixed?

What has been the impact of implementing a web-scale discovery service on the overall usage of your collection? Have you noticed any fluctuations in circulation, full-text downloads, or usage of subject-specific databases?

Has your institution conducted any assessment or usability studies of your web-scale discovery service? If so, please briefly describe the key findings of these studies.

Please share any additional thoughts or advice that you think might be helpful to other libraries currently exploring web-scale discovery services.

Appendix F. Sample Worksheet for Web-Scale Discovery Coverage Test

Instructions

Construct 3 search queries representing commonly researched topics in your discipline. Test your queries in each discovery product and compare the results. For each product, record the number of results retrieved and rate the quality of coverage and indexing. Use the space below your ratings to explain your rationale and record any notes or observations.

Rate coverage and indexing on a scale of 1 to 3 (1 = POOR, 2 = AVERAGE, 3 = GOOD).
In your evaluation, please consider the following:

Coverage
• Do the search results demonstrate broad coverage of the variety of subjects, formats, and content types represented in the library's collection? (Hint: Use facets to examine the breakdown of results by source type or collection).
• Do any particular types of content seem to dominate the results (books, journal articles, newspapers, book reviews, reference materials, etc.)?
• Are the library's local collections adequately represented in the results?
• Do any relevant resources appear to be missing from the search results (e.g., results from an especially relevant database or journal)?

Indexing
• Do item records contain complete and accurate source information?
• Do item records contain sufficient metadata (citation, subject headings, abstracts, etc.) to help users identify and evaluate results?

Example

Product: Product B
Reviewer: Reviewer #2
Discipline: History
Query: KW: slavery AND "united states"
Results: 181,457
Coverage: 1 (POOR). The majority of results appear to be from newspapers and periodicals. Some items designated as "journals" are actually magazines. There are a large number of duplicate records. Some major works on this subject are not represented in the results.
Indexing: 3 (GOOD). Depth of indexing varies by publication but most include abstracts and subject headings. Some records only include citations, but citations appear to be complete and accurate.

Appendix G. Sample Worksheet for Web-Scale Discovery Usability Test

Pre-Test Questionnaire

Before beginning the test, ask the user for the following information.

Status
[ ] Undergraduate   [ ] Graduate   [ ] Faculty   [ ] Staff   [ ] Other

Major/Department ___________________________

What resource do you use most often for scholarly research? ___________________________

On a scale of 1 to 5, how would you rate your ability to find information using library resources?
Low   [ ] 1   [ ] 2   [ ] 3   [ ] 4   [ ] 5   High

On a scale of 1 to 5, how would you rate your ability to find information using Google or other search engines?
Low   [ ] 1   [ ] 2   [ ] 3   [ ] 4   [ ] 5   High

Scenarios

Ask the user to complete the following tasks using each product while sharing their thoughts aloud.

1. You are writing a research paper for your communications course. You've recently been discussing how social media sites like Facebook collect and store large amounts of personal data. You decide to write a paper that answers the question: "Are social networking sites a threat to privacy?" Use the search tool to find sources that will help you support your argument.

2. From the first 10 results, select those that you would use to learn more about this topic and email them to yourself. If none of the results seem useful, do not select any.
3. If you were writing a paper on this topic, how satisfied would you be with these results?
☐ Very dissatisfied  ☐ Dissatisfied  ☐ No opinion  ☐ Satisfied  ☐ Very satisfied

4. From the first 10 results, attempt to access an item for which full text is available online.

5. Now that you've seen the first 10 results, what would you do next?
☐ Decide you have enough information and stop
☐ Continue and review the next set of results
☐ Revise your search and try again
☐ Exit and try your search in another library database (which one?)
☐ Exit and try your search in Google or another search engine
☐ Other (please explain)

Post-Test Questionnaire

After the user has used all three products, ask them about their experiences.

Based on your experience, please rank the three search tools you've seen in order of preference.

How would you compare these search tools with the search options currently offered by the library?

Appendix H. Sample Worksheet for Web-Scale Discovery Relevance Test

Instructions

Conduct the same search query in each discovery product and rate the relevance of the first 10 results using the scale provided. For each query, record your search condition, terms, and limiters. For each product, record the first 10 results in the exact order they appear, rate the relevance of each result using the relevance scale, and explain the rationale for your score. All calculations will be tabulated automatically.

Relevance Scale

0 = Not relevant. Not at all relevant to the topic, an exact duplicate of a previous result, or not enough information in the record or full text to determine relevance.

1 = Somewhat relevant. Somewhat relevant but does not address all of the concepts or criteria specified in the search query, e.g., addresses only part of the topic, is too broad or narrow in scope, is not in the specified format, etc.

2 = Relevant. Relevant to the topic, but the topic may not be the primary or central subject of the work, or the work is too brief or dated to be useful; a resource that the user might select.

3 = Very relevant. Completely relevant; exactly on topic; addresses all concepts and criteria included in the search query; a resource that the user would likely select.

Calculations

Cumulative Gain. Measure of overall relevance based on the sum of all relevance scores.

Discount Factor (1/log₂ i). Penalization of relevance based on ranking. Assuming that relevance decreases with rank, each result after the first is associated with a discount factor based on a base-2 logarithm. The discount factor is calculated as 1/log₂ i, where i = rank. The discount factor of result #6, for example, is 1 divided by the logarithm of 6 with base 2, or 1/log₂(6) ≈ 0.39.

Discounted Gain. Discounted relevance score based on ranking. Discounted gain is calculated by multiplying a result's relevance score by its discount factor. The discounted gain of a result with a relevance score of 3 and a discount factor of 0.39 is 3 × 0.39, or 1.17.

Discounted Cumulative Gain. Measure of overall discounted gain based on the sum of all discounted gain scores.
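The worksheet states that these calculations are tabulated automatically, presumably within the spreadsheet itself. For readers who would rather script the arithmetic, the short Python sketch below reproduces it; the function names are illustrative only, and the list of scores is taken from the Product C example that follows.

```python
from math import log2

def discount_factor(rank):
    """Discount factor as defined above: 1.00 for the first result,
    1/log2(rank) for every result after the first."""
    return 1.0 if rank == 1 else 1 / log2(rank)

def tabulate(relevance_scores):
    """Return cumulative gain and discounted cumulative gain
    for an ordered list of 0-3 relevance scores."""
    cumulative_gain = sum(relevance_scores)
    discounted_cumulative_gain = sum(
        score * discount_factor(rank)
        for rank, score in enumerate(relevance_scores, start=1)
    )
    return cumulative_gain, discounted_cumulative_gain

# Relevance scores from the Product C example below (ranks 1-10).
scores = [0, 3, 2, 3, 1, 3, 2, 2, 1, 2]
cg, dcg = tabulate(scores)
print(cg, round(dcg, 2))   # 19 and 9.65, matching the totals in the example
```

Because log₂(1) is zero, the first-ranked result keeps its full relevance score rather than being divided by zero.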
Example

Product: Product C
Reviewer: Reviewer #3
Search Condition: Seeking peer reviewed articles about the impact of media violence on children
Search Terms: "mass media" AND violence AND children
Limits: Peer reviewed

For each result, the worksheet records the relevance score, the reviewer's notes, the rank, the discount factor (1/log₂ i), and the discounted gain; the cumulative gain and discounted cumulative gain totals for the full set of ten results appear after the last entry.

Rank 1. "Effects of Media Ratings on Children and Adolescents: A Litmus Test of the Forbidden Fruit Effect." Relevance: 0; discount factor: 1.00; discounted gain: 0. Notes: Research article suggesting that ratings do not influence children's perceptions of films or video games. Not relevant; does not discuss impact of media violence on children.

Rank 2. "Media Violence Associations with the Form and Function of Aggression among Elementary School Children." Relevance: 3; discount factor: 1.00; discounted gain: 3. Notes: Research article demonstrating a positive association between media violence exposure and levels of physical and relational aggression in grade school students. Very relevant.

Rank 3. "Harmful Effects of Media on Children and Adolescents." Relevance: 2; discount factor: 0.63; discounted gain: 1.26. Notes: Review article discussing the influence of media on negative child behaviors such as violence, substance abuse, and sexual promiscuity. Relevant but does not focus exclusively on media violence.

Rank 4. "The Influence of Media Violence on Children." Relevance: 3; discount factor: 0.50; discounted gain: 1.50. Notes: Review article examining opposing views on media violence and its impact on children. Very relevant.

Rank 5. "Remote Control Childhood: Combating the Hazards of Media Culture in Schools." Relevance: 1; discount factor: 0.43; discounted gain: 0.43. Notes: Review article discussing the harmful effects of mass media on child behavior and learning as well as strategies educators can use to counteract them. Somewhat relevant but does not focus exclusively on media violence and discussion is limited to the educational context.

Rank 6. "Media Violence, Physical Aggression, and Relational Aggression in School Age Children." Relevance: 3; discount factor: 0.39; discounted gain: 1.17. Notes: Research article on the impact of media violence on childhood aggression in relation to different types of aggression, media, and time periods. Very relevant.

Rank 7. "Do You See What I See? Parent and Child Reports of Parental Monitoring of Media." Relevance: 2; discount factor: 0.36; discounted gain: 0.72. Notes: Research article examining the effectiveness of parental monitoring of children's violent media consumption. Relevant but focused less on the effects of media violence than strategies for mitigating them.
Rank 8. "Exposure to Media Violence and Young Children with and Without Disabilities: Powerful Opportunities for Family-Professional Partnerships." Relevance: 2; discount factor: 0.33; discounted gain: 0.66. Notes: Review article discussing the impact of media violence on children with and without disabilities and recommendations for addressing this through family-professional partnerships. Relevant but slightly more specific than required.

Rank 9. "KITLE ILETISIM ARAÇLARINDAN TELEVIZYONUN 3-6 YAS GRUBUNDAKI ÇOCUKLARIN DAVRANISLARI ÜZERINE ETKISI" [the effect of television, as a mass communication medium, on the behavior of children in the 3-6 age group]. Relevance: 1; discount factor: 0.32; discounted gain: 0.32. Notes: Research article demonstrating a positive correlation between media violence exposure and aggressive behavior in grade school students. Seems very relevant, but the article is in Turkish.

Rank 10. "Sex and Violence: Is Exposure to Media Content Harmful to Children?" Relevance: 2; discount factor: 0.30; discounted gain: 0.60. Notes: Review article discussing how exposure to violent or sexually explicit media influences child behavior and what librarians can do about it. Relevant but less than two pages long.

Cumulative Gain (sum of relevance scores): 19
Discounted Cumulative Gain (sum of discounted gains): 9.65

REFERENCES

1. Judy Luther and Maureen C. Kelly, "The Next Generation of Discovery," Library Journal 136, no. 5 (2011): 66.

2. Athena Hoeppner, "The Ins and Outs of Evaluating Web-Scale Discovery Services," Computers in Libraries 32, no. 3 (2012): 8.

3. Kate B. Moore and Courtney Greene, "Choosing Discovery: A Literature Review on the Selection and Evaluation of Discovery Layers," Journal of Web Librarianship 6, no. 3 (2012): 145–63, http://dx.doi.org/10.1080/19322909.2012.689602.

4. Ronda Rowe, "Web-Scale Discovery: A Review of Summon, EBSCO Discovery Service, and WorldCat Local," Charleston Advisor 12, no. 1 (2010): 5–10, http://dx.doi.org/10.5260/chara.12.1.5; Ronda Rowe, "Encore Synergy, Primo Central," Charleston Advisor 12, no. 4 (2011): 11–15, http://dx.doi.org/10.5260/chara.12.4.11.

5. Sharon Q. Yang and Kurt Wagner, "Evaluating and Comparing Discovery Tools: How Close Are We towards the Next Generation Catalog?" Library Hi Tech 28, no. 4 (2010): 690–709, http://dx.doi.org/10.1108/07378831011096312.

6. Jason Vaughan, "Web Scale Discovery Services," Library Technology Reports 47, no. 1 (2011): 5–61, http://dx.doi.org/10.5860/ltr.47n1.

7. Hoeppner, "The Ins and Outs of Evaluating Web-Scale Discovery Services."

8. Luther and Kelly, "The Next Generation of Discovery"; Amy Hoseth, "Criteria To Consider When Evaluating Web-Based Discovery Tools," in Planning and Implementing Resource Discovery Tools in Academic Libraries, ed. Mary P. Popp and Diane Dallis (Hershey, PA: Information Science Reference, 2012), 90–103, http://dx.doi.org/10.4018/978-1-4666-1821-3.ch006.

9. F. William Chickering and Sharon Q. Yang, "Evaluation and Comparison of Discovery Tools: An Update," Information Technology & Libraries 33, no. 2 (2014): 5–30, http://dx.doi.org/10.6017/ital.v33i2.3471.

10. Noah Brubaker, Susan Leach-Murray, and Sherri Parker, "Shapes in the Cloud: Finding the Right Discovery Layer," Online 35, no. 2 (2011): 20–26.

11.
Jason  Vaughan,  “Investigations  into  Library  Web-­‐Scale  Discovery  Services,”  Information   Technology  &  Libraries  31,  no.  1  (2012):  32–82,  http://dx.doi.org/10.6017/ital.v31i1.1916.   12.    Mary  P.  Popp  and  Diane  Dallis,  eds.,  Planning  and  Implementing  Resource  Discovery  Tools  in   Academic  Libraries  (Hershey,  PA:  Information  Science  Reference,  2012),   http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.   13.    Jason  Vaughan,  “Evaluating  and  Selecting  a  Library  Web-­‐Scale  Discovery  Service,”  in  Planning   and  Implementing  Resource  Discovery  Tools  in  Academic  Libraries,  ed.  Mary  P.  Popp  and  Diane     EVALUATING  WEB-­‐SCALE  DISCOVERY  SERVICES:  A  STEP-­‐BY-­‐STEP  GUIDE  |  DEODATO   doi:  10.6017/ital.v34i2.5745   74     Dallis  (Hershey,  PA:  Information  Science  Reference,  2012),  59–76,   http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch004.   14.    Monica  Metz-­‐Wiseman  et  al.,  “Best  Practices  for  Selecting  the  Best  Fit,”  in  Planning  and   Implementing  Resource  Discovery  Tools  in  Academic  Libraries,  ed.  Mary  P.  Popp  and  Diane   Dallis  (Hershey,  PA:  Information  Science  Reference,  2012),  77–89,   http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch005.   15.    David  Freivalds  and  Binky  Lush,  “Thinking  Inside  the  Grid:  Selecting  a  Discovery  System   through  the  RFP  Process,”  in  Planning  and  Implementing  Resource  Discovery  Tools  in  Academic   Libraries,  ed.  Mary  P.  Popp  and  Diane  Dallis  (Hershey,  PA:  Information  Science  Reference,   2012),  104–21,  http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch007.     16.    David  Bietila  and  Tod  Olson,  “Designing  an  Evaluation  Process  for  Resource  Discovery  Tools,”   in  Planning  and  Implementing  Resource  Discovery  Tools  in  Academic  Libraries,  ed.  Mary  P.  Popp   and  Diane  Dallis  (Hershey,  PA:  Information  Science  Reference,  2012),  122–36,   http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch008.   17.    Suzanne  Chapman  et  al.,  “Developing  a  User-­‐Centered  Article  Discovery  Environment,”  in   Planning  and  Implementing  Resource  Discovery  Tools  in  Academic  Libraries,  ed.  Mary  P.  Popp   and  Diane  Dallis  (Hershey,  PA:  Information  Science  Reference,  2012),  194–224,   http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch012.   18.    Lynn  D.  Lampert  and  Katherine  S.  Dabbour,  “Librarian  Perspectives  on  Teaching  Metasearch   and  Federated  Search  Technologies,”  Internet  Reference  Services  Quarterly  12,  no.3/4  (2007):   253–78,  http://dx.doi.org/10.1300/J136v12n03_02;  William  Breitbach,  “Web-­‐Scale   Discovery:  A  Library  of  Babel?”  in  Planning  and  Implementing  Resource  Discovery  Tools  in   Academic  Libraries,  ed.  Mary  P.  Popp  and  Diane  Dallis  (Hershey,  PA:  Information  Science   Reference,  2012),  637–45,  http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch038.   19.    Metz-­‐Wiseman  et  al.,  “Best  Practices  for  Selecting  the  Best  Fit,”  81.   20.    Meris  A.  Mandernach  and  Jody  Condit  Fagan,  “Creating  Organizational  Buy-­‐In:  Overcoming   Challenges  to  a  Library-­‐Wide  Discovery  Tool  Implementation,”  in  Planning  and  Implementing   Resource  Discovery  Tools  in  Academic  Libraries,  ed.  Mary  P.  Popp  and  Diane  Dallis  (Hershey,   PA:  Information  Science  Reference,  2012),  422,  http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐ 3.ch024.   21.    
David  P.  Brennan,  “Details,  Details,  Details:  Issues  in  Planning  for,  Implementing,  and  Using   Resource  Discovery  Tools,”  in  Planning  and  Implementing  Resource  Discovery  Tools  in   Academic  Libraries,  ed.  Mary  P.  Popp  and  Diane  Dallis  (Hershey,  PA:  Information  Science   Reference,  2012),  44–56,  http://dx.doi.org/10.4018/978-­‐1-­‐4666-­‐1821-­‐3.ch003;  Hoseth,   “Criteria  To  Consider  When  Evaluating  Web-­‐Based  Discovery  Tools”;  Mandernach  and  Condit   Fagan,  “Creating  Organizational  Buy-­‐In.”     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     75     22.    Vaughan,  “Evaluating  and  Selecting  a  Library  Web-­‐Scale  Discovery  Service,”  64.   23.    Ibid.,  81.   24.    Nadine  P.  Ellero,  “An  Unexpected  Discovery:  One  Library’s  Experience  with  Web-­‐Scale   Discovery  Service  (WSDS)  Evaluation  and  Assessment,”  Journal  of  Library  Administration  53,   no.  5–6  (2014):  323–43,  http://dx.doi.org/10.1080/01930826.2013.876824.   25.    Vaughan,  “Evaluating  and  Selecting  a  Library  Web-­‐Scale  Discovery  Service,”  66.   26.    Hoseth,  “Criteria  To  Consider  When  Evaluating  Web-­‐Based  Discovery  Tools.”   27.    Yang  and  Wagner,  “Evaluating  and  Comparing  Discovery  Tools”;  Chickering  and  Yang,   “Evaluation  and  Comparison  of  Discovery  Tools”;  Bietila  and  Olson,  “Designing  an  Evaluation   Process  for  Resource  Discovery  Tools.”   28.    Vaughan,  “Investigations  into  Library  Web-­‐Scale  Discovery  Services”;  Vaughan,  “Evaluating   and  Selecting  a  Library  Web-­‐Scale  Discovery  Service”;  Freivalds  and  Lush,  “Thinking  Inside   the  Grid”l  Brubaker,  Leach-­‐Murray,  and  Parker,  “Shapes  in  the  Cloud.”     29.    Chapman  et  al.,  “Developing  a  User-­‐Centered  Article  Discovery  Environment.”     30.    Jakob  Nielsen,  “First  Rule  of  Usability?  Don't  Listen  to  Users,”  Nielsen  Norman  Group,  last   modified  August  5,  2001,  accessed,  August  5  2014,  http://www.nngroup.com/articles/first-­‐ rule-­‐of-­‐usability-­‐dont-­‐listen-­‐to-­‐users.     31.    Freivalds  and  Lush,  “Thinking  Inside  the  Grid.”   32.    Ibid.   33.    Matthew  B.  Hoy,  “An  Introduction  to  Web  Scale  Discovery  Systems,”  Medical  Reference  Services   Quarterly  31,  no.  3  (2012):  323–29,  http://dx.doi.org/10.1080/02763869.2012.698186;   Vaughan,  “Web  Scale  Discovery  Services”;  Vaughan,  “Investigations  into  Library  Web-­‐Scale   Discovery  Services”;  Hoeppner,  “The  Ins  and  Outs  of  Evaluating  Web-­‐Scale  Discovery   Services”;  Chickering  and  Yang,  “Evaluation  and  Comparison  of  Discovery  Tools.”   34.    Marshall  Breeding,  “Major  Discovery  Products,”  Library  Technology  Guides,  accessed  August   5,  2014,  http://librarytechnology.org/discovery.   35.    Hoeppner,  “The  Ins  and  Outs  of  Evaluating  Web-­‐Scale  Discovery  Services,”  40.   36.    Mandernach  and  Condit  Fagan,  “Creating  Organizational  Buy-­‐In,”  429.   37.    Bietila  and  Olson,  “Designing  an  Evaluation  Process  for  Resource  Discovery  Tools.”   38.    Special  thanks  to  Rutgers’  associate  university  librarian  for  digital  library  systems,  Grace   Agnew,  for  designing  this  testing  method.   5869 ---- Digital Collections Are a Sprint, Not a Marathon: Adapting Scrum Project Management Techniques to Library Digital Initiatives Michael J. 
Dulock and Holley Long INFORMATION TECHNOLOGY AND LIBRARIES | DECEMBER 2015 5 ABSTRACT This article describes a case study in which a small team from the digital initiatives group and metadata services department at the University of Colorado Boulder (CU-Boulder) Libraries conducted a pilot of the Scrum project management framework. The pilot team organized digital initiatives work into short, fixed intervals called sprints—a key component of Scrum. Working for more than a year in the modified framework yielded significant improvements to digital collection work, including increased production of digital objects and surrogate records, accelerated publication of digital collections, and an increase in the number of concurrent projects. Adoption of sprints has improved communication and cooperation between participants, reinforced teamwork, and enhanced their ability to adapt to shifting priorities. INTRODUCTION Libraries in recent years have freely adapted methodologies from other disciplines in an effort to improve library services. For example, librarians have • employed usability testing techniques to enhance users’ experience with digital libraries interfaces,1 improve the utility of library websites,2 and determine the efficacy of a visual search interface for a commercial library database;3 • adopted participatory design methods to identify information visualizations that could augment digital library services4 and determine user needs in new library buildings;5 and • utilized principles of continuous process improvement to enhance workflows for book acquisition and implementation of serial title changes in a technical services unit.6 Librarians often come to the profession with disciplinary knowledge from an undergraduate degree unrelated to librarianship, so it should come as no surprise that they bring some of that disciplinary knowledge to their work. The interdisciplinary nature of librarianship also creates an environment that is amenable to adoption or adaptation of techniques from a variety of sources, not only those originating in library science. In this paper, the authors describe their experiences Michael J. Dulock (michael.dulock@colorado.edu) is Assistant Professor and Metadata Librarian, University of Colorado Boulder. Holley Long (longh@uncw.edu), previously Assistant Professor and Systems Librarian for Digital Initiatives at University of Colorado, Boulder, is Digital Initiatives Librarian, Randall Library, University of North Carolina Wilmington. mailto:michael.dulock@colorado.edu mailto:longh@uncw.edu DIGITAL COLLECTIONS ARE A SPRINT, NOT A MARATHON | DULOCK AND LONG | doi: 10.6017/ital.v34i4.5869 6 in applying a modified Scrum management framework to facilitate digital collection production. They begin by elucidating the fundamentals of Scrum and then describes a pilot project using aspects of the methodology. They discuss the outcomes of the pilot and posit additional features of Scrum that may be adopted in the future. 
Fundamentals of Scrum Project Management The Scrum project management framework—one of several techniques under the rubric of agile project management—originated in software development, and has been applied in a variety of library contexts including the development of digital library platforms7 and library web applications.8 Scrum’s salient characteristics include self-managing teams that organize their work into “short iterations of clearly defined deliverables” and focus on “communication over documentation.”9 The Scrum Primer: A Lightweight Guide to the Theory and Practice of Scrum describes the roles, tools, and processes involved in this project management technique.10 Scrum teams are cross-functional and consist of five to nine members who are cross-trained to perform multiple tasks. In addition to the team, two individuals serve specialized roles, Scrum Master and Product Owner. The Scrum Master is responsible for ensuring that Scrum principles are followed and for removing any obstacles that hinder the team’s productivity. Hence the Scrum Master is not a project manager, but a facilitator. The Product Owner’s role is to manage the product by identifying and prioritizing its features. This individual represents the stakeholders’ interests and is ultimately responsible for the product’s value. The team divides their work into short, fixed intervals called sprints that typically last two to four weeks and are never extended. At the beginning of each sprint, the team meets to select and commit to completing a set of deliverables. Once these goals are set, they remain stable for the duration; course corrections can occur in later sprints. In software development, the Scrum team aims to complete a unit of work that stands on its own and is fully functional, known as a potentially shippable increment. It is selected from an itemized list of product features called the product backlog. The backlog is established at the outset of development and consists of a comprehensive list of tasks that must occur to complete the product. A well-constructed backlog has four characteristics. First, it is prioritized with the features that will yield the highest return on investment at the top of the list. Second, the backlog is appropriately detailed, so that the tasks at the top of the list are well-defined whereas those at the bottom may be more vaguely demarcated. Third, each task receives an estimation for the amount of effort required to complete it, which helps the team to project a timeline for the product. Finally, the backlog evolves in response to new developments. Individual tasks may be added, deleted, divided, or reprioritized over the life of the project. During the course of a sprint, team members meet to plan the sprint, check-in on a daily basis, and then debrief at the conclusion of the sprint. They begin with a two-part planning meeting in which the Product Owner reviews the highest priority tasks with the team. In the second half of the meeting, the team and the Scrum Master determine how many of the tasks can be accomplished in INFORMATION TECHNOLOGIES AND LIBRARIES |DECEMBER 2015 7 the given timeframe, thus defining the goals for the sprint. This meeting generally lasts no longer than four hours for a two-week sprint. Every day, the team holds a brief meeting to get organized and stay on track. 
During these “daily Scrums,” each team member shares three pieces of information: what has been accomplished since the previous meeting, what will be accomplished before the next meeting, and what, if any, obstacles are impeding the work. These fifteen-minute meetings provide the team with a valuable opportunity to communicate and coordinate their efforts. Sprints conclude with two meetings, a review and retrospective. During the review, the team inspects the deliverables that were produced during that sprint. The retrospective provides an opportunity to discuss the process, what is working well, and what needs to be adjusted. Figure 1. Typical Meeting Schedule for a Two-Week Sprint Evidence in the literature suggests that Scrum improves both outcomes and process. One meta- analysis of 274 programming case studies found that implementing Scrum led to improved productivity as well as greater customer satisfaction, product quality, team motivation, and cost reduction.11 Proponents of this project management technique find that it leads to a more flexible and efficient process. Scrum’s brief iterative work cycles and evolving product backlog promote adaptability so the team can address the inevitable changes that occur over the life of a project. By contrast, traditional project management techniques have been criticized for requiring too much time upfront on planning and being too rigid to respond to changes in later stages of the project.12 Scrum also promotes communication over documentation,13 resulting in less administrative overhead as well as increased accountability and trust between team members. Scrum Pilot at University of Colorado Boulder Libraries The University of Colorado Boulder (CU-Boulder) Libraries digital initiatives team was interested in adopting Scrum because of its incremental approach to completing large projects, its focus on communication, and its flexibility. These attributes meshed well with the group’s goals to publish larger collections more quickly and to more effectively multitask the production of multiple high DIGITAL COLLECTIONS ARE A SPRINT, NOT A MARATHON | DULOCK AND LONG | doi: 10.6017/ital.v34i4.5869 8 priority collections. The group’s staffing model and approach to collection building prior to the Scrum pilot is described here to provide some context for this choice of project management tool. Digital collection proposals are vetted by a working group composed of ten members, the Digital Library Management Group (DLMG), to ensure that major considerations such as copyright status are fully investigated before undertaking the collection. Approved proposals are prioritized by the appropriate collection manager as high, medium, or low and then placed in a queue for scanning and metadata provisioning. A core group of individuals generally works on all digital collections, including the metadata librarian, the digital initiatives librarian, and one or both of the digitization lab managers. Additionally, the team frequently includes the subject specialist who nominated the collection for digitization, staff catalogers, and other library staff members whose expertise is required. At any given time, the queue may contain as many as fifteen collections, and the core team works on several of them concurrently to address the separate needs of participating departments. While this approach allows the teams to distribute resources more equitably across departments, progress on individual collections can be slower than if they are addressed one at a time. 
Prior to implementing aspects of Scrum, the team also completed the scanning and metadata records for every object in the collection before it was published. As a result, publication of larger collections trailed behind smaller collections. The details of digital collection production vary depending of the nature of the project, but the process usually follows the same broad outline. Unless the entire collection will be digitized, the collection manager chooses a selection of materials on the basis of criteria such as research value, rarity, curatorial considerations, copyright status, physical condition, feasibility for scanning, and availability of metadata. Photographs and paper-based materials are then evaluated by the preservation department to ensure that they are in suitable condition for scanning. Likewise, the media lab manager evaluates audio and video media for condition issues such as sticky shed syndrome, which will affect digitization.14 Depending on format, the material is then digitized by the digitization lab manager or the media lab manager and their student assistants according to locally established workflows that conform to nationally recognized best practices. Once digitized, student assistants apply post-processing procedures as appropriate and required, such as running OCR (optical character recognition) software to convert images to text or equalizing levels on an audio file. The lab managers then check the files for quality assurance and move the files to the appropriate location on the server. The metadata librarian creates a metadata template appropriate to the material being digitized by using industry standards such as Visual Resources Association Core (VRA Core), Metadata Object Description Schema (MODS), PBCore, and Dublin Core (DC). Metadata creation methods depend on the existence of legacy metadata for the analog materials and in what format legacy metadata is contained. The metadata librarian, along with his staff and/or student assistants, adapts legacy metadata into a format that can be ingested by the digital library software or creates records directly in the software when there is no legacy metadata. Metadata is formatted or created in accordance with existing input standards such as Cataloging Cultural Objects (CCO) and Resource Description and Access (RDA), and it is enhanced INFORMATION TECHNOLOGIES AND LIBRARIES |DECEMBER 2015 9 as much as possible using controlled vocabularies such as the Art and Architectural Thesaurus (AAT) and Library of Congress Subject Headings. The metadata librarian performs quality assurance on the metadata records during creation and before the collection is published. In the final stages, the collection is created in the digital library software, at which time search and display options are established: thumbnail labels, default collection sorting, faceted browsing fields, etc. Then the files and metadata are uploaded and published online. The highlight of the CU-Boulder Digital Library is the twenty-seven collections drawn from local holdings in Archives, Special Collections Department, Music Library, and Earth Sciences and Map Library, among others. The Library also contains purchased content and “LUNA Commons” collections created by institutions that use the same digital library platform, for a total of more than 185,000 images, texts, maps, audio recordings, and videos. 
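The article does not include the team's actual conversion scripts, so the following is only a minimal sketch of the legacy-metadata adaptation step described above, assuming the legacy data arrives as a spreadsheet export and that the ingest template uses simple Dublin Core; the column names, file name, and mapping are invented for illustration.

```python
import csv

# Hypothetical mapping from columns in a legacy spreadsheet export
# to Dublin Core elements in the ingest template.
LEGACY_TO_DC = {
    "Photo Title": "dc:title",
    "Photographer": "dc:creator",
    "Date Taken": "dc:date",
    "Description": "dc:description",
    "LCSH Terms": "dc:subject",   # pipe-delimited controlled terms
}

def adapt_row(row):
    """Convert one legacy record into a Dublin Core-style dictionary."""
    record = {}
    for legacy_field, dc_element in LEGACY_TO_DC.items():
        value = row.get(legacy_field, "").strip()
        if not value:
            continue
        # Repeatable elements such as subjects are split into a list.
        record[dc_element] = (
            [term.strip() for term in value.split("|")]
            if dc_element == "dc:subject" else value
        )
    return record

with open("legacy_inventory.csv", newline="", encoding="utf-8") as f:
    records = [adapt_row(row) for row in csv.DictReader(f)]
```

In practice the target template might be MODS, VRA Core, or PBCore rather than Dublin Core, and controlled terms would be checked against vocabularies such as LCSH or the AAT during quality assurance.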
The following four collections were created during the Scrum pilot and illustrate the types of materials available in the CU- Boulder Digital Library: The Colorado Coal Project consists of video and audio interviews, transcripts, and slides collected between 1974 and 1979 by the University of Colorado Coal Project. The project was funded by the Colorado Humanities Program and the National Endowment for the Humanities to create an ethnographic record of the history of coal mining in the western United States from immigration and daily life in the coal camps to labor conditions and strikes, including Ludlow (1913–14) and Columbine (1927). The Mining Maps Collection provides access to scanned maps of various mines, lodes, and claims in Colorado from the late 1800s to the early 1900s. These maps come from a variety of creators, including private publishers and US government agencies. DIGITAL COLLECTIONS ARE A SPRINT, NOT A MARATHON | DULOCK AND LONG | doi: 10.6017/ital.v34i4.5869 10 The Vasulka Media Archive showcases the work of pioneering video artists Steina and Woody Vasulka and contains some of their cutting-edge studies in video that experiment with form, content, and presentation. Steina, an Icelander, educated in music at the Prague Conservatory of Music, and Woody, a graduate of Prague's Film Academy, arrived in New York City just in time for the new media explosion. They brought with them their experience of the European media awakening, which helped them blend seamlessly into the youth media revolution of the late sixties and early seventies in the United States. The 3D Natural History Collection comprises one hundred archaeology and paleontology specimens from the Rocky Mountain and Southwest regions, including baskets, moccasins, animal figurines, game pieces, jewelry, tools, and other everyday objects from the Freemont, Clovis, and Ancestral Puebloan cultures as well as a selection of vertebrate, invertebrate, and track paleontology specimens from the Mesozoic through the Cenozoic Eras (250 Ma to the present). The diffusion of effort across multiple collections and a slower publication rate for larger collections offered opportunities for improvement. After attending a conference session on Scrum project management for web development projects, one of the team members recognized Scrum’s potential to improve production processes since the technique divides large projects into manageable subtasks that can be accomplished in regular, short intervals.15 This approach would allow the team to switch between different high priority collections at regularly defined intervals to facilitate steady progress on competing priorities. Working in sprints would also make it easier to publish smaller portions of a large collection at regular intervals. Thus Scrum held the potential to increase the production rate for larger collections and make the team’s progress more transparent to users and colleagues. In April 2013, a small team of CU-Boulder librarians and staff initiated a pilot to assess the effect on processes and outcomes for digital collection production. 
Rather than involving individuals from all affected units, regardless of their level of engagement in a particular project, the Scrum pilot was limited to the three individuals who were involved in most, if not all, of the projects INFORMATION TECHNOLOGIES AND LIBRARIES |DECEMBER 2015 11 undertaken: the digital initiatives librarian, metadata librarian, and digitization lab manager.16 By including these three individuals, the major functions of metadata provision, digitization, and publication were covered in the trial with no disruption to the existing workflows or organizational structures. Selecting this group also ensured that Scrum would be tested in a broad range of scenarios and on collections from several different departments. To begin, the team met to review the Scrum project management framework and considered how best to pilot the technique. Taking a pragmatic approach, they only adopted those aspects of Scrum that were deemed most likely to result in improved outcomes. If the pilot were successful, other aspects of Scrum could be incrementally incorporated later. The group discussed how Scrum roles, processes, and tools could be adapted to digital collection workflows and determined that sprints would likely have the highest return on investment. They also chose to adapt and hybridize certain aspects of the planning meeting and daily scrum to achieve goals that were not being met by other existing meetings. Sprint planning and end meetings were combined so that all three participants knew what each had completed and what was targeted for the next sprint. Select activities of sprint planning and end meetings were already a part of the monthly DLMG meetings, making additional sprint meetings redundant. Daily Scrum meetings were excluded as the team felt that daily meetings would not produce enough benefit to justify the costs. In addition, two of the three participants have numerous responsibilities that lie outside of projects subject to the Scrum pilot, so each person does not necessarily perform Scrum-related work every day. However, the short meeting time was adopted into the planning/end meeting, as were elements of the three core questions of the daily Scrum meeting, with some modifications. The questions addressed in the biweekly meetings are: What have you done since the last meeting? What are you planning for the next meeting? What impediments, if any, did you encounter during the sprint? The latter question was sometimes addressed mid-sprint through emails, phone calls, or one-off meetings that include a larger or different group of stakeholders. The team adopted the two-week duration typical of Scrum sprints for the pilot. This has proven to be a good medium-term timeframe. It was short enough that the team could adjust priorities quickly, but long enough to complete significant work. The team chose to combine the sprint planning and sprint review meetings into a single meeting. Part of the motivation for a trial of the Scrum technique was to minimize additional time away from projects while maximizing information transfer during the meetings. A single biweekly planning/review meeting was determined to be sufficient to report accomplishments and set goals yet substantial and free of irrelevant content without being overly burdensome as “yet another meeting.” At each sprint meeting, each participant reported on results from the previous sprint. Work that was completed allowed the next phase of a project to proceed. 
Based on the results of the last sprint, each team member set measurable goals that could be realistically met in the next two- week sprint. There has been a concerted effort to keep the meetings short, limited to about twenty DIGITAL COLLECTIONS ARE A SPRINT, NOT A MARATHON | DULOCK AND LONG | doi: 10.6017/ital.v34i4.5869 12 to twenty-five minutes. To enforce this habit, the sprint meetings were scheduled to begin twenty minutes before other regularly scheduled meetings for most or all of the participants. This helped keep participants on-topic and reinforced the transfer-of-information aspect of the meetings, with minimal leeway for extraneous topics. REFLECTION The modified Scrum methodology described above has been in place for more than a year. There have been several positive outcomes resulting from this practice. Beginning with the most practical, production has become more regular than it was before Scrum was implemented. The nature of digital initiatives in this environment dictates that many projects are in progress at once, in various stages of completion. The production work, such as digitizing media or creating metadata records, has become more consistent and regular. Instead of production peaks and valleys, there is more of a straight line as portions of projects are finished and others come online. This in turn has resulted in faster publication of collections. In 2013, the team published six new collections, twice as many as the previous year. The ability to put all hands on deck for a project for a two-week period can increase productivity. Since sprints allow for short, concentrated bursts of work on a single project, smaller projects can be completed in a few sprints and larger projects can be divided into “potentially shippable units” and thus published incrementally. Another benefit of Scrum is that the variability of the two-week sprint cycle allows the team to work on more collections concurrently. For example, during a given sprint, scanning is underway for one collection, a metadata template is being constructed for another, the analog material in a third is being examined for pre-scanning preservation assessment, and a fourth collection is being published. While this type of multitasking occurred before the team piloted sprints, the Scrum project management framework lends more structure and coordination to the various team members’ efforts. Collection building activities can be broken down into subtasks that are accomplished in nonconsecutive sprints without undercutting the team’s concerted efforts. As a result, the team can juggle competing priorities much more effectively. The team is working with multiple stakeholders at any given time, each of whom may have several projects planned or in progress. As focus shifts among stakeholders and their respective projects, the Scrum team is able to adjust quickly to align with those priorities, even if only for a single sprint. This also makes it easier to respond to emerging requests or address small, focused projects on the basis of events such as exhibits or course assignments. Additional benefits of the Scrum methodology pertain to communication and work style among the three Scrum participants. The frequent, short meetings are densely packed and highly focused. Each person has only a few minutes to describe what has been accomplished, explain problems encountered, troubleshoot solutions, and share plans for the next sprint. 
The return on the time investment of twenty minutes every two weeks is significant—there is no time to waste on issues that do not pertain directly to the projects underway, just completed, or about to start. A further result is that the group’s sense of itself as a team is enhanced. As stated above, the three Scrum INFORMATION TECHNOLOGIES AND LIBRARIES |DECEMBER 2015 13 participants do not all work in the same administrative unit within the library. Though they shared frequent communication by email as projects progressed, regular sprint meetings have fostered a closer sense of team. The participants know from sprint to sprint what the others are doing; they can assist one another with problems face-to-face and coordinate with one another so that work segments progress toward production in a logical sequence. With more than a year of experience with Scrum, the pilot team has determined that several aspects of the methodology have worked well in our environment. In general, the sprint pattern fits well with existing operating modes. The monthly DLMG meeting, which includes a large and diverse group, provides an opportunity to discuss priorities, review project proposals, establish standards, and make strategic decisions. The bi-weekly sprint meetings dovetail nicely, with one meeting taking place at a midpoint between DLMG meetings, and one just prior to DLMG meetings. This allows the three Scrum participants to focus on strategic items during the DLMG meeting but keep a close eye on operational items in between. The Scrum methodology has also accommodated the competing priorities that the three participants must balance on an ongoing basis. There is considerable variation between participants in terms of roles and responsibilities, but the division of work into sprints has given the team greater opportunity to fit production work in with other responsibilities, such as supervision and training; scholarly research and writing; service performed for disciplinary organizations; infrastructure building; and planning, research, and design work for future projects. The two-week sprint duration is a productive time interval during which the team can set and reach incremental goals, whether that is starting and finishing a small project on short notice, making a big push on a large-scale project, or continuing gradual progress on a large, deliberately- paced initiative. The brief meetings ensure that participants focus on the previous sprint and the upcoming sprint. There is usually just enough time to discuss accomplishments, goals, and obstacles, with some time left to troubleshoot as necessary. The meeting schedule and structure allows each individual to set his or her own goals so that he or she can make maximum progress during the sprint. This in turn feeds into accountability. There is always an external check on one’s progress—the next meeting comes up in two weeks, creating an effective deadline (which also sometimes corresponds to a project deadline). It becomes easier to stay on task and keep goals in sight with the sprint report looming in a matter of days. At the same time, Scrum helps to define each person’s role and clarifies how roles align with each other. Some tasks are completely independent, while others must be done in sequence and depend on another’s work. The sprint schedule allows large, complex projects to be divided into manageable pieces so that each sprint can result in a sense of accomplishment, even if it may require many sprint cycles to actually complete a project. 
This is especially true for large digital initiatives. For instance, completing the entire project may take a year, but subsets of a collection may be published in phases at more frequent intervals in the meantime. DIGITAL COLLECTIONS ARE A SPRINT, NOT A MARATHON | DULOCK AND LONG | doi: 10.6017/ital.v34i4.5869 14 Summary of Benefits ● Enhanced ability to manage multiple concurrent projects ● Published large collections incrementally, increasing responsiveness to users and other stakeholders ● Improved team building ● Increased communication and accountability among team members FUTURE CONSIDERATIONS Based on these outcomes, the team can safely say that it met its objectives for the test pilot. One of the reasons that it was feasible to try this when the participants were already highly committed is that the pilot used a small portion of the Scrum methodology and was not too rigid in its approach. The team felt that a hybrid of the Scrum planning and Scrum review meeting held twice a month would provide the benefits without overburdening schedules with additional meetings. There were also plans to have a virtual email check-in every other week to loosely achieve the goals of the daily Scrum meeting, that is, to improve communication and accountability. The email check-in fell by the wayside; the team found it wasn’t necessary because there were already adequate opportunities to check-in with each other over the course of a two-week sprint. The team has found the sprints and modified Scrum meetings to be highly useful and relatively easy to incorporate into their workflows. The next phase of the pilot will implement product backlogs and burn down charts, diagrams showing how much work remains for the team in a single sprint, with the goal of tracking collections’ progress at the item level through each step of the planning, selection, preservation assessment, digitization, metadata provisioning, and publication workflows. Figure 2. Hypothetical Backlog for the First Sprint of a Digital Collection17 INFORMATION TECHNOLOGIES AND LIBRARIES |DECEMBER 2015 15 Scrum backlogs are arranged on the basis of a task’s perceived benefit for customers. To adapt backlogs for digital collection production work, the backlog task list’s order will instead be based in part on the workflow sequence. For example, pieces from the physical collection must be selected before preservation staff can assess them. Additionally, the backlog items will be sequenced according to the materials’ research value or complexity. For instance, the digitization of a folder of significant correspondence from an archival collection would be assigned a higher priority in the backlog than the digitization of newspaper clippings of minor importance from the same collection. Or, materials that are easy to scan would be listed in the backlog ahead of fragile or complex items that require more time to complete. This will allow the team to publish the most valuable items from the collection more quickly. According to Scrum best practices, backlogs are also appropriately detailed. In the context of digital collection production work, collections’ backlogs would begin with a standard template of high-level activities: materials’ selection, copyright analysis, preservation assessment, digitization, metadata creation, and publication. As the team progresses through backlog items, they will become increasingly detailed. Backlogs also evolve. 
Scrum’s ability to respond to change has been one of its strongest assets in this environment and therefore the backlog’s ability to evolve will make it a valuable addition to the team’s process. For example, materials that a collection manager uncovers and adds to the project late in the process can be easily incorporated into the backlog or materials in the collection that are needed to support an upcoming instruction session can be moved up in the backlog for the next sprint. In this way, the backlog will support the team’s goal to nimbly respond to shifting priorities and emerging opportunities. Figure 3. Hypothetical Burn Down Chart18 DIGITAL COLLECTIONS ARE A SPRINT, NOT A MARATHON | DULOCK AND LONG | doi: 10.6017/ital.v34i4.5869 16 The final relevant feature of a backlog, the “effort estimates,” taken in conjunction with the burn down chart will help the team develop better metrics for estimating the time and resources required to complete a collection. When items are added to the backlog, team members estimate the amount of effort needed to complete it. The burn down chart illustrates how much work remains and, in general practice, is updated on a daily basis. Given that the team has truncated the Scrum meeting schedule, this may occur on a weekly basis, but will nonetheless benefit the team in several ways. Initially, it will keep the team on track and provide valuable and detailed information for stakeholders on the collections’ progress. As the team accrues old burn down charts from completed collections, they can use the data to hone their ability to estimate the amount of time and resources needed to complete a given project. CONCLUSION Through the pilot conducted for digital initiatives at CU-Boulder Libraries, application of aspects of the Scrum project management framework has demonstrated significant benefits with no discernable downside. Adoption of sprint planning and end meetings resulted in several positive outcomes for the participants. Digital collection production has become more regular; work can be underway on more collections simultaneously; and collections are, on average, published more quickly. In addition, communication and cooperation among the sprint pilot participants have increased and strengthened the sense of teamwork among them. The sprint schedule has blended well with existing digital initiatives meetings and workflows, and has enhanced the team’s ability to handle ever-shifting priorities. Additional aspects of Scrum, such as product backlogs and burn down charts, will be incorporated into the participants’ workflows to allow them to better track the work done at the item level, provide more detailed information for stakeholders during the course of a project, and predict how much time and effort will be required for future projects. The positive results of this pilot demonstrate the benefits to be gained by looking outside standard library practice and adopting techniques developed in another discipline. Given the range of activities performed in libraries, the possibilities to improve workflows and increase efficiency are limitless as long as those doing the work keep an open mind and a sharp eye out for methodologies that could ultimately benefit their work, and in turn, their users. REFERENCES 1. Sueli Mara Ferreira and Denise Nunes Pithan, “Usability of Digital Libraries,” OCLC Systems & Services: International Digital Library Perspectives 21, no. 4 (2005): 316, doi: 10.1108/10650750510631695. 2. Danielle A. 
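Like the article's figures 2 and 3, the following sketch is hypothetical: it shows one way that effort estimates attached to backlog items could be summed into the data points of a burn-down chart, with task names and hour estimates invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class BacklogItem:
    task: str          # e.g., "Digitize correspondence folder 1"
    effort: int        # estimated effort, in hours
    done: bool = False

# Invented backlog for one sprint, ordered by workflow sequence
# and by the research value of the materials.
sprint_backlog = [
    BacklogItem("Select first batch of correspondence", 4),
    BacklogItem("Preservation assessment of batch", 6),
    BacklogItem("Digitize correspondence folder 1", 10),
    BacklogItem("Create metadata for folder 1", 12),
    BacklogItem("Digitize newspaper clippings", 8),
]

def remaining_effort(backlog):
    """Total estimated hours left; one point on a burn-down chart."""
    return sum(item.effort for item in backlog if not item.done)

# Record this once per check-in (weekly, given the truncated meeting
# schedule) and plot the series over the sprint to get the burn-down chart.
burn_down = [remaining_effort(sprint_backlog)]
```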
Becker and Lauren Yannotta, “Modeling a Library Web Site Redesign Process: Developing a User-Centered Web Site Through Usability Testing,” Information Technology & Libraries 32, no. 1 (2013): 11, doi: 10.6017/ital.v32i1.2311. 3. Jodi Condit Fagan, “Usability Testing of a Large, Multidisciplinary Library Database: Basic Search and Visual Search,” Information Technology & Libraries 25 no. 3 (2006): 140–41, 10.6017/ital.v25i3.3345. http://dx.doi.org/10.1108/10650750510631695 http://dx.doi.org/10.6017/ital.v32i1.2311 http://dx.doi.org/10.6017/ital.v25i3.3345 INFORMATION TECHNOLOGIES AND LIBRARIES |DECEMBER 2015 17 4. Panayiotis Zaphiris, Kulvinder Gill, Terry H.-Y. Ma, Stephanie Wilson and Helen Petrie, “Exploring the Use of Information Visualization for Digital Libraries,” New Review of Information Networking 10, no. 1 (2004): 58, doi: 10.1080/1361457042000304136. 5. Benjamin Meunier and Olaf Eigenbrodt, “More Than Bricks and Mortar: Building a Community of Users through Library Design,” Journal of Library Administration 54 no. 3 (2014): 218–19, 10.1080/01930826.2014.915166. 6. Lisa A. Palmer and Barbara C. Ingrassia, “Utilizing the Power of Continuous Process Improvement in Technical Services,” Journal of Hospital Librarianship 5 no. 3 (2005): 94–95, 10.1300/J186v05n03_09. 7. Javier D. Fernández et al., “Agile DL: Building a DELOS-Conformed Digital Library Using Agile Software Development,” in Research and Advanced Technology for Digital Libraries, edited by Birte Christensen-Dalsgaard et al. (Berlin: Springer-Verlag, 2008), 398–9, doi: 10.1007/978-3- 540-87599-4_44. 8. Michelle Frisque, “Using Scrum to Streamline Web Applications Development and Improve Transparency” (paper presented at the 13th Annual LITA National Forum, Atlanta, Georgia, September 30–October 3, 2010). 9. Frank H. Cervone, “Understanding Agile Project Management Methods Using Scrum,” OCLC Systems & Services 27, no. 1 (2011): 19, 10.1108/10650751111106528. 10. Pete Deemer, Gabrielle Benefield, Craig Larman, and Bas Vodde, “The Scrum Primer: A Lightweight Guide to the Theory and Practice of Scrum," (2012), 3-15, www.infoq.com/minibooks/Scrum_Primer. 11. Eliza S. F. Cardozo et al., “SCRUM and Productivity in Software Projects: A Systematic Literature Review” (paper presented at the 14th International Conference on Evaluation and Assessment in Software Engineering (EASE), 2010), 3. 12. Cervone, “Understanding Agile Project Management,” 18. 13. Ibid., 19. 14. Sticky shed syndrome refers to the degradation of magnetic tape where the binder separates from the carrier. The binder can then stick to the playback equipment rendering the tape unplayable. 15. Frisque, “Using Scrum.” 16. The media lab manager responsible for audio and video digitization did not participate because his lab offers fee-based services to the public and thus has long-established business processes in place that would not have blended easily with sprints. 17. Figure 2 is based on illustration created by Mountain Goat Software, “Sprint Backlog,” https://www.mountaingoatsoftware.com/agile/scrum/sprint-backlog. 18. Figure 3 is adapted from template created by Expert Project Management, “Burn Down Chart Template,” www.expertprogrammanagement.com/wp- content/uploads/templates/burndown.xls. 
http://dx.doi.org/10.1080/1361457042000304136 http://dx.doi.org/10.1080/01930826.2014.915166 http://dx.doi.org/10.1300/J186v05n03_09 http://dx.doi.org/10.1007/978-3-540-87599-4_44 http://dx.doi.org/10.1007/978-3-540-87599-4_44 http://dx.doi.org/10.1108/10650751111106528 https://www.mountaingoatsoftware.com/agile/scrum/sprint-backlog http://www.expertprogrammanagement.com/wp-content/uploads/templates/burndown.xls http://www.expertprogrammanagement.com/wp-content/uploads/templates/burndown.xls 5888 ---- Microsoft Word - 5888-14722-8-CE.docx Exploratory  Subject  Searching  in     Library  Catalogs:  Reclaiming  the  Vision     Julia  Bauder  and     Emma  Lange     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015             92   ABSTRACT   Librarians  have  had  innovative  ideas  for  ways  to  use  subject  and  classification  data  to  provide  an   improved  online  search  experience  for  decades,  yet  after  thirty-­‐plus  years  of  improvements  in  our   online  catalogs,  users  continue  to  struggle  with  narrowing  down  their  subject  searches  to  provide   manageable  lists  containing  only  relevant  results.  This  article  reports  on  one  attempt  to  rectify  that   situation  by  radically  reenvisioning  the  library  catalog  interface,  enabling  users  to  interact  with  and   explore  their  search  results  in  a  profoundly  different  way.  This  new  interface  gives  users  the  option  of   viewing  a  graphical  overview  of  their  results,  grouped  by  discipline  and  subject.  Results  are  depicted   as  a  two-­‐level  treemap,  which  gives  users  a  visual  representation  of  the  disciplinary  perspectives  (as   represented  by  the  main  classes  of  the  Library  of  congress  Classification)  and  topics  (as  represented   by  elements  of  the  Library  of  Congress  Subject  Headings)  included  in  the  results.   INTRODUCTION   Reading  library  literature  from  the  early  days  of  the  OPAC  era  is  simultaneously  inspiring  and   depressing.  The  enthusiasm  that  some  librarians  felt  in  those  days  about  the  new  possibilities  that   were  being  opened  by  online  catalogs  is  infectious.  Elaine  Svenonius  envisioned  a  catalog  that   could  interactively  guide  users  from  a  broad  single-­‐word  search  to  the  specific  topic  in  which  they   were  really  interested.1  Pauline  Cochrane  conceived  of  a  catalog  that  could  group  results  on   similar  aspects  of  a  given  subject,  showing  the  user  a  “systematic  outline”  of  what  was  available  on   the  subject  and  allowing  the  user  to  narrow  their  search  easily.2  Marcia  Bates  even  pondered   whether  “any  indexing/access  apparatus  that  does  not  stimulate,  intrigue,  and  give  pleasure  in  the   hunt  is  defective,”  since  “people  enjoy  exploring  knowledge,  particularly  if  they  can  pursue  mental   associations  in  the  same  way  they  do  in  their  minds.  .  .  .  Should  that  not  also  carry  over  into   enjoying  exploring  an  apparatus  that  reflects  knowledge,  that  suggests  paths  not  thought  of,  and   that  shows  relationships  between  topics  that  are  surprising?”3  However,  looking  back  thirty  years   later,  it  is  dispiriting  to  consider  how  many  of  these  visions  have  not  yet  been  realized.     
The  following  article  reports  on  one  attempt  to  rectify  that  situation  by  radically  reenvisioning  the   library  catalog  interface,  enabling  users  to  interact  with  and  explore  their  search  results  in  a     profoundly  different  way.  The  idea  is  to  give  users  the  option  of  viewing  a  graphical  overview  of   their  results,  grouped  by  discipline  and  subject.  This  was  achieved  by  modifying  a  VuFind-­‐based     Julia  Bauder  (bauderj@grinnell.edu)  is  Social  Studies  and  Data  Services  Librarian,  and     Emma  Lange  (langemm@grinnell.edu)  is  an  undergraduate  student  and  former  library  intern,   Grinnell  College,  Grinnell,  Iowa.     EXPLORATORY  SUBJECT  SEARCHING  IN  LIBRARY  CATALOGS:  RECLAIMING  THE  VISION  |  BAUDER  AND  LANGE   doi:  10.6017/ital.v34i2.5888   93   discovery  layer  to  allow  users  to  choose  between  a  traditional,  list-­‐based  view  of  their  search   results  and  a  visualized  view.  In  the  visualized  view,  results  are  depicted  as  a  two-­‐level  treemap,   which  gives  users  a  visual  representation  of  the  disciplinary  perspectives  (as  represented  by  the   main  classes  of  the  Library  of  Congress  Classification  [LCC])  and  topics  (as  represented  by   elements  of  the  Library  of  Congress  Subject  Headings  [LCSH])  included  in  the  results.  An  example   of  this  visualized  view  can  be  seen  in  figure  1.   Figure  1.  Visualization  of  the  Results  for  a  Search  for  “Climate  Change.”   Subsequent  sections  of  this  paper  summarize  the  library-­‐science  and  computer-­‐science  literature   that  provides  the  theoretical  justification  this  project,  explain  how  the  visualizations  are  created,   and  report  on  the  results  of  usability  testing  of  the  visual  interface  with  faculty,  academic  staff,  and   undergraduate  students.     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     94   LITERATURE  REVIEW   Exploratory  Subject  Searching  in  Library  Catalogs   Since  Charles  Ammi  Cutter  published  his  Rules  for  a  Printed  Dictionary  Catalogue  in  1876,  most   library  catalogs  have  been  premised  on  the  idea  that  users  have  a  very  good  idea  of  what  they  are   looking  for  before  they  begin  to  interact  with  the  catalog.4  In  this  classic  view,  users  are  either   conducting  known-­‐item  searches—they  know  the  titles  or  the  author  of  the  books  they  want  to   find—or  they  know  the  exact  subject  on  which  they  are  interested  in  finding  books.  Yet  research   has  shown  that  known-­‐item  searches  are  only  about  half  of  catalog  searches,5  and  that  users  often   have  a  very  difficult  time  expressing  their  information  needs  with  enough  detail  to  construct  a   specific  subject  search.  Instead,  much  of  the  time,  users  approach  the  catalog  with  only  a  vaguely   formulated  information  need  and  an  even  vaguer  sense  of  what  words  to  type  into  the  catalog  to   get  the  resources  that  would  solve  their  information  need.6   Even  in  the  earliest  days  of  the  OPAC  era,  librarians  were  aware  of  this  problem.  
Some  of  them,   including  Elaine  Svenonius  and  Pauline  Cochrane,  speculated  about  better  use  of  subject  and   classification  data  to  try  to  help  users  who  enter  too-­‐short,  overly  broad  searches  focus  their   results  on  the  information  that  they  truly  want.  One  of  Cochrane’s  many  ideas  on  this  topic  was  to   use  subject  and  classification  data  “to  present  a  systematic  outline  of  a  subject,”  which  would  let   users  see  all  of  the  different  aspects  of  that  subject,  as  reflected  in  the  library’s  classification   system  and  subject  headings,  and  the  various  locations  where  those  materials  could  be  found  in   the  library.7  Svenonius  suggested  using  library  classifications  to  help  narrow  users’  searches  to   appropriate  areas  of  the  catalog.  For  example,  she  suggests,  if  a  user  enters  “freedom”  as  a  search   term,  the  system  might  be  programmed  to  present  to  the  user  contexts  in  which  “freedom”  is  used   in  the  Dewey  Decimal  Classification,  such  as  “freedom  of  choice”  or  “freedom  of  the  press.”  Once   the  user  selects  a  one  of  these  phrases,  Svenonius  continued,  the  system  could  present  the  user   with  additional  contextual  information,  again  allow  the  user  to  specify  which  context  is  desired,   and  then  guide  the  user  to  the  exact  call  number  range  for  information  on  the  topic.  She  concluded,   “Thus  by  contextualizing  vague  words,  such  as  freedom,  within  perspective  hierarchies,  the   computer  might  guide  a  user  from  an  ineptly  or  imprecisely  articulated  search  request  to  one  that   is  quite  specific.”8   Ideas  such  as  these  had  little  impact  on  the  design  of  production  library  catalogs  until  the  late   1990s,  when  a  Dutch  company,  MediaLab  Solutions,  began  developing  AquaBrowser,  which   features  a  word  cloud  composed  of  synonyms  and  other  words  related  to  the  search  term  and   allows  users  to  refocus  their  search  by  clicking  on  these  words.9  AquaBrowser  became  available  in   the  United  States  in  the  mid-­‐2000s,  shortly  before  North  Carolina  State  University  launched  its   Endeca-­‐based  catalog  in  2006.10     While  AquaBrowser’s  word  cloud  is  certainly  visually  striking,  the  feature  that  these  and  most  of   the  subsequent  “next-­‐generation”  library  catalogs  implement  that  has  had  the  most  impact  on   search  behavior  is  faceting.  Facets,  while  not  as  sophisticated  as  the  systems  envisioned  by     EXPLORATORY  SUBJECT  SEARCHING  IN  LIBRARY  CATALOGS:  RECLAIMING  THE  VISION  |  BAUDER  AND  LANGE   doi:  10.6017/ital.v34i2.5888   95   Svenonius  and  Cochrane,  are  partial  solutions  to  the  problems  they  lay  out.  Facets  can  serve  to   give  users  a  high-­‐level  overview  of  what  is  available  on  a  topic,  based  on  classification,  format,   period,  or  other  factors.  They  can  also  help  guide  a  user  from  an  impossibly  broad  search  to  a   more  focused  one.  
Various  studies  have  shown  that  faceted  interfaces  are  effective  at  helping   users  narrow  their  searches,  as  well  as  helping  them  discover  more  relevant  materials  than  they   did  when  performing  similar  tasks  on  nonfaceted  interfaces.11  However,  studies  have  also  shown   that  users  can  become  overwhelmed  by  the  number  and  variety  of  facets  available  and  the  number   of  options  shown  under  each  facet.12   Visual  Interfaces  to  Document  Corpora   When  librarians  were  pondering  how  to  create  a  better  online  library  catalog,  computer  scientists   were  investigating  the  broader  problem  of  helping  users  to  navigate  and  search  large  databases   and  collections  of  documents  effectively.  Visual  interfaces  have  been  one  of  the  methods  computer   scientists  have  investigated  for  providing  user-­‐friendly  navigation,  with  perhaps  the  most   prominent  early  advocate  for  visual  interfaces  being  Ben  Shneiderman.13  In  recent  years,   Shneiderman  and  other  researchers  have  built  and  tested  various  types  of  experimental  visual   interfaces  for  different  forms  of  information-­‐seeking.14  However,  with  a  few  exceptions,  most  of   these  visual  interfaces  have  remained  in  a  laboratory  rather  than  a  production  setting.15  With  the   exception  of  the  “date  slider,”  a  common  interface  feature  that  displays  a  bar  graph  showing  dates   related  to  the  search  results  and  allows  users  to  slide  handles  to  include  or  exclude  times  from   their  search  results,  few  current  document  search  systems  present  users  with  any  kind  of  visual   interface.   METHOD   The  Grinnell  College  Libraries  use  VuFind,  open-­‐source  software  originally  developed  at  Villanova   University  as  a  discovery  layer  to  use  over  a  traditional  ILS.  VuFind  in  turn  makes  use  of  Apache   Solr,  a  powerful  open-­‐source  indexing  and  search  platform,  and  SolrMarc,  code  developed  within   the  library  community  that  facilitates  indexing  MARC  records  into  Solr.  Using  SolrMarc,  MARC   fields  and  subfields  are  mapped  to  various  fields  in  the  Solr  index;  for  example,  the  contents  of   MARC  field  020,  subfield  a,  and  field  773,  subfield  z,  are  both  mapped  to  a  Solr  index  field  called   “isbn.”  More  than  fifty  Solr  fields  are  populated  in  our  index.  Our  visualization  system  was  built  on   top  of  VuFind’s  Solr  index  and  visualizes  data  taken  directly  from  the  index.     The  visualizations  are  created  in  Javascript  using  the  D3.js  visualization  library,  and  they  are   designed  to  implement  Shneiderman’s  Visual  Information  Seeking  Mantra:  “Overview  first,  zoom   and  filter,  then  details-­‐on-­‐demand.”16  The  goal  was  to  give  users  the  option  of  viewing  a  graphical   overview  of  their  results,  grouped  by  disciplinary  perspective  and  topic,  and  then  allow  them  to   zoom  in  on  the  results  from  specific  perspectives  or  on  specific  topics.  Once  they  have  used  the   interactive  visualization  to  narrow  their  search,  they  can  choose  to  see  a  traditional  list  of  results   with  full  bibliographic  details  about  the  items.  
This  would,  ideally,  provide  a  version  of  the     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     96   systematic  outline  that  Cochrane  envisioned.  It  should  also  support  users  as  they  attempt  to   narrow  down  their  search  results  and  focus  on  a  specific  aspect  of  their  chosen  subject  without   overwhelming  them  with  long  lists  of  results  or  of  facets.   Currently,  we  are  visualizing  values  of  two  fields,  one  containing  the  first  letter  of  the  items’   Library  of  Congress  Classification  (LCC)  numbers  and  the  other  containing  elements  of  the  items’   Library  of  Congress  Subject  Headings  (LCSH).  This  data  is  visualized  as  a  two-­‐level  treemap.17   First,  large  boxes  are  drawn  representing  the  number  of  items  matching  the  search  within  each   letter  of  the  LCC.  Within  the  largest  of  these  boxes,  smaller  boxes  are  drawn  showing  the  most   common  elements  of  the  subject  headings  for  items  matching  that  search  within  that  LCC  main   class.  Less  common  subject  heading  elements  are  combined  into  an  additional  small  box,  labeled   “X  more  topics”;  clicking  on  that  box  zooms  in  so  that  users  only  see  results  from  one  LCC  main   class,  and  it  displays  all  of  the  LCSH  headings  applied  to  items  in  that  group.  Similarly,  users  can   click  on  any  of  the  smaller  LCC  boxes,  which  do  not  contain  LCSH  boxes  in  the  original   visualization,  to  zoom  in  on  that  LCC  main  class  and  see  the  LCSH  subject  headings  for  it.  Both  the   large  and  the  small  boxes  are  sized  to  represent  what  proportion  of  the  results  were  in  that  LCC   main  class  or  had  that  LCSH  subject  heading.     This  is  easier  to  explain  with  a  concrete  example.  Let’s  say  a  student  were  to  search  for  “climate   change”  and  click  on  the  option  to  visualize  the  results.  You  can  see  what  this  looks  like  in  figure  1.   Instead  of  seeing  a  list  of  nearly  two  thousand  books,  the  student  now  sees  a  visual  representation   of  the  disciplinary  perspectives  (as  represented  by  the  main  classes  of  the  LCC)  and  topics  (as   represented  by  elements  of  the  LCSH)  included  in  the  results.  Users  could  click  to  zoom  in  on  any   main  class  within  the  LCC  to  see  all  of  the  topics  covered  by  books  in  that  class,  as  in  figure  2,   where  the  student  has  zoomed  in  on  “S  –  Agriculture.”  Or  users  could  click  on  any  topic  facet  to  see   a  traditional  results  list  of  books  with  that  topic  facet  in  that  main  class.  At  any  zoom  level,  users   could  choose  to  return  to  the  traditional  results  list  by  clicking  on  the  “List  Results”  option.18   We  launched  this  feature  in  our  catalog  midway  through  the  spring  2014  semester.  Formal   usability  testing  was  completed  with  five  advanced  undergraduates,  three  staff,  and  two  faculty   members  in  the  summer  of  2014.  (See  appendix  A  for  the  outline  of  the  usability  test.)  One  first-­‐ year  student  completed  usability  testing  in  the  fall  2014  semester.  The  usability  study  asked   participants  to  complete  a  set  list  of  nine  specific,  predetermined  tasks.  
Some  tasks  involved  the   use  of  now-­‐standard  catalog  features,  such  as  saving  results  to  a  list  and  emailing  results  to  oneself,   while  about  half  of  the  tasks  involved  navigation  of  the  visualization  tool,  which  was  entirely  new   to  the  participants.  Each  participant  received  the  same  tasks  and  testing  experience  regardless  of   their  status  as  a  student,  faculty,  or  staff,  and  each  academic  division  was  represented  among  the   participants.       EXPLORATORY  SUBJECT  SEARCHING  IN  LIBRARY  CATALOGS:  RECLAIMING  THE  VISION  |  BAUDER  AND  LANGE   doi:  10.6017/ital.v34i2.5888   97     Figure  2.  Visualization  of  the  Results  for  a  Search  for  “Climate  Change,”  Filtered  to  Show  Only   Results  with  Library  of  Congress  Classification  Numbers  Starting  with  S.   RESULTS   Usability  testing  revealed  no  major  obstacles  in  the  way  of  users’  ability  to  navigate  the   visualization  feature;  the  visualized  search  results  were  quickly  deciphered  by  the  participants   with  the  assistance  of  the  context  set  by  the  study’s  outlined  tasks.  Familiarity  with  library   catalogs  in  general,  and  the  Grinnell  College  Libraries  catalog  in  particular,  showed  no  marked   impact  on  users’  performance.  No  particular  user  group  performed  as  an  outlier  in  regards  to   users’  general  ability  to  complete  tasks  or  the  time  required  to  do  so.     The  most  common  issue  to  arise  during  the  session  concerned  the  visualization’s  truncated  text,   which  appears  in  the  far  left  column  of  results  when  the  descriptor  text  contains  too  many   characters  for  the  space  allocated.  (An  example  of  this  truncated  text  can  be  seen  in  figure  1.)  The     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     98   subject  boxes  appearing  in  the  furthest  left  column  contain  the  least  results,  and  therefore  receive   the  least  space  within  the  visualization.  This  limited  space  sometimes  results  in  truncated  text.   The  full-­‐text  can  be  viewed  by  hovering  over  the  truncated  text  box,  but  few  users  discovered  this   capability.  Another  common  concern  involved  a  participant’s  ability  to  switch  their  search  results   from  the  default  list  view  to  the  visualized  view.  All  participants  were  capable  of  selecting  the   “Visualize  These  Results”  button  required  to  produce  the  visualization,  but  a  handful  of   participants  expressed  that  they  feared  they  would  not  find  that  option  if  they  were  not  prompted   to  do  so.   Participants  remarked  that  the  visualization  initially  appeared  daunting  but  then  quickly  became   comfortable  navigating  the  results.  Most  participants,  including  staff,  stated  that  they  found  the   tool  useful  and  intended  to  use  it  in  the  future  during  the  course  of  their  typical  work  at  the  college.   
CONCLUSION   Librarians  have  had  innovative  ideas  for  ways  to  use  subject  and  classification  data  to  provide  an   improved  online  search  experience  for  decades,  yet  after  thirty-­‐plus  years  of  improvements  in   online  catalogs,  users  continue  to  struggle  with  narrowing  down  their  searches  to  produce   manageable  lists  containing  only  relevant  results.19  Computer  scientists  have  been  advocating  for   interfaces  to  support  visual  information-­‐seeking  since  the  1980s.  Finally,  hardware  and  software   have  improved  to  the  point  where  many  of  these  ideas  can  be  implemented  feasibly,  even  by   relatively  small  libraries.  Now  is  the  time  to  put  some  of  them  into  production  and  see  how  well   they  work  for  library  users.  The  particular  visualizations  reported  in  this  article  may  or  may  not   be  the  best  possible  visualizations  of  bibliographic  data,  but  we  will  never  know  which  of  these   ideas  might  prove  to  be  the  revolution  that  library  discovery  interfaces  need  until  we  try  them.         EXPLORATORY  SUBJECT  SEARCHING  IN  LIBRARY  CATALOGS:  RECLAIMING  THE  VISION  |  BAUDER  AND  LANGE   doi:  10.6017/ital.v34i2.5888   99   Appendix  A.  Usability  Testing  Instrument   Introductory  Questions   Before  we  look  at  the  site,  I’d  like  to  ask  you  just  a  few  quick  questions.   —Have  you  searched  for  materials  using  the  Grinnell  College  libraries’  website  before?  If  so,  what   for  and  when?  (For  students  only:  Could  you  please  estimate  how  many  research  projects  you’ve   done  at  Grinnell  College  using  the  library  catalog?)   In  the  Grinnell  College  Libraries,  we’re  testing  out  a  new  tool  in  our  catalog  that  presents  search   results  in  a  different  way  than  you  are  used  to.  Now  I’m  going  to  read  you  a  short  explanation  of   why  we  created  this  tool  and  what  we  hope  the  tool  will  do  for  you  before  we  start  the  test.   Research  is  a  conversation:  a  scholar  reads  writings  by  other  scholars  in  the  field,  then  enters  into   dialogue  with  them  in  his  or  her  own  writing.  Most  of  the  time,  these  conversations  happen  within   the  boundaries  of  a  single  discipline,  such  as  chemistry,  sociology,  or  art  history,  even  when  many   disciplines  are  discussing  similar  topics.  But  when  you  do  a  search  in  a  library  catalog,  writings   that  are  part  of  many  different  conversations  are  all  jumbled  together  in  the  results.  It’s  like  being   thrown  into  one  big  room  where  all  of  these  scholars,  from  all  of  these  different  disciplines,  are   talking  over  each  other  all  at  once.  Our  new  visualization  tool  aims  to  help  you  sort  all  of  these   writings  into  the  separate  conversations  in  which  they  originated.     Scenarios   Now  I  am  going  to  ask  you  to  try  doing  some  specific  tasks  using  3Search.  You  should  read  the   instructions  aloud  for  all  tasks  individually  prior  to  beginning  each.  And  again,  as  much  as  possible,   it  will  help  us  if  you  can  try  to  think  out  loud  as  you  go  along.     Please  begin  by  reading  the  first  scenario  aloud  and  then  begin  the  first  scenario.  If  you  are  unsure   whether  you  finished  the  task  or  not,  please  ask  me.  
I  can  confirm  if  the  task  has  been  completed.   Once  you  are  done  with  Scenario  1,  please  continue  onto  Scenario  2  by  reading  it  aloud  and  then   beginning  the  task.  Continue  this  process  until  all  scenarios  are  finished.  If  you  cannot  complete  a   task,  please  be  honest  and  try  to  explain  briefly  why  you  were  unsuccessful  and  continue  to  the   next.     1. Pretend  that  you  are  writing  a  paper  about  issues  related  to  privacy  and  the  Internet.  Do  a   search  in  3Search  with  the  words  “privacy  Internet.”   2. Please  select  the  first  WorldCat  result  and  attempt  to  determine  whether  you  have  access   to  the  full  text  of  this  book.  If  not,  please  indicate  where  you  could  request  the  full  text   through  the  InterLibrary  Loan  service.   3. Go  back  to  your  initial  search  results.  Please  choose  “Explore  these  results”  of  the  EBSCO   database  results.  Choose  an  article.  If  you  have  unlimited  texting,  have  the  article’s     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     100   information  texted  to  your  cell  phone.  Then,  add  the  article  to  a  new  list  for  future   reference  throughout  this  project.   4. Go  back  to  your  initial  search  results.  For  Grinnell  College’s  Collections  results,  click  on  the   “Explore  these  results”  link.  Then  click  on  the  “Visualize  Results”  link  to  visualize  the   results.  Which  disciplines  appear  to  have  the  greatest  interest  in  this  topic?   5. When  privacy  and  the  Internet  are  discussed  in  the  context  of  law,  what  are  some  of  the   topics  that  are  frequently  covered  in  these  discussions?   6. One  specific  topic  you  are  considering  is  the  legal  issues  around  libel  and  slander  on  the   Internet.  How  many  resources  do  the  libraries  have  on  that  specific  topic?   7. Click  on  “Q  –  Science,”  to  see  the  results  authored  by  theoretical  computer  scientists.  Based   on  these  results,  what  are  some  of  the  topics  that  are  frequently  covered  in  their   discussions  when  these  computer  scientists  discuss  privacy  and  the  Internet?   8. Pretend  that  you  are  writing  this  paper  for  a  computer  science  class  and  you  are  supposed   to  address  your  topic  from  a  computer  science  perspective.  Please  narrow  your  results  to   only  show  results  that  are  in  the  format  of  a  book.  Based  on  this  new  visualization,  what   might  be  some  good  topics  to  consider?   9. Add  one  of  these  books  to  the  list  you  created  in  step  3.  Please  email  all  of  the  items  on  this   list  to  yourself.   Debriefing   Thank  you.  That  is  it  for  the  computer  tasks.  I  have  a  few  quick  questions  for  you  now  that  you   have  gotten  a  chance  to  use  the  site.   1. What  do  you  think  about  3Search?  Is  it  something  that  you  would  use?  Why  or  why  not?   2. What  is  your  favorite  thing  about  3Search?   3. What  is  your  least  favorite  thing  about  3Search?   4. Did  you  find  the  visualization  function  useful?  Why  or  why  not?   5. Do  you  have  any  recommendations  for  changes  to  the  way  this  site  looks  or  works?         EXPLORATORY  SUBJECT  SEARCHING  IN  LIBRARY  CATALOGS:  RECLAIMING  THE  VISION  |  BAUDER  AND  LANGE   doi:  10.6017/ital.v34i2.5888   101   REFERENCES     1.     
Elaine  Svenonius,  “Use  of  Classification  in  Online  Retrieval,”  Library  Resources  &  Technical   Services  27,  no.  1  (1983):  76–80,    http://alcts.ala.org/lrts/lrtsv25no1.pdf.     2.     Pauline  A.  Cochrane,  “Subject  Access—Free  or  Controlled?  The  Case  of  Papua  New  Guinea,”  in   Redesign  of  Catalogs  and  Indexes  for  Improved  Online  Subject  Access:  Selected  Papers  of  Pauline   A.  Cochrane  (Phoenix:  Oryx,  1985),  275.  Previously  published  in  Online  Public  Access  to  Library   Files:  Conference  Proceedings:  The  Proceedings  of  a  Conference  Held  at  the  University  of  Bath,  3– 5  September  1984  (Oxford:  Elsevier,  1985).   3.     Marcia  Bates,  “Subject  Access  in  Online  Catalogs:  A  Design  Model,”  Journal  of  the  American   Society  for  Information  Science  37,  no.  6  (1986):  363,  http://dx.doi.org/10.1002/(SICI)1097-­‐ 4571(198611)37:6<357::AID-­‐ASI1>3.0.CO;2-­‐H   4.     Charles  Ammi  Cutter,  Rules  for  a  Printed  Dictionary  Catalog  (Washington,  DC:  Government   Printing  Office,  1876).   5.     David  Ward,  Jim  Hahn,  and  Kirsten  Feist,  “Autocomplete  as  a  Research  Tool:  A  Study  on   Providing  Search  Suggestions,”  Information  Technology  &  Libraries  31,  no.  4  (2012):  6–19,   http://dx.doi.org/10.6017/ital.v31i4.1930;  Suzanne  Chapman  et  al.,  “Manually  Classifying   User  Search  Queries  on  an  Academic  Library  Web  Site,”  Journal  of  Web  Librarianship  7  (2013):   401–21,  http://dx.doi.org/10.1080/19322909.2013.842096.   6.     N.  J.  Belkin,  R.  N.  Oddy,  and  H.  M.  Brooks,  “ASK  for  Information  Retrieval:  Part  I.  Background   and  Theory,”  Journal  of  Documentation  (1982):  61–71,  http://dx.doi.org/10.1108/eb026722;   Christine  Borgman,  “Why  Are  Online  Catalogs  Still  Hard  to  Use?,”  Journal  of  the  American   Society  for  Information  Science  (1996):  493–503,  http://dx.doi.org/10.1002/(SICI)1097-­‐ 4571(199607)47:7<493::AID-­‐ASI3>3.0.CO;2-­‐P;  Karen  Markey,  “The  Online  Library  Catalog:   Paradise  Lost  and  Paradise  Regained?,”  D-­‐Lib  Magazine  13,  no.  1/2  (2007),   http://www.dlib.org/dlib/january07/markey/01markey.html.   7.     Cochrane,  “Subject  Access—Free  or  Controlled?,”  275.   8.     Svenonius,  “Use  of  Classification  in  Online  Retrieval,”  78–79.   9.     Jasper  Kaizer  and  Anthony  Hodge,  “AquaBrowser  Library:  Search,  Discover,  Refine,”  Library  Hi   Tech  News  (December  2005):  9–12,  http://dx.doi.org/10.1108/07419050510644329.   10.    Kristen  Antelman,  Emily  Lynema,  and  Andrew  Pace,  “Toward  a  Twenty-­‐First  Century  Library   Catalog,”  Information  Technology  &  Libraries  25,  no.  3  (2006):  128–39,   http://dx.doi.org/10.6017/ital.v25i3.3342.   11.    Tod  Olson,  “Utility  of  a  Faceted  Catalog  for  Scholarly  Research,”  Library  Hi  Tech  (2007):  550– 61,  http://dx.doi.org/10.1108/07378830710840509;  Jody  Condit  Fagan,  “Usability  Studies  of     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  JUNE  2015     102     Faceted  Browsing:  A  Literature  Review,”  Information  Technology  and  Libraries  29,  no.  2   (2010):  58-­‐66,  http://dx.doi.org/10.6017/ital.v29i2.3144.   12.    Kathleen  Bauer,  “Yale  University  Library  VuFind  Test—Undergraduates,”  November  11,  2008,   accessed  September  9,  2014,   http://www.library.yale.edu/usability/studies/summary_undergraduate.doc.   13.    
See,  for  example,  Ben  Shneiderman,  “The  Future  of  Interactive  Systems  and  the  Emergence  of   Direct  Manipulation,”  Behaviour  &  Information  Technology  1  (1982):  237–56,   http://dx.doi.org/10.1080/01449298208914450;  Ben  Shneiderman,  “Dynamic  Queries  for   Visual  Information  Seeking,”  IEEE  Software  11  (1994):  70–77,   http://dx.doi.org/10.1109/52.329404.   14.    See,  for  example,  Aleks  Aris  et  al.,  “Visual  Overviews  for  Discovering  Key  Papers  and   Influences  Across  Research  Fronts,”  Journal  of  the  American  Society  for  Information  Science  &   Technology  60  (2009):  2219–28,  http://dx.doi.org/10.1002/asi.v60:11;  Furu  Wei  et  al.,   “TIARA:  A  Visual  Exploratory  Text  Analytic  System,”  in  Proceedings  of  the  16th  ACM  SIGKDD   International  Conference  on  Knowledge  Discovery  and  Data  Mining  (Washington,  DC:  ACM,   2010),  153–62,  http://dx.doi.org/10.1145/1835804.1835827;  Cody  Dunne,  Ben  Shneiderman,   Robert  Gove,  Judith  Klavans,  and  Bonnie  Dorr,  “Rapid  Understanding  of  Scientific  Paper   Collections:  Integrating  Statistics,  Text  Analysis,  and  Visualization,”  Journal  of  the  American   Society  for  Information  Science  &  Technology  63  (2012):  2351–69,   http://dx.doi.org/10.1002/asi.22652.   15.    The  most  notable  exception  is  Carrot2  (http://search.carrot2.org),  a  search  tool  that  will   automatically  cluster  web  search  results  and  display  visualizations  of  those  clusters.   16.    Ben  Shneiderman,  “The  Eyes  Have  It:  A  Task  by  Data  Type  Taxonomy  for  Information   Visualizations,”  September  1996,  accessed  April  27,  2014,   http://drum.lib.umd.edu/bitstream/1903/5784/1/TR_96-­‐66.pdf.   17.    Ben  Shneiderman,  “Treemaps  for  Space-­‐Constrained  Visualization  of  Hierarchies:  Including   the  History  of  Treemap  Research  at  the  University  of  Maryland,”  Institute  for  Systems   Research,  accessed  October  6,  2014,  http://www.cs.umd.edu/hcil/treemap-­‐history.   18.    To  explore  this  feature  in  our  catalog,  go  to  https://libweb.grinnell.edu/vufind/Search/Home,   do  a  search,  and  click  on  the  “Visualize  Results”  link  in  the  upper  right.   19.    A  recent  Project  Information  Literacy  report  found  that  the  two  aspects  of  research  that  first-­‐ year  students  found  most  difficult  were  “coming  up  with  keywords  to  narrow  down  searches”   and  “filtering  and  sorting  through  irrelevant  results  from  online  searches.”  Alison  J.  Head,   Learning  the  Ropes:  How  Freshmen  Conduct  Course  Research  Once  They  Enter  College  (Project   Information  Literacy,  December  5,  2013),   http://projectinfolit.org/images/pdfs/pil_2013_freshmenstudy_fullreport.pdf,  15.   5889 ---- Microsoft Word - September_ITAL_Park_proofed.docx Evaluation  of  Semi-­‐Automatic     Metadata  Generation  Tools:  A  Survey     of  the  Current  State  of  the  Art     Jung-­‐ran  Park    and     Andrew  Brenza     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  SEPTEMBER  2015             22   ABSTRACT   Assessment  of  the  current  landscape  of  semi-­‐automatic  metadata  generation  tools  is  particularly   important  considering  the  rapid  development  of  digital  repositories  and  the  recent  explosion  of  big   data.  
Utilization  of  semi-­‐automatic  metadata  generation  is  critical  in  addressing  these   environmental  changes  and  may  be  unavoidable  in  the  future  considering  the  costly  and  complex   operation  of  manual  metadata  creation.  To  address  such  needs,  this  study  examines  the  range  of   semi-­‐automatic  metadata  generation  tools  (N  =  39)  while  providing  an  analysis  of  their  techniques,   features,  and  functions.  The  study  focuses  on  open-­‐source  tools  that  can  be  readily  utilized  in  libraries   and  other  memory  institutions.  The  challenges  and  current  barriers  to  implementation  of  these  tools   were  identified.  The  greatest  area  of  difficulty  lies  in  the  fact  that  the  piecemeal  development  of  most   semi-­‐automatic  generation  tools  only  addresses  part  of  the  issue  of  semi-­‐automatic  metadata   generation,  providing  solutions  to  one  or  a  few  metadata  elements  but  not  the  full  range  of  elements.   This  indicates  that  significant  local  efforts  will  be  required  to  integrate  the  various  tools  into  a   coherent  set  of  a  working  whole.  Suggestions  toward  such  efforts  are  presented  for  future   developments  that  may  assist  information  professionals  with  incorporation  of  semi-­‐automatic  tools   within  their  daily  workflows.     INTRODUCTION   With  the  rapid  increase  in  all  types  of  information  resources  managed  by  libraries  over  the  last   few  decades,  the  ability  of  the  cataloging  and  metadata  community  to  describe  those  resources  has   been  severely  strained.  Furthermore,  the  reality  of  stagnant  and  decreasing  library  budgets  has   prevented  the  library  community  from  addressing  this  issue  with  concomitant  staffing  increases.   Nevertheless,  the  ability  of  libraries  to  make  information  resources  accessible  to  their   communities  of  users  remains  a  central  concern.  Thus  there  is  a  critical  need  to  devise  efficient   and  cost  effective  ways  of  creating  bibliographic  records  so  that  users  are  able  to  find,  identify,   and  obtain  the  information  resources  they  need.     One  promising  approach  to  managing  the  ever-­‐increasing  amount  of  information  is  with  semi-­‐ automatic  metadata  generation  tools.  Semi-­‐automatic  metadata  generation  tools     Jung-­‐ran  Park  (jung-­‐ran.park@drexel.edu)  is  Editor,  Journal  of  Library  Metadata,  and  Associate   Professor,  College  of  Computing  and  Informatics,  Drexel  University,  Philadelphia.     Andrew  Brenza  (apb84@drexel.edu)  is  Project  Assistant,  College  of  Computing  and  Informatics,   Drexel  University,  Philadelphia.     EVALUATION  OF  SEMI-­‐AUTOMATIC  METADATA  GENERATION  TOOLS|  PARK  AND  BRENZA     doi:  10.6017/ital.v34i3.5889   23   concern  the  use  of  software  to  create  metadata  records  with  varying  degrees  of  supervision  from  a   human  specialist.1  In  its  ideal  form,  semi-­‐automatic  metadata  generation  tools  are  capable  of   extracting  information  from  structured  and  unstructured  information  resources  of  all  types  and   creating  quality  metadata  that  not  only  facilitate  bibliographic  record  creation  but  also  semantic   interoperability,  a  critical  factor  for  resource  sharing  and  discovery  in  the  networked  environment.   
Through  the  use  of  semi-­‐automatic  metadata  generation  tools,  the  library  community  has  the   potential  to  address  many  issues  related  to  the  increase  of  information  resources,  the  strain  on   library  budget,  the  need  to  create  high-­‐quality,  interoperable  metadata  records,  and,  ultimately,   the  effective  provision  of  information  resources  to  users.   There  are  many  potential  benefits  to  semi-­‐automatic  metadata  generation.  The  first  is  scalability.   Because  of  the  quantity  of  information  resources  and  the  costly  and  time-­‐consuming  nature  of   manual  metadata  generation,2  it  is  increasingly  apparent  that  there  simply  are  not  enough   information  professionals  available  for  satisfying  the  metadata-­‐generation  needs  of  the  library   community.  Semi-­‐automatic  metadata  generation,  on  the  other  hand,  offers  the  promise  of  using   high  levels  of  computing  power  to  manage  large  amounts  of  information  resources.  In  addition  to   scalability,  semi-­‐automatic  metadata  generation  also  offers  potential  cost  savings  through  a   decrease  in  the  time  required  to  create  effective  records.  Furthermore,  the  time  savings  would   allow  information  professionals  to  focus  on  tasks  that  are  more  conceptually  demanding  and  thus   not  suitable  for  automatic  generation.  Finally,  because  computers  can  perform  repetitive  tasks   with  relative  consistency  when  compared  to  their  human  counterparts,  automatic  metadata   generation  promises  the  ability  to  create  more  consistent  records.  A  potential  increase  in   consistency  of  quality  metadata  records  would,  in  turn,  increase  the  potential  for  interoperability   and  thereby  the  accessibility  of  information  resources  in  general.  Thus  semi-­‐automatic  metadata   generation  offers  the  potential  to  not  only  ease  resource  description  demands  on  the  library   community  but  also  to  improve  resource  discovery  for  its  users.     GOALS  OF  THE  STUDY   Assessment  of  the  current  landscape  of  semi-­‐automatic  metadata  generation  tools  is  particularly   important  considering  the  fast  development  of  digital  repositories  and  the  recent  explosion  of   data  and  information.  Utilization  of  semi-­‐automatic  metadata  generation  is  critical  to  address  such   environmental  changes  and  may  be  unavoidable  in  the  future  considering  the  costly  and  complex   operation  of  manual  metadata  creation.  Even  though  there  are  promising  experimental  studies   that  exploit  various  methods  and  sources  for  semi-­‐automatic  metadata  generation,3  a  lack  of   studies  assessing  and  evaluating  the  range  of  tools  have  been  developed,  implemented,  or   improved.  To  address  such  needs,  this  study  aims  to  examine  the  current  landscape  of  semi-­‐ automatic  metadata  generation  tools  while  providing  an  evaluative  analysis  of  their  techniques,   features,  and  functions.  The  study  primarily  focuses  on  open-­‐source  tools  that  can  be  readily   utilized  in  libraries  and  other  memory  institutions.  
The  study  also  highlights  some  of  the   challenges  still  facing  the  continued  development  of  semi-­‐automatic  tools  and  the  current  barriers     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  SEPTEMBER  2015       24   to  their  incorporation  into  the  daily  workflows  for  information  organization  and  management.   Future  directions  for  the  further  development  of  tools  are  also  discussed.     Toward  this  end,  a  critical  review  of  the  literature  in  relation  to  semi-­‐automatic  metadata   generation  tools  published  from  2004  to  2014  was  conducted.  Databases  such  as  Library  and   Information  Sciences  Abstracts  and  Library,  Information  Science  and  Technology  Abstracts  were   searched  and  germane  articles  identified  through  review  of  titles  and  abstracts.  Because  the   problem  of  creating  viable  tools  for  the  reliable  automatic  generation  of  metadata  is  a  not  a   problem  limited  to  the  library  and  information  science  professions,4  database  searches  were   expanded  to  include  those  databases  pertinent  to  the  computing  science,  including  Proquest   Computing,  Academic  Search  Premier,  and  Applied  Science  and  Technology.  Keywords,  such  as   “automatic  metadata  generation,”  “metadata  extraction,”  “metadata  tools,”  and  “text  mining,”   including  their  stems,  were  used  to  explore  the  databases.  In  addition  to  keyword  searching,   relevant  articles  were  also  identified  within  the  reference  sections  of  articles  already  deemed   pertinent  to  the  focus  of  the  survey  as  well  as  through  the  expansion  of  results  lists  through  the   application  of  relevant  subject  terms  applied  to  pertinent  articles.  To  ensure  that  the  latest,  most   reliable  developments  in  automatic  metadata  were  reviewed,  various  filters,  such  as  date  range   and  peer-­‐review,  were  employed.  Once  tools  were  identified,  their  capabilities  were  tested  (when   possible),  their  features  were  noted,  and  overarching  developments  were  determined.     The  remainder  of  the  article  provides  an  overview  of  the  primary  techniques  developed  for  the   semi-­‐automatic  generation  of  metadata  and  a  review  of  the  open-­‐source  metadata  generation   tools  that  employ  them.  The  challenges  and  current  barriers  to  semi-­‐automatic  metadata  tool   implementation  are  described  as  well  as  suggestions  for  future  developments  that  may  assist   information  professionals  with  integration  of  semi-­‐automatic  tools  within  the  daily  workflow  of   technical  services  departments.     Current  Techniques  for  the  Automatic  Generation  of  Metadata   As  opposed  to  manual  metadata  generation,  semi-­‐automatic  metadata  generation  relies  on   machine  methods  to  assist  with  or  to  complete  the  metadata-­‐creation  process.  Greenberg   distinguished  between  two  methods  of  automatic  metadata  generation:  metadata  extraction  and   metadata  harvesting.5  Metadata  extraction  in  general  employs  automatic  indexing  and   information  retrieval  techniques  to  generate  structured  metadata  using  the  original  content  of   resources.  On  the  other  hand,  metadata  harvesting  concerns  a  technique  to  automatically  gather   metadata  from  individual  repositories  in  which  metadata  has  been  produced  by  semi-­‐automatic  or   manual  approaches.  
The  harvested  metadata  can  be  stored  in  a  central  repository  for  future   resource  retrieval.   Within  this  dichotomy  of  extraction  methods,  there  are  several  other  more  specific  techniques   that  researchers  have  developed  for  the  semi-­‐automatic  generation  of  metadata.  Polfreman  et  al.   identified  an  additional  six  techniques  that  have  been  developed  over  the  years:  meta-­‐tag   harvesting,  content  extraction,  automatic  indexing,  text  and  data  mining,  extrinsic  data  auto     EVALUATION  OF  SEMI-­‐AUTOMATIC  METADATA  GENERATION  TOOLS|  PARK  AND  BRENZA     doi:  10.6017/ital.v34i3.5889   25   generation,  and  social  tagging.6  Although  the  last  technique  is  not  properly  a  semi-­‐automatic   metadata  generation  technique  because  it  is  used  to  generate  metadata  with  a  minimum  of   intervention  required  by  metadata  professionals,  it  can  be  viewed  as  a  possible  mode  to   streamline  the  metadata  creation  process.     Both  Greenberg  and  Polfreman  provide  comprehensive,  high-­‐level  characterizations  of  the   techniques  employed  in  current  semi-­‐automatic  metadata  generation  tools.  However,  an   evaluation  of  these  techniques  within  the  context  of  a  broad  survey  of  the  tools  themselves  and  a   comprehensive  enumeration  of  currently  available  tools  are  not  addressed.  Thus,  although  these   techniques  will  be  examined  for  the  remainder  of  this  section,  they  serve  simply  as  a  framework   through  which  this  study  provides  a  current  and  comprehensive  analysis  of  the  tools  available  for   use  today.  Each  section  provides  an  overview  of  the  relevant  technique,  a  discussion  of  the  most   current  research  related  to  it,  and  the  tools  that  employ  that  technique.   The  tables  included  in  each  section  provide  lists  of  the  semi-­‐automatic  metadata  generation  tools   (N  =  39)  evaluated  in  the  course  of  this  survey.  The  information  presented  in  the  tables  is   designed  to  provide  a  characterization  of  each  tool:  its  name,  its  online  location,  the  technique(s)   used  to  generate  metadata,  and  a  brief  description  of  the  tool’s  functions  and  features.  Only  those   tools  that  are  currently  available  for  download  or  for  use  as  web  services  at  the  time  of  this   writing  are  included.  Furthermore,  the  listed  tools  have  not  been  strictly  limited  to  metadata-­‐ generation  applications  but  also  include  some  content  management  system  software  (CMSS)  as   these  generally  provide  some  form  of  semi-­‐automatic  metadata  extraction.  Typically,  CMSS  are   capable  of  extracting  technical  metadata  as  well  as  data  that  can  found  in  the  meta-­‐tags  of   information  resources,  such  as  the  file  name,  and  using  that  information  as  the  title  of  a  record.     Meta-­‐Tag  Extraction   Meta-­‐tag  extraction  is  a  computing  process  whereby  values  for  metadata  fields  are  identified  and   populated  through  an  examination  of  metadata  tags  within  or  attached  to  a  document.  In  other   words,  it  is  a  form  of  metadata  harvesting  and,  possibly,  conversion  of  that  metadata  into  other   formats.  MarcEdit,  the  most  widely  used  semi-­‐automatic  tool  for  the  generation  of  metadata  in  US   libraries,7  is  an  example  of  this  technique.  
MarcEdit  essentially  harvests  metadata  from  Open   Archives  Initiative  Protocol  for  Metadata  Harvesting  (OAI-­‐PMH)  compliant  records  and  offers  the   user  the  opportunity  to  convert  those  records  to  a  variety  of  formats,  including  MAchine-­‐Readable   Cataloging  (MARC),  MAchine-­‐Readable  Cataloguing  in  XML  (MARC  XML),  Metadata  Object   Description  Schema  (MODS),  and  Encoded  Archival  Description  (EAD).  It  also  offers  the   capabilities  of  converting  records  from  any  of  the  supported  formats  to  any  of  the  other  supported   formats.   Other  examples  of  this  technique  are  the  web  services  Editor-­‐Converter  Dublin  Core  Metadata  and   Firefox  Dublin  Core  Viewer  Extension.  Both  of  these  programs  search  HTML  files  on  the  web  and   convert  information  found  in  HTML  meta-­‐tags  to  Dublin  Core  elements.  In  the  cases  of  MarcEdit     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  SEPTEMBER  2015       26   and  Editor-­‐Converter  Dublin  Core,  users  are  presented  with  the  converted  information  in  an   interface  that  allows  the  user  to  edit  or  refine  the  data.     Figure  1  provides  an  illustration  of  the  extracted  metadata  of  the  New  York  Times  homepage  using   Editor-­‐Converter  Dublin  Core,  while  figure  2  offers  an  illustration  of  the  editor  that  this  web   service  provides.         Figure  1.  Screenshot  of  Extracted  Dublin  Core  Metadata  Using  Editor-­‐Converter  Dublin  Core.     Figure  2.  Screenshot  of  Editor-­‐Converter  Dublin  Core  Editing  Tool  (only  eight  of  the  sixteen  fields   are  visible  in  this  screenshot).     EVALUATION  OF  SEMI-­‐AUTOMATIC  METADATA  GENERATION  TOOLS|  PARK  AND  BRENZA     doi:  10.6017/ital.v34i3.5889   27   Perhaps  the  biggest  weakness  to  this  type  of  tool  is  that  it  entirely  depends  on  the  quality  of  the   metadata  from  which  the  programs  harvest.  This  can  be  most  readily  seen  in  the  above  figure  by   the  lack  of  values  for  a  number  of  the  Dublin  Core  fields  for  the  The  New  York  Times  website.   Programs  that  solely  employ  the  technique  of  meta-­‐tag  harvesting  are  unable  to  infer  values  for   metadata  elements  that  are  not  already  populated  in  the  source.     Table  1  lists  the  tools  that  support  meta-­‐tag  harvesting  either  as  the  sole  technique  or  as  one  of  a   suite  of  techniques  used  to  generate  metadata  from  resources.  Of  the  thirty-­‐nine  tools  evaluated   for  this  study,  nineteen  support  meta-­‐tag  harvesting.   Tool  Name   Location   Techniques   Functions/Features   ANVL/ERC   Kernel  Metadata   Conversion   Toolkit   http://search.cpan.org/~jak/File-­‐ ANVL/anvl     meta-­‐tag  harvester   A  utility  that  can  automatically   convert  records  in  the  ANVL   format  into  other  formats  such  as   XML,  JSON  (JavaScript  Object   Notation),  Turtle  or  Plain,  among   others.   Apache  POI  –   Text  Extractor   http://poi.apache.org/download.html     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   Apache  POI  provides  basic  text   extraction  for  all  project   supported  file  formats.  In   addition  to  the  (plain)  text,   Apache  POI  can  access  the   metadata  associated  with  a  given   file,  such  as  title  and  author.     
Apache  Tika   http://tika.apache.org/     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   Built  on  Apache  POI,  the  Apache   Tika  toolkit  detects  and  extracts   metadata  and  text  content  from   various  documents.   Ariadne   Harvester   http://sourceforge.net/projects/ariadn ekps/files/?source=navbar     meta-­‐tag  harvester   A  harvester  of  OAI-­‐PMH   compliant  records  which  can  be   converted  to  various  other   schema  such  as  Learning  Object   Metadata  (LOM).       BIBFRAME  Tools   http://www.loc.gov/bibframe/implem entation/     meta-­‐tag  harvester   BIBFRAME  offers  a  number  of   tools  for  the  conversion  of   MARCXML  documents  to   BIBFRAME  documents.    Web   service  and  downloadable   software  are  both  available.   Data  Fountains   http://datafountains.ucr.edu/     content  extractor;   automatic  indexer;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   Scans  HTML  documents  and  first   extracts  information  contained  in   meta-­‐tags.    If  information  is   unavailable  in  meta-­‐tags,  the   program  will  use  other   techniques  to  assign  values.     Includes  a  focused  web  crawler   that  can  target  websites   concerning  a  specific  subject.           INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  SEPTEMBER  2015       28   Dublin  Core  Meta   Toolkit   http://sourceforge.net/projects/dcmet atoolkit/files/?source=navbar     meta-­‐tag  harvester   Transforms  data  collected  via   different  methods  into  Dublin   Core  (DC)  compatible  metadata.   Dspace   http://www.dspace.org/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator;  social   tagging   Automatically  extracts  technical   information  regarding  file  format   and  size.    Can  also  extract  some   information  from  meta-­‐tags.   Editor-­‐Converter   Dublin  Core   Metadata   http://www.library.kr.ua/dc/dcedituni e.html   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   Scans  HTML  documents,   harvesting  metadata  from  tags   and  converting  them  to  DC.   Embedded   Metadata   Extraction  Tool   (EMET)   http://www.artstor.org/global/g-­‐ html/download-­‐emet-­‐public.html     content  extractor;   EMET  is  a  tool  designed  to   extract  metadata  embedded  in   JPEG  and  TIFF  files.   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   Firefox  Dublin   Core  Viewer   Extension   http://www.splintered.co.uk/experime nts/73/     meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   Scans  HTML  documents,   harvesting  metadata  from  tags   and  displaying  them  in  Dublin   Core.   MarcEdit   http://marcedit.reeset.net/   meta-­‐tag  harvester   Harvests  OAI-­‐PMH  compliant   data  and  converts  it  to  various   formats  including  DC  and  MARC.   Metatag   Extractor   Software   http://meta-­‐tag-­‐ extractor.software.informer.com/     meta-­‐tag  harvester   Permits  customizable  extraction   features,  harvesting  meta-­‐tags  as   well  as  contact  information  from   websites.   My  Meta  Maker   http://old.isn-­‐ oldenburg.de/services/mmm/     meta-­‐tag  harvester   Can  convert  manually  entered   data  into  DC.   Photo  RDF-­‐Gen   http://www.webposible.com/utilidade s/photo_rdf_generator_en.html   meta-­‐tag  harvester   Generates  Dublin  Core  and   Resource  Description  Framework   (RDF)  output  from  manually   entered  input.   
PyMarc   https://github.com/edsu/pymarc     meta-­‐tag  harvester   Scripting  tool  in  Python  language   for  the  batch  processing  of  MARC   records,  similar  to  MarcEdit.       RepoMMan   http://www.hull.ac.uk/esig/repomman /index.html   meta-­‐tag  harvester;   content  extractor;   extrinsic  auto-­‐ generator   Automatically  extracts  various   elements  for  documents   uploaded  to  Fedora  such  as   author,  title,  description,  and  key   words,  among  others.    Results  are   presented  to  user  for  review.   SHERPA/RoMEO   http://www.sherpa.ac.uk/romeo/api.h tml     meta-­‐tag  harvester   A  machine-­‐to-­‐machine   Application  Program  Interface   (API)  that  permits  the  automatic   look-­‐up  and  importation  of   publishers  and  journals.   URL  and  Metatag   Extractor   http://www.metatagextractor.com/     meta-­‐tag  harvester   Permits  the  targeted  searching  of   websites  and  extracts  URLs  and   meta-­‐tags  from  those  sites.   Table  1.  Semi-­‐Automatic  Tools  that  Support  Meta-­‐Tag  Harvesting.     EVALUATION  OF  SEMI-­‐AUTOMATIC  METADATA  GENERATION  TOOLS|  PARK  AND  BRENZA     doi:  10.6017/ital.v34i3.5889   29   Content  Extraction     Content  extraction  is  a  form  of  metadata  extraction  whereby  various  computing  techniques  are   used  to  extract  information  from  the  information  resource  itself.  In  other  words,  these  techniques   do  not  rely  on  the  identification  of  relevant  meta-­‐tags  for  the  population  of  metadata  values.  An   example  of  this  technique  is  the  Kea  application,  a  program  developed  at  the  New  Zealand  Digital   Library  that  uses  machine  learning,  term  frequency-­‐inverse  document  frequency  (TF.IDF)  and   first-­‐occurrence  techniques  to  identify  and  assign  key  phrases  from  the  full  text  of  documents.8   The  major  advantage  of  this  type  of  technique  is  that  the  extraction  of  metadata  can  be  done   independently  of  the  quality  of  metadata  associated  with  any  given  information  resource.  Another   example  of  a  tool  utilizing  this  technique  is  the  Open  Text  Summarizer,  an  open-­‐source  program   that  offers  the  capability  of  reading  a  text  and  extracting  important  sentences  to  create  a  summary   as  well  as  to  assign  keywords.  Figure  3  provides  a  screenshot  of  what  a  summarized  text  might   look  like  using  the  Open  Text  Summarizer.           Figure  3.  Open  Text  Summarizer:  Sample  Summary  of  Text.   Another  form  of  this  technique  often  relies  on  the  predictable  structure  of  certain  types  of   documents  to  identify  candidate  values  for  metadata  elements.  For  instance,  because  of  the   reliable  format  of  scholarly  research  papers—which  generally  include  a  title,  author,  abstract,   introduction,  conclusion,  and  reference  sections  in  predictable  ways—this  format  can  be  exploited   by  machines  to  extract  metadata  values  from  them.  Several  projects  have  been  able  to  exploit  this   technique  in  combination  with  machine  learning  algorithms  to  extract  various  forms  of  metadata.     
For  instance,  in  the  Randkte  project,  optical  character  recognition  software  was  used  to  scan  a   large  quantity  of  legal  documents  from  which,  because  of  the  regularity  of  the  documents’     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  SEPTEMBER  2015       30   structure,  structural  metadata  such  as  chapter,  section,  and  page  number  could  be  extracted.9  In   contrast,  the  Kovacevic’s  project  used  the  predictable  structure  of  scholarly  articles,  converting   documents  from  PDF  to  HTML  files  while  preserving  the  formatting  details  and  used  classification   algorithms  to  extract  metadata  regarding  title,  author,  abstract,  and  keywords,  among  other   elements.10   Table  2  lists  the  tools  that  support  content  extraction  either  as  the  sole  technique  or  as  one  of  a   suite  of  techniques  used  to  generate  metadata  from  resources.  Of  the  thirty-­‐nine  tools  evaluated   for  this  study,  twenty  tools  support  some  form  of  content  extraction.   Tool  Name   Location   Techniques   Functions/Features   Apache  POI— Text  Extractor   http://poi.apache.org/download.html     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   Apache  POI  provides  basic  text   extraction  for  all  project   supported  file  formats.  In   addition  to  the  (plain)  text,   Apache  POI  can  access  the   metadata  associated  with  a  given   file,  such  as  title  and  author.     Apache   Standol   https://stanbol.apache.org/     content  extractor;   automatic  indexer   Extracts  semantic  metadata  from   PDF  and  text  files.  Can  apply   extracted  terms  to  ontologies.   Apache  Tika   http://tika.apache.org/     content  extractor;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   Built  on  Apache  POI,  the  Apache   Tika  toolkit  detects  and  extracts   metadata  and  text  content  from   various  documents.   Biblio  Citation   Parser   http://search.cpan.org/~mjewell/   Biblio-­‐Citation-­‐Parser-­‐1.10/     content  extractor   A  set  of  modules  for  citation   parsing.   CatMDEdit   http://catmdedit.sourceforge.net/     content  extractor   CatMDEdit  allows  the  automatic   creation  of  metadata  for   collections  of  related  resources,   in  particular  spatial  series  that   arise  as  a  result  of  the   fragmentation  of  geometric   resources  into  datasets  of   manageable  size  and  similar   scale.   CrossRef   http://www.crossref.org/   SimpleTextQuery/     content  extractor   This  web  service  returns  Digital   Object  Identifiers  for  inputted   references.     Data   Fountains   http://datafountains.ucr.edu/     content  extractor;   automatic  indexer;   meta-­‐tag  harvester;   extrinsic  auto-­‐ generator   Scans  HTML  documents  and  first   extracts  information  contained  in   meta-­‐tags.  If  information  is   unavailable  in  meta-­‐tags,  the   program  will  use  other   techniques  to  assign  values.   Includes  a  focused  web  crawler   that  can  target  websites   concerning  a  specific  subject.       
Embedded Metadata Extraction Tool (EMET) (http://www.artstor.org/global/g-html/download-emet-public.html). Techniques: content extractor; meta-tag harvester; extrinsic auto-generator. EMET is a tool designed to extract metadata embedded in JPEG and TIFF files.

FreeCite (http://freecite.library.brown.edu/). Techniques: content extractor. Free parsing tool for the extraction of reference information. Can be downloaded or used as a web service.

General Architecture for Text Engineering (GATE) (http://gate.ac.uk/overview.html). Techniques: content extractor; automatic indexer. Natural language processor and information extractor.

Kea (http://www.nzdl.org/Kea/index_old.html#download). Techniques: content extractor; automatic indexer. Analyzes the full texts of resources and extracts keyphrases. Keyphrases can also be mapped to customized ontologies or controlled vocabularies for subject term assignment.

MetaGen (http://www.codeproject.com/Articles/41910/MetaGen-A-project-metadata-generator-for-Visual-St). Techniques: content extractor; automatic indexer. Used to build a metadata generator for Silverlight and Desktop CLR projects, MetaGen can be used as a replacement for static reflection (expression trees), reflection (walking the stack), and various other means for deriving the name of a property, method, or field.

MetaGenerator (http://extensions.joomla.org/extensions/site-management/seo-a-metadata/meta-data/11038). Techniques: content extractor. A plugin that automatically generates description and keyword meta-tags by pulling text from Joomla content. With this plugin you can also control some title options and add URL meta-tags.

Ont-O-Mat (http://projects.semwebcentral.org/projects/ontomat/). Techniques: content extractor. Assists the user with annotation of websites that are Semantic Web-compliant. May now include a feature that automatically suggests portions of the website to annotate.

Open Text Summarizer (http://libots.sourceforge.net/). Techniques: content extractor. Extracts pertinent sentences from a resource to build a free-text description.

ParsCit (http://wing.comp.nus.edu.sg/parsCit/#ws). Techniques: content extractor. Open-source string-parsing package for the extraction of reference information from scholarly articles.

RepoMMan (http://www.hull.ac.uk/esig/repomman/index.html). Techniques: meta-tag harvester; content extractor; extrinsic auto-generator. Automatically extracts various elements for documents uploaded to Fedora, such as author, title, description, and keywords, among others. Results are presented to the user for review.

Simple Automatic Metadata Generation Interface (SamgI) (http://hmdb.cs.kuleuven.be/amg/Download.php). Techniques: content extractor; extrinsic auto-generator. A suite of tools that is able to automatically extract metadata elements such as keyphrase and language from documents as well as from the context in which a document exists.
Termine (http://www.nactem.ac.uk/software/termine/). Techniques: content extractor. Extracts keywords from texts through C-value analysis and Acromine, an acronym identifier and dictionary. Available as a free web service for academic use.

Yahoo Content Analysis API (https://developer.yahoo.com/contentanalysis/). Techniques: content extractor; automatic indexer. The Content Analysis Web Service detects entities/concepts, categories, and relationships within unstructured content. It ranks those detected entities/concepts by their overall relevance, resolves them if possible into Wikipedia pages, and annotates tags with relevant metadata.

Table 2. Semi-Automatic Tools that Support Content Extraction.

Automatic Indexing

In the same way as content extraction, automatic indexing involves the use of machine learning and rule-based algorithms to extract metadata values from within information resources themselves, rather than relying on the content of meta-tags applied to resources. However, this technique also involves the mapping of extracted metadata terms to controlled vocabularies such as the Library of Congress Subject Headings (LCSH), the Getty Thesaurus of Geographic Names (TGN), or the Library of Congress Name Authority File (LCNAF), or to domain-specific or locally developed ontologies. Thus, in this technique, researchers use classifying and clustering algorithms to extract relevant metadata from texts. Term-frequency statistics, or TF-IDF, which determines the likelihood of keyword applicability through a term's relative frequency within a given document as opposed to its relative infrequency in related documents, are commonly used in this technique.

Johns Hopkins University's Automatic Name Authority Control (ANAC) tool, for example, utilizes this technique to extract the names of composers within its sheet music collections and to assign the authorized form of those names based on comparisons with the LCNAF.11 Erbs et al. also use this technique to extract key phrases from German educational documents, which are then used to assign index terms, thereby increasing the degree to which related documents are collocated within the repository and the consistency of subject term application.12
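The matching logic used by tools such as ANAC is not described in detail above, so the following is only a minimal sketch of the general pattern: comparing a name pulled from a document against a small authority list and returning the closest authorized heading, or nothing, for human review. It uses Python's standard difflib; the authority_headings list, the cutoff value, and the function name are invented for illustration and do not reflect how any particular tool performs authority control.

```python
from difflib import get_close_matches

# A tiny, illustrative stand-in for authority headings such as LCNAF entries.
authority_headings = [
    "Beethoven, Ludwig van, 1770-1827",
    "Brahms, Johannes, 1833-1897",
    "Joplin, Scott, 1868-1917",
]

def suggest_authorized_form(extracted_name, headings, cutoff=0.5):
    """Suggest the closest authorized heading for a name found in a document.

    Returns None when nothing is similar enough, leaving the decision
    to a cataloger, which is the "semi-automatic" part of the workflow.
    """
    matches = get_close_matches(extracted_name, headings, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(suggest_authorized_form("Ludwig van Beethoven", authority_headings))
print(suggest_authorized_form("Unknown Composer", authority_headings))
```

Production systems replace the toy list with a full authority file and far more robust matching, but the review step for uncertain matches remains central.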
Table 3 lists the tools that support automatic indexing either as the sole technique or as one of a suite of techniques used to generate metadata from resources. Of the thirty-nine tools evaluated for this study, seven tools support some form of automatic indexing.

Apache POI—Text Extractor (http://poi.apache.org/download.html). Techniques: content extractor; meta-tag harvester; extrinsic auto-generator. Apache POI provides basic text extraction for all project-supported file formats. In addition to the (plain) text, Apache POI can access the metadata associated with a given file, such as title and author.

Apache Tika (http://tika.apache.org/). Techniques: content extractor; meta-tag harvester; extrinsic auto-generator. Built on Apache POI, the Apache Tika toolkit detects and extracts metadata and text content from various documents.

Data Fountains (http://datafountains.ucr.edu/). Techniques: content extractor; automatic indexer; meta-tag harvester; extrinsic auto-generator. Scans HTML documents and first extracts information contained in meta-tags. If information is unavailable in meta-tags, the program will use other techniques to assign values. Includes a focused web crawler that can target websites concerning a specific subject.

Digital Record Object Identification (DROID) (http://www.nationalarchives.gov.uk/information-management/manage-information/preserving-digital-records/droid/). Techniques: extrinsic auto-generator. DROID is a software tool developed by the National Archives to perform automated batch identification of file formats.

Dspace (http://www.dspace.org/). Techniques: meta-tag harvester; extrinsic auto-generator. Automatically extracts technical information regarding file format and size. Can also extract some information from meta-tags.

Editor-Converter Dublin Core Metadata (http://www.library.kr.ua/dc/dceditunie.html). Techniques: meta-tag harvester; extrinsic auto-generator. Scans HTML documents, harvesting metadata from tags and converting them to Dublin Core.

Embedded Metadata Extraction Tool (EMET) (http://www.artstor.org/global/g-html/download-emet-public.html). Techniques: content extractor; meta-tag harvester; extrinsic auto-generator. EMET is a tool designed to extract metadata embedded in JPEG and TIFF files.

Firefox Dublin Core Viewer Extension (http://www.splintered.co.uk/experiments/73/). Techniques: meta-tag harvester; extrinsic auto-generator. Scans HTML documents, harvesting metadata from tags and displaying them as Dublin Core.

JHove (http://jhove.sourceforge.net/#implementation). Techniques: extrinsic auto-generator. Extracts metadata regarding file format and size as well as validating the structure of the identified file format.

National Library of New Zealand—Metadata Extraction Tool (http://meta-extractor.sourceforge.net/). Techniques: extrinsic auto-generator. Developed by the National Library of New Zealand to programmatically extract preservation metadata from a range of file formats like PDF documents, image files, sound files, Microsoft Office documents, and others.

Omeka (http://omeka.org/). Techniques: extrinsic auto-generator; social tagging. Automatically extracts technical information regarding file format and size.

RepoMMan (http://www.hull.ac.uk/esig/repomman/index.html). Techniques: meta-tag harvester; content extractor; extrinsic auto-generator. Automatically extracts various elements for documents uploaded to Fedora, such as author, title, description, and keywords, among others. Results are presented to the user for review.
Simple Automatic Metadata Generation Interface (SamgI) (http://hmdb.cs.kuleuven.be/amg/Download.php). Techniques: content extractor; extrinsic auto-generator. A suite of tools that is able to automatically extract metadata elements such as keyphrase and language from documents as well as from the context in which a document exists.

Table 3. Semi-Automatic Tools that Support Automatic Indexing.

Text and Data Mining

The two methods discussed above, content extraction and automatic indexing, rely on text- and data-mining techniques for the automatic extraction of metadata. In other words, for the semi-automatic generation of metadata, the above methods utilize machine-learning algorithms; statistical analysis of term frequencies; clustering techniques, which examine the frequency of term utilization across documents rather than relying on controlled vocabularies; and classifying techniques, which exploit the conventional structure of documents. Because of the complexity of these techniques, few tools have been fully developed for application within real-world library settings. Rather, most uses of these techniques have been developed to solve the problems of automatic metadata generation within the context of specific research projects.

There are two reasons for this. One is that, as many researchers have noted, the effectiveness of machine learning techniques depends on the quality and quantity of training data used to teach the system.13,14,15 Because of the number and diversity of subject domains as well as the sheer variety of document formats, many applications are designed to address the metadata needs of very specific subject domains and very specific types of documents. This is a point that Kovačević et al. make in stating that machine learning techniques generally work best for documents of a similar type, like research papers.16 Another issue, especially as it applies to automatic indexing, is the fact that, as Gardner notes, controlled vocabularies such as the LCSH are too complicated and diverse in structure to be applied through semi-automatic means.17 Although some open-source tools such as Data Fountains have made efforts to overcome this complexity, projects like it are the exception rather than the rule. These issues signify the difficulty of developing sophisticated semi-automatic metadata generation tools that have general applicability across a wide range of subject domains and format types. Nevertheless, for semi-automatic metadata generation tools to become a reality for the library community, such complexity will have to be overcome.

There are, however, some tools that have broader applicability or can be customized to meet local needs. For instance, the Kea keyphrase extractor offers the option of building local ontologies, or applying available ones, that can be used to refine the extraction process.
Perhaps the most promising of all is the above-mentioned Data Fountains suite of tools developed by the University of California. The Data Fountains suite incorporates almost every one of the semi-automatic metadata techniques described in this study, including sophisticated content extraction and automatic indexing features. It also provides several ways to customize the suite in order to meet local needs.

Extrinsic Data Auto-Generation

Extrinsic data auto-generation is the process of extracting metadata about an information resource that is not contained within the resource itself. Extrinsic data auto-generation can involve the extraction of technical metadata such as file format and size but can also include the extraction of more complicated features such as the grade level of an educational resource or the intended audience for a document. The extraction of technical metadata is perhaps the one area of semi-automatic metadata generation that is in a high state of development; it is included in most CMSs, such as Dspace,18 as well as in more sophisticated tools such as Harvard's JHove, which can recognize at least twelve different kinds of textual, audio, and visual file formats.19 On the other hand, the problem of semi-automatically generating other types of extrinsic metadata, like grade level, is among the most difficult to solve.

As Leibbrandt et al. note in their analysis of the use of artificial intelligence mechanisms to generate subject metadata for a repository of educational materials at Education Services Australia, the extraction of extrinsic metadata such as grade level was much more difficult than the extraction of keywords because of the lack of information about a resource's context within the resource itself.20 This difficulty can also be seen in the absence of tools that support the extraction of extrinsic data beyond those that harvest manually created metadata or extract technical metadata.
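To make the well-developed end of this spectrum concrete, the sketch below gathers basic technical metadata (size, probable format, last-modified date) for a local file using only the Python standard library. It is a simplified illustration, not how DSpace, JHOVE, or DROID work: those tools identify formats from internal file signatures and validate them, whereas this sketch guesses from the file extension alone. The file name used is hypothetical.

```python
import mimetypes
import os
from datetime import datetime, timezone

def technical_metadata(path):
    """Gather basic technical metadata for a local file.

    Format detection here relies only on the file extension; tools such as
    JHOVE or DROID inspect the file's internal signature and validate it.
    """
    stats = os.stat(path)
    mime_type, _ = mimetypes.guess_type(path)
    return {
        "filename": os.path.basename(path),
        "size_bytes": stats.st_size,
        "mime_type": mime_type or "application/octet-stream",
        "last_modified": datetime.fromtimestamp(
            stats.st_mtime, tz=timezone.utc
        ).isoformat(),
    }

sample = "thesis.pdf"  # hypothetical file path for illustration
if os.path.exists(sample):
    print(technical_metadata(sample))
```

Even this minimal record (format, size, timestamp) is the kind of extrinsic metadata that repository platforms now populate automatically on deposit.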
Table 4 lists the tools that support extrinsic data auto-generation either as the sole technique or as one of a suite of techniques used to generate metadata from resources. Of the thirty-nine tools evaluated for this study, thirteen tools support some form of extrinsic data auto-generation.

Apache POI—Text Extractor (http://poi.apache.org/download.html). Techniques: content extractor; meta-tag harvester; extrinsic auto-generator. Apache POI provides basic text extraction for all project-supported file formats. In addition to the (plain) text, Apache POI can access the metadata associated with a given file, such as title and author.

Apache Tika (http://tika.apache.org/). Techniques: content extractor; meta-tag harvester; extrinsic auto-generator. Built on Apache POI, the Apache Tika toolkit detects and extracts metadata and text content from various documents.

Data Fountains (http://datafountains.ucr.edu/). Techniques: content extractor; automatic indexer; meta-tag harvester; extrinsic auto-generator. Scans HTML documents and first extracts information contained in meta-tags. If information is unavailable in meta-tags, the program will use other techniques to assign values. Includes a focused web crawler that can target websites concerning a specific subject.

Digital Record Object Identification (DROID) (http://www.nationalarchives.gov.uk/information-management/manage-information/preserving-digital-records/droid/). Techniques: extrinsic auto-generator. DROID is a software tool developed by the National Archives to perform automated batch identification of file formats.

Dspace (http://www.dspace.org/). Techniques: meta-tag harvester; extrinsic auto-generator. Automatically extracts technical information regarding file format and size. Can also extract some information from meta-tags.

Editor-Converter Dublin Core Metadata (http://www.library.kr.ua/dc/dceditunie.html). Techniques: meta-tag harvester; extrinsic auto-generator. Scans HTML documents, harvesting metadata from tags and converting them to Dublin Core.

Embedded Metadata Extraction Tool (EMET) (http://www.artstor.org/global/g-html/download-emet-public.html). Techniques: content extractor; meta-tag harvester; extrinsic auto-generator. EMET is a tool designed to extract metadata embedded in JPEG and TIFF files.

Firefox Dublin Core Viewer Extension (http://www.splintered.co.uk/experiments/73/). Techniques: meta-tag harvester; extrinsic auto-generator. Scans HTML documents, harvesting metadata from tags and displaying them as Dublin Core.

JHove (http://jhove.sourceforge.net/#implementation). Techniques: extrinsic auto-generator. Extracts metadata regarding file format and size as well as validating the structure of the identified file format.

National Library of New Zealand—Metadata Extraction Tool (http://meta-extractor.sourceforge.net/). Techniques: extrinsic auto-generator. Developed by the National Library of New Zealand to programmatically extract preservation metadata from a range of file formats like PDF documents, image files, sound files, Microsoft Office documents, and others.

Omeka (http://omeka.org/). Techniques: extrinsic auto-generator; social tagging. Automatically extracts technical information regarding file format and size.

RepoMMan (http://www.hull.ac.uk/esig/repomman/index.html). Techniques: meta-tag harvester; content extractor; extrinsic auto-generator. Automatically extracts various elements for documents uploaded to Fedora, such as author, title, description, and keywords, among others. Results are presented to the user for review.

Simple Automatic Metadata Generation Interface (SamgI) (http://hmdb.cs.kuleuven.be/amg/Download.php). Techniques: content extractor; extrinsic auto-generator. A suite of tools that is able to automatically extract metadata elements such as keyphrase and language from documents as well as from the context in which a document exists.
Table 4. Semi-Automatic Tools that Support Extrinsic Data Auto-Generation.

Social Tagging

Social tagging is now a familiar form of subject metadata generation although, as mentioned previously, it is not properly a form of automatic metadata generation. Nevertheless, because of the relatively low cost of generating and maintaining metadata through social tagging and its current widespread popularity, a few projects have attempted to utilize such data to enhance repositories. For instance, Lindstaedt et al. use sophisticated computer programs to analyze still images found within Flickr and then use this analysis to process new images and to propagate relevant user tags to those images.21

In a slightly more complicated example, Liu and Qin employ machine-learning techniques to initially process and assign metadata, including subject terms, to a repository of documents related to the computer science profession.22 However, this proof-of-concept project also permits users to edit the fields of the metadata once established. The user-edited tags are then reprocessed by the system with the hope of improving the machine-learning mechanisms of the database, creating a kind of feedback loop for the system. Specifically, the improved tags are used by the system to suggest and assign subject terms for new documents as well as to improve the subject description of existing documents within the repository. Although these two examples provide instances of sophisticated reprocessing of social tag metadata, these capabilities do not seem to be present in open-source tools at this time. Nevertheless, social tagging capabilities are offered by many CMSs, such as Omeka, and may offer a means to enhance subject access to holdings.

Table 5 below lists the tools that support social tagging either as the sole technique or as one of a suite of techniques used to generate metadata from resources. Of the thirty-nine tools evaluated for this study, two tools support some form of social tagging.

Dspace (http://www.dspace.org/). Techniques: meta-tag harvester; extrinsic auto-generator; social tagging. Automatically extracts technical information regarding file format and size. Can also extract some information from meta-tags.

Omeka (http://omeka.org/). Techniques: extrinsic auto-generator; social tagging. Automatically extracts technical information regarding file format and size.

Table 5. Semi-Automatic Tools that Support Social Tagging.

Challenges to Implementation

Although semi-automatic metadata generation tools offer many benefits, especially in regards to streamlining the metadata-creation process, there are significant barriers to the widespread adoption and implementation of these tools. One problem with semi-automatic metadata generation tools is that many are developed locally to address the specific needs of a given project or as part of academic research.
This local, highly focused milieu for development means that the general applicability of the tools is potentially diminished. The local context may also hinder the widespread adoption of applications that would result in strong communities of application users and provide further support for the development of applications in an open-source context. Because of the highly specific nature of many current tools, their relevance to real-world processes of metadata creation within the broader context of libraries' diverse information management needs is not accounted for.

Additionally, many tools are focused on solving one or, at most, a few metadata generation problems. For instance, the Kea application is designed to use machine-learning techniques for the sole purpose of extracting keywords, the Open Text Summarizer is limited to automatic extraction of summary descriptions and keywords, and Editor-Converter Dublin Core is designed to extract information in HTML meta-tags and map it to Dublin Core elements. Because of the piecemeal development of semi-automatic generation tools, any comprehensive package of tools will require significant effort from the implementer to coordinate the selected applications and to produce results in a single output. This is, to say the least, a daunting task.
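As an illustration of how narrow one of these single-purpose pieces can be, the sketch below harvests a few HTML meta-tags and maps them to Dublin Core element names using Python's standard html.parser. The META_TO_DC mapping, the class name, and the sample HTML are assumptions made for the example; this is the general idea behind meta-tag harvesting, not the behavior of Editor-Converter Dublin Core or any other tool listed above.

```python
from html.parser import HTMLParser

# Illustrative mapping from common HTML meta names to Dublin Core elements.
META_TO_DC = {"author": "dc:creator", "description": "dc:description",
              "keywords": "dc:subject"}

class MetaTagHarvester(HTMLParser):
    """Collect <meta name="..." content="..."> pairs and the <title> text."""
    def __init__(self):
        super().__init__()
        self.record = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in META_TO_DC and attrs.get("content"):
                self.record[META_TO_DC[name]] = attrs["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.record["dc:title"] = data.strip()

html = """<html><head><title>Sample Report</title>
<meta name="author" content="A. Librarian">
<meta name="keywords" content="metadata; repositories"></head></html>"""
harvester = MetaTagHarvester()
harvester.feed(html)
print(harvester.record)
```

Stitching several such narrow components into a single, coordinated metadata workflow is exactly the integration burden described above.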
Furthermore, a high degree of technical skill is required to implement these complex tools. Many of the more sophisticated tools used to semi-automatically generate metadata, such as Data Fountains, Kea, and Apache Stanbol, require competence in a variety of programming languages. Significant knowledge of C++, Python, and Java is required to implement these systems properly. The high degree of technical knowledge needed to implement these tools means that many libraries and other institutions may not have the resources to begin implementing them, let alone incorporating them into the daily workflows of the metadata creation process. Further, this high degree of technical expertise may require libraries to seek assistance outside of the library. In other words, librarians may need to build strong collaborative relationships with those who have the technical skills, expertise, and credentials to implement and maintain these complicated tools. As Vellucci et al. note in regards to their development of the Metadata Education and Research Information Commons (MERIC), a metadata-driven clearinghouse of education materials related to metadata, elaborate and multidisciplinary partnerships need to be firmly established for the ultimate success of such projects, including the sustained support of the highest levels of administration.23 These types of partnerships may be difficult to establish and maintain for the sustained implementation of complicated tools.

Additionally, sustainable development of tools, especially in regards to the funding needed for continued development of open-source applications, appears to be a significant barrier to implementation. For instance, at the time of this writing, many of the tools that were touted in the literature as being most promising, such as DC Dot, Reggie, and DescribeThis, are no longer available for implementation. Beyond the fact that discontinuation hurts the potential adoption and continued development of semi-automatic tools within real-world library and other information settings, there is also the problem that those settings that have in fact adopted tools may lose the technical support of a central developer and community of users. Thus discontinuation may result in higher rates of tool obsolescence and increase the potential expenses of libraries that have implemented and then must change applications.

Finally, the application of semi-automatic metadata tools remains relatively untested in real-world scenarios. As Polfreman et al. note, most tests of automatic metadata generation tools have several problems, including small sample sizes, narrow scope of project domains, and experiments that lack true objectivity because systems are generally tested by their creators.24 For these reasons, libraries and other institutions may be reluctant to expend the resources needed to implement and fully integrate a complicated, promising, but ultimately untested, tool within their already strained workflows.

CONCLUSION

Semi-automatic metadata generation tools hold the promise of assisting information professionals with the management of ever-increasing quantities and types of information resources. Using software that can create metadata records consistently and efficiently, semi-automatic metadata generation tools potentially offer significant cost and time savings. However, the full integration of these tools into the daily workflows of libraries and other information settings remains elusive. For instance, although many tools have been developed that address the more complicated aspects of semi-automatic metadata generation, including the extraction of information related to conceptually difficult areas of bibliographic description such as subject terms, open-ended resource descriptions, and keyword assignment, many of these tools are relevant only at the project level and are not applicable to the broader contexts needed by libraries. In other words, the current array of tools exists to solve experimental problems but has not been developed to the point that the library community can implement it in a meaningful way.

Perhaps the greatest area of difficulty lies in the fact that most tools only address part of the problem of semi-automatic metadata generation, providing solutions to the semi-automatic generation of one or a few bibliographic elements but not the full range of elements.
This means that for libraries to truly have a comprehensive tool set for the semi-automatic generation of metadata records, significant local effort will be required to integrate the various tools into a working whole. Couple this issue with the instability of tool development and maintenance, and it appears that the library community may lack the incentive to invest already strained and limited resources in the adoption of these tools.

Thus it appears that a number of steps will need to be taken before the library community can seriously consider the incorporation of semi-automatic metadata generation tools within its daily workflows. First, the integration of these various tools into a coherent set of applications is likely the next step in the development of viable semi-automatic metadata generation. Since most small libraries likely do not have the resources required to integrate these disparate tools, let alone incorporate them within existing library systems, a single package of tools will be needed simply from a resource perspective. Second, considering the high level of technical expertise needed to implement the current array of tools, the integration must be accomplished in such a way as to foster implementation, utilization, and maintenance with a minimum of technical expertise. For instance, if an integrated set of tools that functioned across a wide range of subject domains and format types could be developed, the suite might be akin to the CMSs currently employed by many libraries. Furthermore, with a suite of tools that is relatively easy to use, adoption would likely increase. This might result in a stable community of users that would foster the further development of the tools in a sustainable manner. A comprehensive, relatively easy-to-implement set of tools might also foster independent testing of those tools. Independent testing of semi-automatic tools is needed to provide an objective basis for tool evaluation and further development.

Finally, designing automated workflows tailored to the subject domain and types of resources seems to be an essential step for integrating semi-automatic metadata generation tools into metadata creation. Such workflows may delineate data elements that can be generated by an automated meta-tag extractor from data elements that need to be refined and manually created by cataloging and metadata professionals. To develop, maximize, and sustain semi-automatic metadata generation workflows, administrative support for finance, human resources, and training is critical.

Thus, although many of the technical aspects of semi-automatic metadata generation are well on their way to being solved, many other barriers exist that might limit adoption. Further, these barriers may have a negative influence on the continued, sustainable development of semi-automatic metadata generation tools.
Nevertheless, there is a critical need for the library community to find ways to manage the recent explosion of data and information in cost-effective and efficient ways. Semi-automatic metadata generation holds the promise to do just that.

ACKNOWLEDGEMENT

This study was supported by the Institute of Museum and Library Services.

REFERENCES

1. Jane Greenberg, Kristina Spurgin, and Abe Crystal, "Final Report for the AMeGA (Automatic Metadata Generation Applications) Project."

2. Sue Ann Gardner, "Cresting Toward the Sea Change," Library Resources & Technical Services 56, no. 2 (2012): 64–79, http://dx.doi.org/10.5860/lrts.56n2.64.

3. For details, see Jung-ran Park and Caimei Lu, "Application of Semi-Automatic Metadata Generation in Libraries: Types, Tools, and Techniques," Library & Information Science Research 31, no. 4 (2009): 225–31, http://dx.doi.org/10.1016/j.lisr.2009.05.002.

4. Erik Mitchell, "Trending Tech Services: Programmatic Tools and the Implications of Automation in the Next Generation of Metadata," Technical Services Quarterly 30, no. 3 (2013): 296–310, http://dx.doi.org/10.1080/07317131.2013.785802.

5. Jane Greenberg, "Metadata Extraction and Harvesting: A Comparison of Two Automatic Metadata Generation Applications," Journal of Internet Cataloging 6, no. 4 (2004): 59–82, http://dx.doi.org/10.1300/J141v06n04_05.

6. Malcolm Polfreman, Vanda Broughton, and Andrew Wilson, "Metadata Generation for Resource Discovery," JISC, 2008, http://www.jisc.ac.uk/whatwedo/programmes/resourcediscovery/autometgen.aspx.

7. Park and Lu, "Application of Semi-Automatic Metadata Generation in Libraries."

8. Kea Automatic Keyphrase Extraction homepage, http://www.nzdl.org/Kea/index_old.html.

9. Wilhelmina Randtke, "Automated Metadata Creation: Possibilities and Pitfalls," Serials Librarian 64, no. 1–4 (2013): 267–84, http://dx.doi.org/10.1080/0361526X.2013.760286.

10. Aleksandar Kovačević et al., "Automatic Extraction of Metadata from Scientific Publications for CRIS Systems," Electronic Library and Information Systems 45, no. 4 (2011): 376–96, http://dx.doi.org/10.1108/00330331111182094.

11. Mark Patton et al., "Toward a Metadata Generation Framework: A Case Study at Johns Hopkins University," D-Lib Magazine 10, no. 11 (2004), http://www.dlib.org/dlib/november04/choudhury/11choudhury.html.

12. Nicolai Erbs, Iryna Gurevych, and Marc Rittberger, "Bringing Order to Digital Libraries: From Keyphrase Extraction to Index Term Assignment," D-Lib Magazine 19, no. 9/10 (2013), http://www.dlib.org/dlib/september13/erbs/09erbs.html.

13. Polfreman, Broughton, and Wilson, "Metadata Generation for Resource Discovery."

14. Randtke, "Automated Metadata Creation."

15. Xiaozhong Liu and Jian Qin, "An Interactive Metadata Model for Structural, Descriptive, and Referential Representation of Scholarly Output," Journal of the Association for Information Science & Technology 65, no. 5 (2014): 964–83, http://dx.doi.org/10.1002/asi.23007.
16. Kovačević et al., "Automatic Extraction of Metadata from Scientific Publications for CRIS Systems."

17. Gardner, "Cresting Toward the Sea Change."

18. Mary Kurtz, "Dublin Core, Dspace, and a Brief Analysis of Three University Repositories," Information Technology & Libraries 29, no. 1 (2010): 40–46, http://dx.doi.org/10.6017/ital.v29i1.3157.

19. "JHOVE - JSTOR/Harvard Object Validation Environment," JSTOR, http://jhove.sourceforge.net.

20. Richard Leibbrandt et al., "Smart Collections: Can Artificial Intelligence Tools and Techniques Assist with Discovering, Evaluating and Tagging Digital Learning Resources?" International Association of School Librarianship: Selected Papers from the Annual Conference (2010).

21. Stefanie Lindstaedt et al., "Automatic Image Annotation Using Visual Content and Folksonomies," Multimedia Tools & Applications 42, no. 1 (2009): 97–113, http://dx.doi.org/10.1007/s11042-008-0247-7.

22. Liu and Qin, "An Interactive Metadata Model."

23. Sherry Vellucci, Ingrid Hsieh-Yee, and William Moen, "The Metadata Education and Research Information Commons (MERIC): A Collaborative Teaching and Research Initiative," Education for Information 25, no. 3/4 (2007): 169–78.

24. Polfreman, Broughton, and Wilson, "Metadata Generation for Resource Discovery."

5893 ----

What Technology Skills Do Developers Need? A Text Analysis of Job Listings in Library and Information Science (LIS) from Jobs.code4lib.org

Monica Maceli

ABSTRACT

Technology plays an indisputably vital role in library and information science (LIS) work; this rapidly moving landscape can create challenges for practitioners and educators seeking to keep pace with such change. In pursuit of building our understanding of currently sought technology competencies in developer-oriented positions within LIS, this paper reports the results of a text analysis of a large collection of job listings culled from the Code4lib jobs website. Beginning more than a decade ago as a popular mailing list covering the intersection of technology and library work, the Code4lib organization's current offerings include a website that collects and organizes LIS-related technology job listings. The results of the text analysis of this dataset suggest the currently vital technology skills and concepts that existing and aspiring practitioners may target in their continuing education as developers.

INTRODUCTION

For those seeking employment in a technology-intensive position within library and information science (LIS), the number and variation of technology skills required can be daunting. The need to understand common technology job requirements is relevant to current students positioning themselves to begin a career within LIS, those currently in the field who wish to enhance their technology skills, and LIS educators.
The aim of this short paper is to highlight the skills and combinations of skills currently sought by LIS employers in North America through textual analysis of job listings. Previous research in this area explored job listings through various perspectives, from categorizing titles to interviewing employers;1,2 the approach taken in this study contributes a new perspective to this ongoing and highly necessary work. This research report seeks a further understanding of the following research questions:

• What are the most common job titles and skills sought in technology-focused LIS positions?
• What technology skills are sought in combination?
• What implications do these findings have for aspiring and current LIS practitioners interested in developer positions?

As detailed in the following research method section, this study addresses these questions through textual analysis of relevant job listings from a novel dataset—the job listings from the Code4lib jobs website (http://jobs.code4lib.org/). Code4lib began more than a decade ago as an electronic discussion list for topics around the intersection of libraries and technology.3 Over time, the Code4lib organization expanded to an annual conference in the United States, the Code4Lib Journal, and, most relevant to this work, an associated jobs website that highlights jobs culled from both the discussion list and other job-related sources. Figure 1 illustrates the home page of the Code4lib jobs website; the page presents job listings and associated tags, with the tags facilitating navigation and viewing of other related positions. Users may also view positions geographically or by employer.

Monica Maceli (mmaceli@pratt.edu) is Assistant Professor, School of Information and Library Science, Pratt Institute, New York.

Figure 1. Homepage of the code4lib Jobs Website, Displaying Most-Recently Posted Jobs and the Associated Tags.4

In addition to the visible user interface for job exploration, the website consists of software to gather the job listings from a variety of sources. The website incorporates jobs posted to the Code4lib discussion list, American Library Association, Canadian Library Association, Australian Library and Information Association, HigherEd Jobs, Digital Koans, Idealist, and ArchivesGig. This broad incoming set of jobs provides a wide look into new technology-related postings.

New job listings are automatically added to a queue to be assessed and tagged by human curators before posting. This allows manual intervention, where a curator assesses whether the job is relevant to technology in the library domain and validates the job listing information and metadata (see figure 2). Curating is done on a volunteer basis, and curators are asked to assess whether the position is relevant to the Code4lib community and whether it is unique, and to ensure that it has an associated employer, set of tags, and descriptive text.
Combining both software processes and human intervention in the job assessment results in the ability to gather a large number of jobs of high relevance to the Code4lib community. As mentioned earlier, Code4lib's origins are in the area of software development and design as applied in LIS contexts. These foci mean that most jobs identified as relevant for inclusion in the Code4lib jobs dataset are oriented toward developer activities. The Code4lib jobs website therefore provides a useful and novel dataset within which to understand current employment opportunities relating to the intersection between technology—particularly developer work—and the LIS field.

Figure 2. Code4lib Job Curators Interface Where Job Data is Validated and Tags Assigned.5

RESEARCH METHOD

To analyze the job listing data in greater depth, a textual analysis was conducted using the R statistical package, exploring job titles and descriptions.6 First, the job listing data from the most recent complete year (2014) were dumped from the database backend of the Code4lib jobs website; this dataset contained 1,135 positions in total. The dataset included the job titles, descriptions, location and employer information, as well as tags associated with the various positions. The text was then cleaned to remove any markup tags or special characters that remained from the scraping of listings. Finally, the tm (text mining) package in R was used to calculate term frequencies and correlations, generate plots, and cluster terms across both job titles and descriptions.7
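The study's analysis was carried out in R with the tm package and is not reproduced here. As a rough illustration of the kind of computation described (term counts over a document-term matrix, with pairwise term correlations filtered at a threshold such as the 0.3 used below), the following Python sketch works over a few invented job-description snippets; the sample strings, term choices, and variable names are assumptions for the example and do not reflect the actual dataset or code.

```python
import re
from itertools import combinations
from statistics import correlation  # Pearson's r; Python 3.10+

# Hypothetical job-description snippets standing in for the 2014 dataset.
listings = [
    "web developer experienced with php javascript html and css",
    "metadata librarian familiar with xml marc and dublin core",
    "digital initiatives developer javascript xml html drupal",
]

# Document-term matrix of raw counts: term -> count per listing.
terms = sorted({w for text in listings for w in re.findall(r"[a-z]+", text)})
counts = {t: [text.split().count(t) for text in listings] for t in terms}

# Pairwise correlations between a few term vectors, keeping pairs >= 0.3.
for a, b in combinations(["javascript", "html", "xml"], 2):
    r = correlation(counts[a], counts[b])
    if r >= 0.3:
        print(f"{a} ~ {b}: {r:.2f}")
```

With the full dataset, the same idea scales up to the frequency counts, correlation plots, and clustering reported in the results below.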
RESULTS

Job Title Analysis

Of the full set of 1,135 positions, 30 percent were titled as librarian positions; popular specialties included systems librarian and various digital collections and curation-oriented librarian titles. Figures 3 and 4 detail the most common terms used in position titles across librarian and nonlibrarian positions.

Figure 3. Most Common Terms Used in Librarian Position Titles.

Figure 4. Most Common Terms Used in Nonlibrarian Position Titles.

The most popular job title terms were then clustered using Ward's agglomerative hierarchical method (dendrogram in figure 5). Agglomerative hierarchical clustering, of which Ward's method is a widely used variant, begins with single-item clusters, then identifies and joins similar clusters until the final stage, in which one larger cluster is formed. Commonly used in text analysis, this allows the investigator to explore datasets in which the number of clusters is not known before the analysis. The dendrograms generated (e.g., figure 5) allow for visual identification and interpretation of closely related terms representing various common positions, e.g., digital librarian, software engineer, collections management, etc. Given that job titles in listings may include extraneous or infrequent words, such as the organization name, the cluster analysis can provide an additional view into common job titles across the full dataset in a more generalized fashion.

Figure 5. Cluster Dendrogram of Terms Used in Job Titles Generated Using Ward's Agglomerative Hierarchical Method.

Tag Analysis

As described earlier, the Code4lib jobs website allows curators to validate and tag jobs before listing. The word cloud in figure 6 displays the most common tags associated with positions, with XML being the most popular tag (178 occurrences). Figure 7 contains the raw frequency counts of common tags observed.

Figure 6. Word Cloud of Most Frequent Tags Associated with Job Listings by Curators.

Figure 7. Frequency of Commonly Occurring Tags (frequency of fifty occurrences or more) in the 2014 Job Listings.

Job Description Analysis

The job description text was then analyzed to explore commonly co-occurring technology-related terms, focusing on frequent skills required by employers. Figures 8, 9, and 10 plot term correlations and interconnectedness. Terms with correlation coefficients of 0.3 or higher were chosen for plotting; this common threshold broadly included terms with a range in positive relationship strength from moderate to strong.

Plots were created to express correlations around the top five terms identified from the tags: XML, JavaScript, PHP, metadata, and HTML (frequencies in figure 7).
Any number of terms and frequencies can be plotted from such a dataset; to orient the findings closely around the job listing text, a focus on the top terms was chosen. These plots illustrate the broader set of skills related to these vital competencies represented in the job listings.

Figure 8. Job Listing Terms Correlated with "XML" (most popular tag).

Figure 9. Job Listing Terms Correlated with "JavaScript" (second most popular tag), including "PHP" and "HTML" (third and fifth most popular tags, respectively).

Figure 10. Job Listing Terms Correlated with "Metadata" (fourth most popular tag).

Finally, a series of general plots was created to visualize the broad set of skills necessary in fulfilling the positions of interest to the Code4lib community. As detailed in the title analysis (figures 3 and 4), apart from the generic term librarian, the two most common terms across all job titles were digital and developer. Correlation plots were created to detail the specific skills and requirements commonly sought in positions using such terms. Figure 11 illustrates the terms correlated with the general term developer, while figure 12 displays terms correlated with digital. The implications of these findings will be discussed further in the following discussion section.

Figure 11. Job Listing Terms Correlated with "Developer."

Figure 12. Job Listing Terms Correlated with "Digital."

DISCUSSION

Taken as a whole, the job listing dataset covered a quite dramatic range of positions, from highly technical (e.g., senior-level software engineer or web developer) to managerial and leadership roles (e.g., director or department head roles centered on digital services or emerging technologies). These findings support the suggestions of earlier research,8 which advocated for LIS graduate programs to build their offerings not just in technology skills but also in technology management and decision-making. However, the Code4lib jobs dataset is a one-dimensional view into the employment process and is focused largely on the developer perspective.
Additional contextual information, including whether suitable candidates were easily identified and if the position was successfully filled, would provide a more complete view of the employment process. Prior research has indicated that many technology-related positions in LIS are in fact difficult to fill with LIS graduates.9 While LIS graduate programs have made great strides in increasing the number of courses and topics covered that address technology, these improvements may not benefit those already in the field or wishing to shift toward a more technology-focused position.

In the common tags and terms analysis, experience with specific LIS applications was relatively infrequently required, with the Drupal content management system a notable exception. More generalizable programming languages or concepts, e.g., Python, relational databases, XML, etc., were favored. As with technology positions outside of the LIS domain, employers likely seek those with the ability to flexibly apply their skills across various tools and platforms. This may also relate to the above challenges in filling such positions with LIS graduates, with the goal of opening up the position to a larger technologist applicant base.

Common web technologies popular in the open-source software often favored by LIS organizations continued to dominate, with a clear preference for candidates well versed in HTML, CSS, JavaScript, and PHP. Relating to these skills, web development and design practices were often intertwined, with positions requesting both developer-oriented skillsets and interface design (e.g., figure 7). Technologies supporting modern web application development and workflow management were evident as well, e.g., common requirements for experience with versioning systems such as Git, popular JavaScript libraries, and development frameworks. Also striking was the richness of the terms correlated with metadata (figure 10), including mention of growing areas of expertise, such as linked data.

Interestingly, the general correlation plots expressing the common terms sought around "digital" and "developer" positions were quite varied. While the developer plot (figure 11 above) provided a richly technical view into common technologies broadly applied in web and software development, the terms correlated around digital were notably less technical (figure 12 above). While there was a clear focus on digital preservation activities and common standards in this area, mention of terms such as "grant" indicated that these positions likely have a broad role. The term digital was frequently observed in librarian job titles, so these roles may be tasked with both technical and administrative work.

Finally, there are inherent difficulties in capturing all jobs relating to technology use in the LIS domain that introduce limitations into this study.
While the incoming job feeds attempt to broadly capture recent job posts, it is possible that jobs are missed or overlooked by the job curators. Given the lack of one centralized job-posting source regardless of the field, this is a common challenge to research work attempting to assess every job posting. And as mentioned above, there is also a lack of corresponding data as to whether these jobs are successfully filled and what candidate backgrounds are ultimately chosen (i.e., from within or outside of LIS).

CONCLUSION

This assessment of the in-demand technology skills provides students, educators, and information professionals with useful direction in pursuing technology education or strengthening their existing skills. There are myriad technology skills, tools, and concepts in today's information environments. Reorienting the pursuit of knowledge in this area around current employer requirements can be useful in professional development, new course creation, and course revision. The constellations of correlated skills presented above (figures 8–12) and popular job tags (figure 7) describe key areas of technology competencies in the diverse areas of expertise presently needed, from web design and development to metadata and digital collection management. In addition to the results presented in this paper, the Code4lib job website provides a continuously current view into recent jobs and related tags; this data can help those in the LIS field orient professional and curricular development toward real employer needs.

ACKNOWLEDGEMENTS

The author would like to thank Ed Summers of the Maryland Institute for Technology in the Humanities for generously providing the jobs.code4lib.org dataset for analysis.

REFERENCES

1. Janie M. Mathews and Harold Pardue, "The Presence of IT Skill Sets in Librarian Position Announcements," College & Research Libraries 70, no. 3 (2009): 250–57, http://dx.doi.org/10.5860/crl.70.3.250.

2. Vandana Singh and Bharat Mehra, "Strengths and Weaknesses of the Information Technology Curriculum in Library and Information Science Graduate Programs," Journal of Librarianship & Information Science 45, no. 3 (2013): 219–31, http://dx.doi.org/10.1177/0961000612448206.

3. "About," Code4lib, accessed January 6, 2014, http://jobs.code4lib.org/about/.

4. "code4lib jobs: all jobs," Code4lib Jobs, accessed January 12, 2015, http://jobs.code4lib.org/.

5. "code4lib jobs: Curate," Code4lib Jobs, accessed January 17, 2015, http://jobs.code4lib.org/curate/.

6. R Core Team, R: The R Project for Statistical Computing, 2014, http://www.R-project.org/.

7. Ingo Feinerer and Kurt Hornik, "tm: Text Mining Package," 2014, http://CRAN.R-project.org/package=tm.

8. Meredith G. Farkas, "Training Librarians for the Future: Integrating Technology into LIS Education," in Information Tomorrow: Reflections on Technology and the Future of Public & Academic Libraries, edited by Rachel Singer Gordon, 193–201 (Medford, NJ: Information Today, 2007).

9.
Mathews  and  Pardue,  “The  Presence  of  IT  Skill  Sets  in  Librarian  Position  Announcements.”   5900 ---- Microsoft Word - September_ITAL_Betz_final.docx Self-­‐Archiving  with  Ease  in  an     Institutional  Repository:     Microinteractions  and  the  User  Experience     Sonya  Betz    and     Robyn  Hall     INFORMATION  TECHNOLOGY  AND  LIBRARIES  |  SEPTEMBER  2015             43   ABSTRACT   Details  matter,  especially  when  they  can  influence  whether  users  engage  with  a  new  digital  initiative   that  relies  heavily  on  their  support.  During  the  recent  development  of  MacEwan  University’s   institutional  repository,  the  librarians  leading  the  project  wanted  to  ensure  the  site  would  offer  users   an  easy  and  effective  way  to  deposit  their  works,  in  turn  helping  to  ensure  the  repository’s  long-­‐term   viability.  The  following  paper  discusses  their  approach  to  user-­‐testing,  applying  Dan  Saffer’s   framework  of  microinteractions  to  how  faculty  members  experienced  the  repository’s  self-­‐archiving   functionality.  It  outlines  the  steps  taken  to  test  and  refine  the  self-­‐archiving  process,  shedding  light  on   how  others  may  apply  the  concept  of  microinteractions  to  better  understand  a  website’s  utility  and   the  overall  user  experience  that  it  delivers.     INTRODUCTION   One  of  the  greatest  challenges  in  implementing  an  institutional  repository  (IR)  at  a  university  is   acquiring  faculty  buy-­‐in.  Support  from  faculty  members  is  essential  to  ensuring  that  repositories   can  make  online  sharing  of  scholarly  materials  possible,  along  with  the  long-­‐term  digital   preservation  of  these  works.  Many  open  access  mandates  have  begun  to  emerge  around  the  world,   developed  by  universities,  governments,  and  research  funding  organizations,  which  serve  to   increase  participation  through  requiring  that  faculty  contribute  their  works  to  a  repository.1   However,  for  many  staff  managing  IRs  at  academic  libraries  there  are  no  enforceable  mandates  in   place,  and  only  a  fraction  of  faculty  works  can  be  contributed  without  copyright  implications  when   author  agreements  transfer  copyrights  to  publishers.  Persuading  faculty  members  to  take  the  time   to  sort  through  their  works  and  self-­‐archive  those  that  are  not  bound  by  rights  restrictions  is  a   challenge.   Standard  installations  of  popular  IR  software,  including  DSpace,  Digital  Commons,  and  ePrints,  do   little  to  help  facilitate  easy  and  efficient  IR  deposits  by  faculty.  As  Dorothea  Salo  writes  in  a  widely   cited  critique  of  IRs  managed  by  academic  libraries,  the  “‘build  it  and  they  will  come’  proposition   has  been  decisively  wrong.”2  A  major  issue  she  points  out  is  that  repositories  were  predicated  on   the  “assumption  that  faculty  would  deposit,  describe,  and  manage  their  own  material.”3  Seven     Sonya  Betz  (sonya.betz@ualberta.ca)  is  Digital  Initiatives  Project  Librarian,  University  of  Alberta   Libraries,  University  of  Alberta,  Edmonton,  Alberta.  Robyn  Hall  (HallR27@macewan.ca)  is   Scholarly  Communications  Librarian,  MacEwan  University  Library,  MacEwan  University,   Edmonton,  Alberta.     
SELF-­‐ARCHIVING  WITH  EASE  IN  AN  INSTITUTIONAL  REPOSITORY  |  BETZ  AND  HALL     doi:  10.6017/ital.v34i3.5900   44   years  after  the  publication  of  her  article,  a  vast  majority  of  the  more  than  2,600  repositories   currently  operating  around  the  world  still  function  in  this  way  and  struggle  to  attract  widespread   faculty  support.4  To  deposit  works  into  these  systems,  faculty  are  often  required  to  fill  out  an   online  form  to  describe  and  upload  each  work  individually.  This  can  be  a  laborious  process  that   includes  deciphering  lengthy  copyright  agreements,  filling  out  an  array  of  metadata  fields,  and   ensuring  file  formats  or  file  sizes  that  are  compatible  with  the  constraints  of  the  software.   In  August  of  2014,  MacEwan  University  Library  in  Edmonton,  Alberta,  launched  an  IR,  Research   Online  at  MacEwan  (RO@M;  http://roam.macewan.ca).  Our  hope  was  that  RO@M’s  simple  user   interface  and  straightforward  submission  process  would  help  to  bolster  faculty  contributions.  The   site  was  built  using  Islandora,  an  open-­‐source  software  framework  that  offered  the  project   developers  substantial  flexibility  in  appearance  and  functionality.  In  an  effort  to  balance  their   desire  for  independence  over  their  work  with  ease  of  use,  faculty  and  staff  have  the  option  of   submitting  to  RO@M  in  one  of  two  ways:  they  can  choose  to  complete  a  brief  process  to  create   basic  metadata  and  upload  their  work,  or  they  can  simply  upload  their  work  and  have  RO@M  staff   create  metadata  and  complete  the  deposit.     Thoroughly  testing  both  of  these  processes  was  critical  to  the  success  of  the  IR.  We  wanted  to   ensure  that  there  were  no  obstacles  in  the  design  that  would  dissuade  faculty  members  from   contributing  their  works  once  they  had  made  the  decision  to  start  the  contribution  process.  As  the   primary  means  of  adding  content  to  the  IR,  and  as  a  process  that  other  institutions  have  found   problematic,  carefully  designing  each  step  of  how  a  faculty  contributor  submits  material  was  our   highest  priority.  To  help  us  focus  our  testing  on  some  of  these  important  details,  and  to  provide  a   framework  of  understanding  for  refining  our  design,  we  turned  to  Dan  Saffer’s  2013  book   Microinteractions:  Designing  with  Details.  The  following  case  study  describes  our  use  of   microinteractions  as  a  user-­‐testing  approach  for  libraries  and  discusses  what  we  learned  as  a   result.  We  seek  to  shed  light  on  how  other  repository  managers  might  envision  and  structure  their   own  self-­‐archiving  processes  to  ensure  buy-­‐in  while  still  relying  on  faculty  members  to  do  some  of   the  necessary  legwork.  Additionally,  we  lay  out  how  other  digital  initiatives  may  embrace  the   concept  of  microinteractions  as  a  means  of  better  understanding  the  relationship  between  the   utility  of  a  website  and  the  true  value  of  positive  user  experience.     
LITERATURE REVIEW

User Experience and Self-Archiving in Institutional Repositories

User experience (UX) in libraries has gained significant traction in recent years and provides a useful framework for exploring how our users are interacting with, and finding meaning in, the library technologies we create and support. Although there is still some disagreement around the definition and scope of what exactly we mean when we talk about UX, there seems to be general consensus that paying attention to UX shifts focus from the usability of a product to more nonutilitarian qualities, such as meaning, affect, and value.5 Hassenzahl simply defines UX as a "momentary, primarily evaluative feeling (good-bad) while interacting with a product or service."6 Hassenzahl, Diefenbach, and Göritz argue that positive emotional experiences with technology occur when the interaction fulfills certain psychological needs, such as competence or popularity.7 The 2010 ISO standard for human-centered design for interactive systems defines UX even more broadly, suggesting that it "includes all the users' emotions, beliefs, preferences, perceptions, physical and psychological responses, behaviors and accomplishments that occur before, during and after use."8 However, when creating tools for library environments, it can be difficult for practitioners to translate ambiguous emotional requirements, such as satisfying emotional and psychological needs or increasing motivation, into pragmatic outcomes, such as developing a piece of functionality or designing a user interface.

It has been well documented that repository managers struggle to motivate academics to self-archive their works.9 However, the literature focusing on how IR websites' self-archiving functionality helps or hinders faculty support and engagement is sparse. One study of note was conducted by Kim and Kim in 2006, who led usability testing and focus groups on an IR in South Korea.10 They provide a number of ways to improve usability on the basis of their findings, which include avoiding jargon terms and providing comprehensive instructions at points of need rather than burying them in submenus. Similarly, Veiga e Silva, Gonçalves, and Laender reported results of usability testing conducted on the Brazilian Digital Library of Computing, which confirmed their initial goals of building a self-archiving service that was easily learned, comfortable, and efficient.11 The authors of both of these studies suggest that user-friendly design could help to ensure the active support and sustainability of their services, but long-term use remained to be seen at the time of publication.
Meanwhile,  Bell  and  Sarr  recommend  integrating  value-­‐added   features  into  IR  websites  as  a  way  to  attract  faculty.12  Their  successful  strategy  for  reengineering  a   struggling  IR  at  the  University  of  Rochester  included  adding  tools  to  allow  users  to  edit  metadata   and  add  and  remove  files,  and  providing  portfolio  pages  where  faculty  could  list  their  works  in  the   IR,  link  to  works  available  elsewhere,  detail  their  research  interests,  and  upload  a  copy  of  their  CV.   Although  the  question  remains  as  to  whether  a  positive  user  experience  in  an  IR  can  be  a   significant  motivating  factor  for  increasing  faculty  participation,  there  seems  to  be  enough   evidence  to  support  its  viability  as  an  approach.   Applying  Microinteractions  to  User  Testing   Dan  Saffer’s  2013  book,  Microinteractions:  Designing  with  Details,  follows  logically  from  the  UX   movement.  Although  he  uses  the  phrase  “user  experience”  sparingly,  Saffer  consistently  connects   interactive  technologies  with  the  emotional  and  psychological  mindset  of  the  user.  Saffer  focuses   on  “microinteractions,”  which  he  defines  as  “a  contained  product  moment  that  revolves  around  a   single  use  case.”13  Saffer  argues  that  well-­‐designed  microinteractions  are  “the  difference  between   a  product  you  love  and  product  you  tolerate.”14  Saffer’s  framework  is  an  effective  application  of  UX   theory  to  a  pragmatic  task.  Not  only  does  he  privilege  the  emotional  state  of  the  user  as  a  priority     SELF-­‐ARCHIVING  WITH  EASE  IN  AN  INSTITUTIONAL  REPOSITORY  |  BETZ  AND  HALL     doi:  10.6017/ital.v34i3.5900   46   for  design,  he  also  provides  concrete  recommendations  for  designing  technology  that  provokes   positive  psychological  states  such  as  pleasure,  engagement,  and  fun.   Defining  what  we  mean  by  a  “microinteraction”  is  important  when  translating  Saffer’s  theory  to  a   library  environment.  He  describes  a  microinteraction  as  “a  tiny  piece  of  functionality  that  only   does  one  thing  .  .  .  every  time  you  change  a  setting,  sync  your  data  or  devices,  set  an  alarm,  pick  a   password,  turn  on  an  appliance,  log  in,  set  a  status  message,  or  favorite  or  Like  something,  you  are   engaging  with  a  microinteraction.”15  In  libraries,  many  microinteractions  are  built  around   common  user  tasks  such  as  booking  a  group-­‐use  room,  placing  a  hold  on  an  item,  registering  for  an   event,  rating  a  book,  or  conducting  a  search  in  a  discovery  tool.  A  single  piece  of  interactive  library   technology  may  have  any  number  of  discrete  microinteractions,  and  often  are  part  of  a  larger   ecosystem  of  connected  processes.  For  example,  an  integrated  library  system  is  composed  of   hundreds  of  microinteractions  designed  both  for  end  users  and  library  staff,  while  a  self-­‐checkout   machine  is  primarily  designed  to  facilitate  a  single  microinteraction.   Saffer’s  framework  provided  a  valuable  new  lens  on  how  we  could  interpret  users’  interactions   with  our  IR.  While  we  generally  conceptualize  an  IR  as  a  searchable  collection  of  institutional   content,  we  can  also  understand  it  as  a  collection  of  microinteractions.  
For  example,  RO@M’s  core   is  microinteractions  that  enable  tasks  such  as  searching  content,  browsing  content,  viewing  and   downloading  content,  logging  in,  submitting  content,  and  contacting  staff.  RO@M  also  includes   microinteractions  for  staff  to  upload,  review,  and  edit  content.  As  discussed  above,  one  of  the   primary  goals  when  developing  our  IR  was  to  allow  faculty  to  deposit  scholarly  content,  such  as   articles  and  conference  papers,  directly  to  the  repository.  We  wanted  this  process  to  be  simple  and   intuitive,  and  for  faculty  to  have  some  control  over  the  assignation  of  keywords  and  other   metadata,  but  also  to  have  the  option  to  simply  submit  content  with  minimal  effort.  We  decided  to   employ  user  testing  to  carefully  examine  the  deposit  process  as  a  discrete  microinteraction  and  to   apply  Saffer’s  framework  as  a  means  of  assessing  both  functionality  and  UX.  We  hoped  that   focusing  on  the  details  of  that  particular  microinteraction  would  allow  us  to  make  careful  and   thoughtful  design  choices  that  would  lead  to  a  more  consistent  and  pleasurable  UX.   METHOD  AND  CASE  STUDY   We  conducted  two  rounds  of  user  testing  for  the  self-­‐archiving  process.  Our  initial  user  testing   was  conducted  in  January  2014.  We  asked  seven  faculty  to  review  and  comment  on  a  mockup  of   the  deposit  form  to  test  the  workflow.  This  simple  exercise  allowed  us  to  confirm  the  steps  in  the   upload  process,  and  identified  a  few  critical  issues  that  we  could  resolve  before  building  out  the  IR   in  Islandora.  After  completing  the  development  of  the  IR,  and  with  a  working  copy  of  the  site   installed  on  our  user  acceptance  testing  (UAT)  server,  we  conducted  a  second  round  of  in-­‐depth   usability  testing  within  our  new  microinteraction  framework.     In  April  2014  we  recruited  six  faculty  members  through  word  of  mouth  and  through  a  call  for   participants  in  the  university’s  weekly  electronic  staff  newsletter.  The  volunteers  represented   major  disciplines  at  MacEwan  University,  including  health  sciences,  social  sciences,  humanities,     INFORMATION  TECHNOLOGIES  AND  LIBRARIES  |  SEPTEMBER  2015   47   and  natural  sciences.  Saffer  describes  a  process  for  testing  microinteractions  and  suggests  that  the   most  relevant  way  to  test  microinteractions  is  to  include  “hundreds  (if  not  thousands)  of   participants.”16  However,  he  goes  on  to  describe  the  most  effective  methods  of  testing  to  be   qualitative,  including  conversation,  interviews,  and  observation.  Testing  thousands  of  participants   with  one-­‐on-­‐one  interviews  and  observation  sessions  is  well  beyond  the  means  of  most  academic   libraries,  and  runs  counter  to  standard  usability  testing  methodology.  While  testing  only  six   participants  may  seem  like  a  small  number,  and  one  that  is  apt  to  render  inconclusive  results  and   sparse  feedback,  it  is  strongly  supported  by  usability  experts,  such  as  Jakob  Nielson.  
During  the   course  of  our  testing,  we  quickly  reached  what  Nielson  refers  to  in  his  piece  “How  Many  Test  Users   in  a  Usability  Study?”  as  “the  point  of  diminishing  returns.”17  He  suggests  that  for  most  qualitative   studies  aimed  at  gathering  insights  to  inform  site  design  and  overall  UX,  five  users  is  in  fact  a   suitable  number  of  participants.  We  support  his  recommendation  on  the  basis  of  our  own   experiences;  by  the  fourth  participant,  we  were  receiving  very  repetitive  feedback  on  what   worked  well  and  what  needed  to  be  changed.   Testing  took  place  in  faculty  members’  offices  on  their  own  personal  computers  so  that  they  would   have  the  opportunity  to  engage  with  the  site  as  they  would  under  normal  workday  circumstances.   Each  user  testing  session  lasted  45  to  60  minutes,  and  was  facilitated  by  three  members  of  the   RO@M  team:  the  web  and  UX  librarian  guided  each  faculty  member  through  the  testing  process,   the  scholarly  communications  librarian  observed  the  interaction,  and  a  library  technician  took   detailed  notes  recording  participant  comments  and  actions.  Each  faculty  member  was  given  an   article  and  asked  to  contribute  that  article  to  RO@M  using  the  UAT  site.  The  RO@M  team  observed   the  entire  process  carefully,  especially  noting  any  problematic  interactions,  while  encouraging  the   faculty  member  to  think  aloud.  Once  testing  was  complete,  the  scholarly  communications  librarian   analyzed  the  notes  and  identified  areas  of  common  concern  and  confusion  among  participants,  as   well  as  several  suggestions  that  the  participants  made  to  improve  the  site’s  functionality  as  they   worked  through  the  process.  She  then  went  about  making  changes  to  the  site  based  on  this   feedback.  As  we  discuss  in  the  next  section,  each  task  that  faculty  members  performed,  from  easy   to  frustrating,  represented  an  interaction  with  the  user  interface  that  affected  participants’   experiences  of  engaging  with  the  contribution  process,  and  informed  changes  we  were  able  to   make  before  launching  the  IR  service  three  months  later.     Basic  Elements  of  Microinteractions   Saffer’s  theory  describes  four  primary  components  of  a  microinteraction:  the  trigger,  rules,   feedback,  and  loops  and  modes.  Viewing  the  IR  upload  tool  as  a  microinteraction  intended  to  be   efficient  and  user-­‐friendly  required  us  to  first  identify  each  of  these  different  components  as  they   applied  to  the  contribution  process  (see  figure  1),  and  then  evaluate  the  tool  as  a  whole  through   our  user  testing.     SELF-­‐ARCHIVING  WITH  EASE  IN  AN  INSTITUTIONAL  REPOSITORY  |  BETZ  AND  HALL     doi:  10.6017/ital.v34i3.5900   48     Figure  1.  IR  Self-­‐Archiving  Process  with  Microinteraction  Components.   Trigger   The  first  component  to  examine  in  a  microinteraction  is  the  trigger,  which  is,  quite  simply,   “whatever  initiates  the  microinteraction.”18  On  an  iPhone,  a  trigger  for  an  application  might  be  the   icon  that  launches  an  app;  on  a  dishwasher,  the  trigger  would  be  the  button  pressed  to  start  the   machine;  on  a  website,  a  trigger  could  be  a  login  button  or  a  menu  item.  
Well-­‐designed  triggers   follow  good  usability  principles:  they  appear  when  and  where  the  user  needs  them,  they  initiate   the  same  action  every  time,  and  they  act  predictably  (for  example,  buttons  are  pushable,  toggles   slide).     INFORMATION  TECHNOLOGIES  AND  LIBRARIES  |  SEPTEMBER  2015   49   Examining  our  trigger  was  a  first  step  in  assessing  how  well  our  upload  microinteraction  was   designed.  Uploading  and  adding  content  is  a  primary  function  of  the  IR,  and  the  trigger  needed  to   be  highly  noticeable.  We  can  assume  that  users  would  be  goal-­‐based  in  their  approach  to  the  IR;   faculty  would  be  visiting  the  site  with  the  specific  purpose  of  uploading  content  and  would  be   actively  looking  for  a  trigger  to  begin  an  interaction  that  would  allow  them  to  do  so.     The  initial  design  of  RO@M  included  a  top-­‐level  menu  item  as  the  only  trigger  for  contributing   works.  In  the  persistent  navigation  at  the  top  of  the  site,  users  could  click  on  the  menu  item   labeled  “Contribute”  where  they  would  then  be  presented  with  a  login  screen  to  begin  the   contribution  process.  This  was  immediately  obvious  to  half  of  the  participants  during  user  testing.   However,  the  other  half  immediately  clicked  on  the  word  “Share,”  which  appeared  on  the  lower   half  of  the  page  beside  a  small  icon  simply  as  a  way  to  add  some  aesthetic  appeal  to  the  homepage   along  with  the  words  “Discover”  and  “Preserve.”  Not  surprisingly,  the  users  were  interpreting  the   word  and  icon  as  a  trigger.  Because  of  the  user  behavior  that  we  observed,  we  decided  to  add   hyperlinks  to  all  three  of  these  words,  with  “Share”  linking  to  the  contribution  login  screen  (see   figure  2),  “Discover”  leading  to  a  Browse  page,  and  “Preserve”  linking  to  an  FAQ  for  Authors  page   that  included  information  on  digital  preservation.  This  increased  visibility  of  the  trigger   significantly  for  the  microinteraction.     Figure  2.  “Share”  as  Additional  Trigger  for  Contributing  Works.     SELF-­‐ARCHIVING  WITH  EASE  IN  AN  INSTITUTIONAL  REPOSITORY  |  BETZ  AND  HALL     doi:  10.6017/ital.v34i3.5900   50   Rules   The  second  component  of  microinteractions  described  by  Saffer  are  the  rules.  Rules  are  the   parameters  that  govern  a  microinteraction;  they  provide  a  framework  of  understanding  to  help   users  succeed  at  completing  the  goal  of  a  microinteraction  by  defining  “what  can  and  cannot  be   done,  and  in  what  order.”19  While  users  don’t  need  to  understand  the  engineering  behind  a  library   self-­‐checkout  machine,  for  example,  they  do  need  to  understand  what  they  can  and  cannot  do   when  they’re  using  the  machine.  The  hardware  and  software  of  a  self-­‐checkout  machine  is   designed  to  support  the  rules  by  encouraging  users  to  scan  their  cards  to  start  the  machine,  to   align  their  books  or  videos  so  that  they  can  be  scanned  and  desensitized,  and  to  indicate  when   they  have  completed  the  interaction.   The  goal  when  designing  a  self-­‐archiving  process  in  RO@M  was  to  ensure  that  the  rules  were  easy   for  users  to  understand,  followed  a  logical  structure,  and  were  not  overly  complex.  
To  this  end,  we   drew  on  Saffer’s  approach  to  designing  rules  for  microinteractions,  along  with  the  philosophy   espoused  by  Steve  Krug  in  his  influential  web  design  book,  Don’t  Make  Me  Think:  A  Common  Sense   Approach  to  Web  Usability.20  Both  Krug  and  Saffer  argue  for  reducing  complexity  and  removing   decision-­‐making  from  the  user  whenever  possible  to  remove  potential  for  user  error  or  mistakes.   The  rules  in  RO@M  follow  a  familiar  form-­‐based  approach:  users  log  in  to  the  system,  they  have  to   agree  to  a  licensing  agreement,  they  create  some  metadata  for  their  item,  and  they  upload  a  file   (see  figure  1).  However,  determining  the  order  for  each  of  these  elements,  and  ensuring  that  users   could  understand  how  to  fill  out  the  form  successfully,  required  careful  thinking  that  was  greatly   informed  by  the  user  testing  we  conducted.   For  example,  we  designed  RO@M  to  connect  to  the  same  authentication  system  used  for  other   university  applications,  ensuring  that  faculty  could  log  in  with  the  credentials  they  use  daily  for   institutional  email  and  network  access.  Forcing  faculty  to  create,  and  remember,  a  unique   username  and  password  to  submit  content  would  have  increased  the  possibility  of  login  errors   and  resulted  in  confusion  and  frustration.  We  also  used  drop-­‐down  options  where  possible   throughout  the  microinteraction  instead  of  requiring  faculty  to  input  data  such  as  file  types,   faculty  or  department  names,  or  content  types  into  free-­‐text  boxes.   During  our  user  testing  we  found  that  those  fields  where  we  had  free-­‐text  input  for  metadata   entry  most  often  led  to  confusion  and  errors.  For  instance,  it  quickly  became  apparent  that  name   authority  would  be  an  issue.  When  filling  out  the  “Author”  field,  some  people  used  initials,  some   used  middle  names,  and  some  added  “Dr”  before  their  name,  which  could  negatively  affect  the  IR’s   search  results  and  the  ability  to  track  where  and  when  these  works  may  be  cited  by  others.  When   asked  to  include  a  citation  for  published  works,  most  of  our  participants  noted  frustration  with   this  requirement  because  they  could  not  do  so  quickly,  and  they  had  concerns  about  creating   correct  citations.  Finally,  many  participants  also  became  confused  at  the  last,  optional  field  in  the   form  that  allowed  them  to  assign  a  creative  commons  license  to  their  works.     INFORMATION  TECHNOLOGIES  AND  LIBRARIES  |  SEPTEMBER  2015   51   Our  user  testing  indicated  that  we  would  need  to  be  mindful  of  how  information  like  author  names   and  citations  were  entered  by  users  before  making  an  item  available  on  the  site.  Under  ideal   circumstances,  we  would  have  modified  the  form  to  ensure  that  any  information  that  the  system   knew  about  the  user  was  brought  forward:  what  Saffer  calls  “don’t  start  from  zero.”21  This  could   include  automatically  filling  in  details  like  a  user’s  name.  However,  like  many  libraries,  we  chose  to   adapt  existing  software  rather  than  develop  our  microinteraction  from  the  ground  up;   implementing  such  changes  would  have  been  too  time-­‐consuming  or  expensive.  
In response, we instead added additional workflows to allow administrators to edit the metadata before a contribution would be published to the web so we could correct any errors. We also changed the "Citation" field to "Publication Information" to imply that users did not need to include a complete citation. Lastly, we made sure that "All Rights Reserved" was the default selection for the optional "Add a Creative Commons License?" field in the form because this was language with which our users were familiar and comfortable proceeding.

Policy constraints are another aspect of the rules that provide structure around microinteractions, and can also limit design choices that can be made. Having faculty complete a nonexclusive licensing agreement that acknowledged they had the appropriate copyright permissions to allow them to contribute the work was a required component in our rules. Without the agreement, we would risk liability for copyright infringement and could not accept the content into the IR. However, our early designs for the repository included this step at the end of the submission process, after faculty had created metadata about the item. Our initial round of testing revealed that several of our participants were unsure of whether they had the appropriate copyright permissions to add content and didn't want to complete the submission, a frustrating experience for them after spending time filling out author information, keywords, abstract, and the like. We attempted to resolve this issue by moving the agreement much earlier in the process, requiring users to acknowledge the agreement before creating any metadata. We also used simple, straightforward language for the agreement and added additional information about how to determine copyrights or contact RO@M staff for assistance. Integrating an API that could automatically search a journal's archiving policies in SHERPA RoMEO at this stage in the contribution process is something we plan to investigate to help reduce complexity further for users.

Feedback

Understanding the concept of feedback is critical to the design of microinteractions. While most libraries are familiar with collecting feedback from users, the feedback Saffer describes is flowing in the opposite direction, and instead refers to feedback the application or interface is providing back to users. This feedback gives users information when and where they need it to help them navigate the microinteraction. As Saffer comments, "the true purpose of feedback is to help users understand how the rules of the microinteraction work."22

Feedback can be provided in a variety of ways. An action as simple as a color change when a user hovers over a link is a form of feedback, providing visual information that indicates that a segment of text can be clicked on.
Confirmation  messages  are  an  obvious  form  of  feedback,  while  a  folder   with  numbers  indicating  how  many  items  have  been  added  to  it  is  more  subtle.  While  visual   feedback  is  most  commonly  used,  Saffer  also  describes  cases  where  auditory  and  haptic  (touch)   feedback  may  be  useful  .  Designing  feedback,  much  like  designing  rules,  should  aim  to  reduce   complexity  and  confusion  for  a  user,  and  should  be  explicitly  connected  both  functionally  and   visually  to  what  the  user  needs  to  know.   In  an  online  web  environment,  much  of  the  feedback  we  provide  the  user  should  be  based  on  good   usability  principles.  For  example,  formatting  web  links  consistently  and  providing  predictable   navigation  elements  are  some  ways  that  feedback  can  be  built  into  a  design.  Providing  feedback  at   the  users’  point  of  need  is  also  critical,  especially  error  messages  or  instructional  content.  This   proved  to  be  especially  important  to  our  RO@M  test  subjects.  While  the  IR  featured  an  “About”   section  accessible  in  the  persistent  navigation  at  the  top  of  the  website  that  contained  detailed   instructions  and  information  about  how  to  submit  works,  and  terms  of  use  governing  these   submissions,  this  content  was  virtually  invisible  to  the  users  we  observed.  Instead,  they  relied   heavily  on  the  contextual  feedback  that  was  included  throughout  the  contribution  process  when  it   was  visible  to  them.     These  observations  led  us  to  rethink  our  approach  to  providing  feedback  in  several  cases.  For   example,  an  unfortunate  constraint  of  our  software  required  users  to  select  a  faculty  or  school  and   a  department  and  then  click  an  “Add”  button  before  they  could  save  and  continue.  We  included   some  instructions  above  the  drop-­‐down  menus,  stating  “Select  and  click  Add”  in  an  effort  to   prevent  any  errors.  However,  our  participants  failed  to  notice  the  instructions  and  inevitably   triggered  a  brief  error  message  (see  figure  3).  We  later  changed  the  word  “Add”  in  the  instructions   from  black  to  bright  red  hoping  to  increase  its  visibility,  and  we  ensured  that  the  error  message   that  displayed  when  users  failed  to  select  “Add”  clearly  explained  how  to  correct  the  problem  and   move  on.  We  also  observed  that  the  plus  signs  to  add  additional  authors  and  keywords  were  not   visible  to  users.  We  added  feedback  that  included  both  text  and  icons  with  more  detail  (see  figure   4).  However,  this  remains  a  problem  for  users  that  we  will  need  to  further  explore.  On  completing   a  contribution,  users  receive  a  confirmation  page  that  thanks  them  for  the  contribution,  provides  a   timeline  for  when  the  item  will  appear  on  the  site,  and  notes  that  they  will  receive  an  email  when   it  appears.  Response  to  this  page  was  positive  as  it  succinctly  covered  all  of  the  information  the   users  felt  they  needed  to  know  having  completed  the  process.       INFORMATION  TECHNOLOGIES  AND  LIBRARIES  |  SEPTEMBER  2015   53     Figure  3.  Feedback  for  the  “Add”  Button.     Figure  4.  Feedback  for  Adding  Multiple  Authors  and  Keywords.     
SELF-­‐ARCHIVING  WITH  EASE  IN  AN  INSTITUTIONAL  REPOSITORY  |  BETZ  AND  HALL     doi:  10.6017/ital.v34i3.5900   54   Modes  and  Loops   The  final  two  components  of  microinteractions  defined  by  Saffer  are  modes  and  loops.  Saffer   describes  a  mode  as  a  “fork  in  the  rules,”  or  a  point  in  a  microinteraction  where  the  user  is   exposed  to  a  new  process,  interface,  or  state.23  For  example,  Google  Scholar  provides  users  with  a   setting  to  show  “library  access  links”  for  participating  institutions  with  OpenURL  compatible  link   resolvers.24  Users  who  have  set  this  option  are  presented  with  a  search  results  page  that  is   different  from  the  default  mode  and  includes  additional  links  to  their  chosen  institution’s  link   resolver.  Our  microinteraction  includes  two  distinct  modes.  Once  logged  in,  users  can  choose  to   contribute  works  through  the  “Do  It  Yourself”  submission  that  we’ve  described  here  in  some   detail,  or  they  can  choose  “Let  Us  Do  It”  and  complete  a  simplified  version  that  requires  them  to   acknowledge  the  licensing  agreement,  upload  their  files,  and  provide  any  additional  data  they   chose  in  a  free-­‐text  box  (see  figure  5).  The  majority  of  our  testers  specified  that  they  would  opt  for   the  “Do  It  Yourself”  option  because  they  wanted  to  have  control  over  the  metadata  describing   their  work,  including  the  abstract  and  keywords.  However,  since  launching  the  repository,  several   submissions  have  arrived  via  the  “Let  Us  Do  It”  form,  which  suggests  a  reasonable  amount  of   interest  in  this  mode.     Figure  5.  The  “Let  Us  Do  It”  Form.   Loops,  on  the  other  hand,  are  simply  a  repeating  cycle  in  the  microinteraction.  A  loop  could  be  a   process  that  runs  in  the  background,  checking  for  network  connections,  or  it  could  be  a  more   visible  process  that  adapts  itself  on  the  basis  of  the  user’s  behavior.  For  example,  in  the  RO@M   submission  process  users  can  move  backward  and  forward  in  the  contribution  forms;  both  have     INFORMATION  TECHNOLOGIES  AND  LIBRARIES  |  SEPTEMBER  2015   55   “Previous”  and  “Save  and  Continue”  buttons  on  each  page  to  allow  users  to  navigate  easily.  The   final  step  on  the  “Do  it  Yourself”  form  allows  users  to  review  their  metadata  and  the  file  that  they   have  uploaded.  They  can  then  use  the  Previous  button  to  make  changes  to  what  they  have  entered   before  completing  the  submission.  Ideally,  users  would  be  able  to  edit  this  content  directly  from   this  review  page,  but  software  constraints  prevented  us  from  including  this  feature,  and  the   “Previous”  button  did  not  pose  any  major  challenges  for  our  testing  participants.  Another  example   of  a  loop  in  RO@M  is  a  “contribute  more  works”  button  embedded  in  the  confirmation  screen  that   takes  users  back  to  the  beginning  of  the  microinteraction.  This  feature  was  suggested  by  one  of   our  participants,  and  it  extends  the  life  of  the  microinteraction,  potentially  leading  to  additional   contributions.   
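Before turning to the discussion, it may help to see Saffer's four components in one place. The following Python sketch is purely illustrative, not the RO@M (Islandora/Drupal) implementation, and every class, state, and message in it is hypothetical: the trigger starts the flow, the rules force the licensing agreement ahead of metadata and upload, each step returns point-of-need feedback, the "Do It Yourself" and "Let Us Do It" paths behave as modes, and the "contribute more works" option acts as the loop.

```python
# Purely illustrative model of the deposit microinteraction described above;
# not the actual RO@M (Islandora/Drupal) code. All names and messages are hypothetical.

LICENSE_TEXT = "I confirm I hold the rights needed to deposit this work."

class DepositMicrointeraction:
    """Trigger -> rules (ordered states) -> feedback, with two modes and a restart loop."""

    def __init__(self, mode="do_it_yourself"):
        assert mode in ("do_it_yourself", "let_us_do_it")   # the two modes
        self.mode = mode
        self.state = "login"
        self.data = {}

    def trigger(self):
        # The "Contribute" menu item or the "Share" link initiates the microinteraction.
        self.state = "login"
        return "Please log in with your institutional credentials."   # feedback

    def advance(self, **payload):
        """Apply the rules for the current state and return feedback text."""
        if self.state == "login":
            self.state = "license"
            return LICENSE_TEXT
        if self.state == "license":
            if not payload.get("license_accepted"):
                return "The licensing agreement must be accepted before continuing."
            # Rule: the agreement comes before any metadata entry or upload.
            self.state = "upload" if self.mode == "let_us_do_it" else "metadata"
            return "Agreement recorded."
        if self.state == "metadata":
            if not payload.get("authors"):
                return "Select an author and click Add before saving."  # point-of-need feedback
            self.data["metadata"] = payload
            self.state = "upload"
            return "Metadata saved."
        if self.state == "upload":
            self.data["file"] = payload.get("file")
            self.state = "review" if self.mode == "do_it_yourself" else "done"
            return "File received."
        if self.state == "review":
            self.state = "done"
            return "Thank you! Your item will appear on the site after staff review."
        return "Submission complete."

    def contribute_more(self):
        # The loop: the confirmation screen offers to restart the microinteraction.
        self.state = "login"
        self.data = {}
        return self.trigger()
```

Walking an instance through trigger() and successive advance() calls mirrors the flow in figure 1, while constructing it with mode="let_us_do_it" skips the metadata step, just as the simplified form described above does.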
DISCUSSION AND CONCLUSIONS

Focusing on the details of the self-archiving process in our IR provided extremely rich qualitative data for improving the user interface, while analyzing the structure of the microinteraction, following Saffer's model, was also a valuable exercise in thinking about user needs and software design from a different perspective than standard usability studies. The improvements we made, based on both Saffer's theory and the results we observed through testing, added significant functionality and ease of use to the self-archiving process for faculty. Thinking carefully about elements like placement of buttons, small changes in wording or flow, and timing of instructional or error feedback highlighted the big effect small elements can have on usability.

However, there are some limitations to both the theory and our approach to testing and improving the IR that affect how well we can understand and utilize the results. Of particular concern is how well this kind of testing can capture the UX of a faculty member beyond the utility or ease of use of the interaction. In an observational study we can rely on comments from participants and key statements that may indicate a participant's emotional or affective state, but we didn't include targeted questions to gather this data and focused instead on the details of the microinteraction. We didn't ask how they felt while using the IR, or if successfully uploading an item to the IR gave them a sense of autonomy or competence, or if this experience would encourage them to submit content in the future. Nevertheless, improving usability is a solid foundation for providing a positive UX. Hassenzahl describes the difference between "do-goals" (completing a task) and "be-goals" (human psychological needs like being competent or developing relationships).25 While he argues that "be-goals" are the ultimate drivers of UX, he also suggests that creating tools that make the completion of do-goals easy can facilitate the potential fulfillment of be-goals by removing barriers and making their fulfillment more likely. Ultimately, however, a range of user testing strategies can lead to improvements in a user interface, whether that testing relies on carefully detailed examination of a microinteraction, analysis of large data sets from Google Analytics, or interviews with key user groups. Microinteraction theory is a useful approach, and valuable in its conceptualization, but it should be one of many tools libraries adopt to improve their online UX.

Similarly, focusing on the UX of IRs must be only one of many strategies institutions employ to improve rates of faculty self-archiving.
There have been recent studies that argue that regardless of platform or process, faculty-initiated submissions have proven to be uncommon.26 Instead, they suggest that sustainability relies on marketing, direct outreach with individual faculty members, and significant staff involvement in identifying content for inclusion, investigating rights, and depositing on authors' behalf. It would be shortsighted to suggest that relying solely on designing a user-friendly website, or only developing savvy promotional and outreach efforts, can determine the ongoing success of an IR initiative. Gaining and maintaining support is an ongoing, multifaceted process, and largely depends on the academic culture of an institution as well as available financial and staffing resources. As such, user testing offers qualitative insights into ways that processes and functions might be improved to enhance the viability of IR initiatives in tandem with a variety of marketing and outreach efforts.

REFERENCES

1. "Welcome to ROARMAP," University of Southampton, 2014, http://roarmap.eprints.org.

2. Dorothea Salo, "Innkeeper at the Roach Motel," Library Trends 57, no. 2 (2008): 98, http://muse.jhu.edu/journals/library_trends.

3. Ibid., 100.

4. "The Directory of Open Access Repositories—OpenDOAR," University of Nottingham, UK, 2014, http://www.opendoar.org.

5. Effie L-C Law et al., "Understanding Scoping and Defining User eXperience: A Survey Approach," Computer-Human Interaction 2009: User Experience (New York: ACM Press, 2009), 719.

6. Marc Hassenzahl, "User Experience (UX): Towards an Experiential Perspective on Product Quality," Proceedings of the 20th International Conference of the Association Francophone d'Interaction Homme-Machine (New York: ACM Press, 2008), 11, http://dx.doi.org/10.1145/1512714.1512717.

7. Marc Hassenzahl, Sarah Diefenbach, and Anja Göritz, "Needs, Affect, and Interactive Products: Facets of User Experience," Interacting with Computers 22, no. 5 (2010): 353–62, http://dx.doi.org/10.1016/j.intcom.2010.04.002.

8. International Standards Organization, Human-Centred Design for Interactive Systems, ISO 9241-210 (Geneva: ISO, 2010), section 2.15.

9. See Philip M. Davis and Matthew J.L. Connolly, "Institutional Repositories: Evaluating the Reasons for Non-use of Cornell University's Installation of DSpace," D-Lib Magazine 13, no. 3/4 (2007), http://www.dlib.org; Ellen Dubinsky, "A Current Snapshot of Institutional Repositories: Growth Rate, Disciplinary Content and Faculty Contributions," Journal of Librarianship & Scholarly Communication 2, no. 3 (2014): 1–22, http://dx.doi.org/10.7710/2162-3309.1167; Anthony W. Ferguson, "Back Talk—Institutional Repositories: Wars and Dream Fields to Which Too Few Are Coming," Against the Grain 18, no.
2 (2006): 86–85, http://docs.lib.purdue.edu/atg/vol18/iss2/14; Salo, "Innkeeper at the Roach Motel"; Feria Wirba Singeh, A. Abrizah, and Noor Harun Abdul Karim, "What Inhibits Authors to Self-Archive in Open Access Repositories? A Malaysian Case," Information Development 29, no. 1 (2013): 24–35, http://dx.doi.org/10.1177/0266666912450450.

10. Hyun Hee Kim and Yong Ho Kim, "Usability Study of Digital Institutional Repositories," Electronic Library 26, no. 6 (2008): 863–81, http://dx.doi.org/10.1108/02640470810921637.

11. Lena Veiga e Silva, Marcos André Gonçalves, and Alberto H. F. Laender, "Evaluating a Digital Library Self-Archiving Service: The BDBComp User Case Study," Information Processing & Management 43, no. 4 (2007): 1103–20, http://dx.doi.org/10.1016/j.ipm.2006.07.023.

12. Suzanne Bell and Nathan Sarr, "Case Study: Re-Engineering an Institutional Repository to Engage Users," New Review of Academic Librarianship 16, no. S1 (2010): 77–89, http://dx.doi.org/10.1080/13614533.2010.5095170.

13. Dan Saffer, Microinteractions: Designing with Details (Cambridge, MA: O'Reilly, 2013), 2.

14. Ibid., 3.

15. Ibid., 2.

16. Ibid., 142.

17. Jakob Nielsen, "How Many Test Users in a Usability Study?" Nielsen Norman Group, 2012, http://www.nngroup.com/articles/how-many-test-users.

18. Saffer, Microinteractions, 48.

19. Ibid., 82.

20. Steve Krug, Don't Make Me Think: A Common Sense Approach to Web Usability (Berkeley, CA: New Riders, 2000).

21. Saffer, Microinteractions, 64.

22. Ibid., 86.

23. Ibid., 111.

24. "Library Support," Google Scholar, http://scholar.google.com/intl/en-US/scholar/libraries.html.

25. Hassenzahl, "User Experience," 10–15.

26. See Dubinsky, "A Current Snapshot of Institutional Repositories," 1–22; Shannon Kipphut-Smith, "Good Enough: Developing a Simple Workflow for Open Access Policy Implementation," College & Undergraduate Libraries 21, no. 3/4 (2014): 279–94, http://dx.doi.org/10.1080/10691316.2014.932263.

5912 ---- Editorial Board Thoughts: A&I Databases: the Next Frontier to Discover

Mark Dehmlow

I think it is fair to say that the discovery technology space is a relatively mature market segment, not complete, but mature. Much of the easy-to-negotiate content has been negotiated, and many of the systems on the market are above or approaching a billion records. This would seem like a lot, but there is a whole slice of tremendously valuable content still not fully available across all platforms, namely the specialized subject abstracting and indexing (A&I) database content. This content has a lot of significant value for the discovery community—many of those databases go further back than content pulled from journal publishers or full-text databases.
Equally as important is that they represent an important portion of humanities and social sciences content that is less represented in discovery systems as compared to STEM content. For vendors of A&I content, the concerns are clear and realistic: unlike journal publishers, whose metadata is meant to direct users to their main content (full text), the metadata for A&I publishers is the main content. According to a recent NFAIS report, a major concern for them is that if they include their content in discovery systems, they "risk loss of brand awareness," and the implication is that institutions will be more likely to cancel those subscriptions.1 The focus therefore seems to have been how to optimize the visibility of their content in discovery systems before being willing to share it.

In addition to the NFAIS report, some of the conversations I have seen on the topic seem to focus on wanting discovery system providers to meet a more complex set of requirements that will maximize leveraging the rich metadata contained in those resources, the idea being that utilizing that metadata in specific ways will increase the visibility of the content. In principle I think it is a commendable goal to maximize the value of the comprehensive metadata A&I records contain, and the complexities of including A&I data in discovery systems need to be carefully considered, namely blending multiple subject and authority vocabularies and ensuring that metadata records are appropriately balanced with full text in the relevancy algorithm. But I also worry that setting too many requirements that are too complicated will lead to delayed access and biased search results. It is important that this content is blended in a meaningful way, but determining relevancy is a complex endeavor, and it is critically important for relevancy to be unbiased from the content provider perspective and instead focus on the user, their query, and the context of their search.

Another concern that I have heard articulated is that results in discovery services are unlikely to be as good as native A&I systems because of the already mentioned blending issues. This is likely to be true, but I think it is critical to focus on the purpose of discovery systems. As Donald Hawkins recently wrote in a summary of a workshop called "Information Discovery and the Future of Abstracting and Indexing Services," "A&I services provide precision discipline-specific searching for expert researchers, and discovery services provide quick access to full text."2 Hawkins indicates that discovery systems are not meant to be sophisticated search tools, but rather a quick means to search a broad range of scholarly resources and, I think, sometimes a quick starting point for researchers.

Mark Dehmlow (mark.dehmlow@nd.edu), a member of the ITAL Editorial Board, is Program Director, Library Information Technology, University of Notre Dame, South Bend, IN.
Because of the nature of merging billions of scholarly records into a single system, discovery systems will never be able to provide the same experience as a native A&I system, nor should they. Over time, they may become better tuned to provide a better overall experience for the three different types of searchers we have in higher education: novice users like undergraduates looking for a quick resource, advanced users like graduate students and faculty looking for more comprehensive topical coverage, and expert users like librarians who want sophisticated search features to home in on the perfect few resources. Many of the discovery systems are working on building these features, but the industry will take time to solve this problem, and I tend to look at things through the lens of our end users—non-inclusion of this content directly impacts their overall discovery experience.

One might ask, if the discovery system experience isn't as precise and complete as the native A&I experience, why bother? In addition to broadening the subject scope by including more of the narrow, deep subject metadata, there is also the importance of serendipitous finding. That content, in the context of a quick user search, may drive the user to just the right thing that they need. In addition, my belief is that with that content, we can build search systems that are deeper than Google Scholar, and by extension provide our end users with a superior search experience. And so I advocate for innovating now instead of waiting to work out all of the details. I am not suggesting moving forward callously, but swiftly. The work that NISO has done on the Open Discovery Initiative has resulted in some good recommendations about how to proceed. For example, they have suggested two usage metrics that could be valuable for measuring A&I content use in discovery systems: search counts (by collection and customer for A&I databases) and result clicks (the number of times an end user clicks on a content provider's content in a set of results).3

While I think these types of metrics are aligned with the types of measures that libraries use to evaluate A&I database usage, I think at the same time they don't really say much about the overall value of the resources themselves. Sometimes in the library profession, our obsession with counting things loses connection with collecting metrics that actually say something about impact. Of the two counts, I could see result clicks as perhaps having more value. In this instance, knowing that a user found something of interest from a specific resource at the very least indicates that it led the user some place. I think the measure of search counts by collection is less useful. At best it indicates that the resource was searched, but it tells us nothing about who was searching for an item, what they found, or what they subsequently did with the item once they found it. I do think we in libraries need to consider the bigger picture.
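As a concrete aside, the two ODI metrics just described could be tallied from a discovery system's event log along the following lines. This is a minimal sketch under invented assumptions: the log format, field names, and file name are hypothetical and are not part of the NISO recommendation.

```python
# Minimal sketch of the two ODI usage metrics discussed above: search counts
# (by collection and customer) and result clicks (by content provider).
# The CSV log format and field names below are hypothetical.

import csv
from collections import Counter

def tally_usage(log_path):
    search_counts = Counter()   # (collection, customer) -> number of searches
    result_clicks = Counter()   # content provider -> number of clicked results
    with open(log_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            if row["event"] == "search":
                search_counts[(row["collection"], row["customer"])] += 1
            elif row["event"] == "result_click":
                result_clicks[row["provider"]] += 1
    return search_counts, result_clicks

if __name__ == "__main__":
    searches, clicks = tally_usage("discovery_events.csv")
    for (collection, customer), n in searches.most_common(10):
        print(f"{customer} searched {collection}: {n} times")
    for provider, n in clicks.most_common(10):
        print(f"{provider}: {n} result clicks")
```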
Regardless of the number of searches (which doesn't really tell us anything anyway), we need to recognize the value alone of including the A&I content, and instead of trying to determine the value of the resource by the number of times it was searched, focus more on the breadth of exposure that content is getting by inclusion in the discovery system.

I think a more useful technical requirement for discovery providers would be to provide pathways to specific A&I resources within the context of a user's search—not dissimilar to how Google places sponsored content at the top of its search results, a kind of promotional widget. In this case, using metadata returned from the query, the systems could calculate which one or two specific resources would guide the user to more in-depth research. By virtue of inclusion of the resource in the discovery system, those resources could become part of the promotional widget. This would guide users back to the native A&I resource, which is what both libraries and A&I providers want, and it would do that in a more intuitive and meaningful way for the end user.

All of the parties involved in the discovery discussion can bring something to the table if we want to solve these issues in a timely way. I hope that A&I publishers and discovery system providers make haste and get agreements underway for content sharing, and I would recommend that instead of focusing on requiring finished implementations based on complex requirements before loading content, both of them should instead focus on some achievable short- and long-term goals. Integrating A&I content perfectly will take some time to complete, and the longer we wait, the longer our users have a suboptimal discovery experience. Discovery providers need to make long-term commitments to developing mechanisms that satisfy usage metrics for A&I content, although I would recommend defining measures that have true value. A&I providers should be measured in their demands: while their stake in system integration is real, there is a risk of content providers vying for their content to be preferred when relevancy neutrality is paramount for a discovery system to be effective. I think it is worth lauding the efforts of a few trailblazing A&I publishers, such as Thomson Reuters and ProQuest, who have made agreements with some of the discovery providers and are sharing their A&I content already, providing some precedent for sharing A&I content. Lastly, libraries and knowledge workers need to develop better means for calculating overall resource value, moving beyond strict counts to thinking of ways to determine the overall scholarly/pedagogical impact of those resources, and they need to treat the fact that an A&I publisher shares its data with a discovery provider as itself an indicator of significant value for the resource.

REFERENCES
1. NFAIS, Recommended Practices: Discovery Systems. NFAIS, 2013. https://nfais.memberclicks.net/assets/docs/BestPractices/recommended_practices_final_aug_2013.pdf.

2. Donald T. Hawkins, "Information Discovery and the Future of Abstracting and Indexing Services: An NFAIS Workshop." Against the Grain, 2013. http://www.against-the-grain.com/2013/08/information-discovery-and-the-future-of-abstracting-and-indexing-services-an-nfais-workshop/.

3. Open Discovery Initiative Working Group, Open Discovery Initiative: Promoting Transparency in Discovery. Baltimore: NISO, 2014. http://www.niso.org/apps/group_public/download.php/13388/rp-19-2014_ODI.pdf.

8650 ----
Data Center Consolidation at the University at Albany
Rebecca L. Mugridge and Michael Sweeney

Rebecca L. Mugridge (rmugridge@albany.edu) is Interim Dean and Director and Associate Director for Technical Services and Library Systems, and Michael Sweeney (msweeney2@albany.edu) is Head, Library Systems Department, University Libraries, University at Albany, Albany, New York.

ABSTRACT
This paper describes the experience of the University at Albany (UAlbany) Libraries' migration to a centralized University data center. Following an introduction to the environment at UAlbany, the authors discuss the advantages of data center consolidation. Lessons learned from the project include the need to participate in the planning process, review migration schedules carefully, clarify costs of centralization, agree on a service level agreement, communicate plans to customers, and leverage economies of scale.

INTRODUCTION
Data centers are facilities that house servers and related equipment and systems. They are distinct from data repositories, which collect various forms of research data, although some data repositories are occasionally called data centers. Many colleges and universities have data centers or server rooms distributed across one or more campuses, as does the University at Albany (UAlbany). This paper reports on the experiences of the Libraries at UAlbany as the Libraries' application and storage servers were consolidated into a new, state-of-the-art university data center in a new building on campus. The authors discuss the advantages of consolidation, the planning process for the actual move, and lessons learned from the migration.

BACKGROUND
The University at Albany is one of four university centers that are part of the State University of New York (SUNY) system. Founded in 1844, UAlbany has approximately 13,000 undergraduates, 4,500 graduate students, and more than 1,000 faculty members. It offers 118 undergraduate majors and minors, and 138 master's, doctoral, and certificate programs. UAlbany resides on three campuses: Uptown (the main campus), Downtown, and East.1

The Uptown campus was built in the 1960s on grounds formerly owned by the Albany Country Club. The campus was designed by noted architect Edward Durell Stone in 1962–63 and was built in 1963–64. The campus buildings include four residential quadrangles surrounding a central "Academic Podium" consisting of thirteen three-story buildings connected on the surface by an overhanging canopy and below ground by a maze of tunnels and offices. Many of the university's classrooms, lecture halls, academic and operational offices, and infrastructure are housed within the podium on the basement or subbasement levels. This includes the university's original data center, which is located in a basement room in the center of the podium.
While visually striking and unique, the architectural design of the podium has presented many challenges since its construction, one of which is regular flooding of the basement and subbasement levels. The original data center was flooded many times, to the extent that any heavy rainstorm had the potential to disrupt functionality and connectivity. When the university was first built in the 1960s it was not known to what extent computing would become part of the university's infrastructure, and the room housing the data center was not built to today's standards for environmental control, such as cooling. At the same time, server rooms sprouted all over the university, with many of the colleges and other units purchasing servers and maintaining server rooms in less than ideal conditions. These included server rooms in the College of Arts and Sciences, the School of Business, the Athletics Department, the University Libraries, and many other units.

University Libraries' Server Room
The University Libraries maintained its own server room with two racks full of equipment that supported all of the Libraries' computing needs. These servers supported our website, MSSQL and MySQL databases, EZproxy, ILLiad (interlibrary loan service), Ares (electronic reserve service), and our search engine appliance (Google Mini). They also included our domain controller, intranet, and several servers used for backup. Two servers and a storage area network housed our virtual environment, containing an additional nine virtual servers. These included servers to support library blogs, wikis, file storage, development and test servers, and additional backup servers. The only library servers not housed in the Libraries' server room were the integrated library system (ILS) servers that were maintained primarily by the university's Information Technology Services (ITS) staff, our backup domain controllers, and a server holding backups of our virtual servers. The ILS production server was housed in UAlbany's data center and the ILS test/backup server was housed in the alternate data center in another building on campus. Also, two of the Libraries' backup servers for other applications were housed in the university data center.

The Libraries' server room consisted of a 340-square-foot room on the third floor of the main campus library that was networked to support servers housed in two racks protected by a fire suppression system. There were two ceiling-mounted air conditioning units that cooled the room sufficiently for optimum performance. The Libraries' Windows System Administrator's office was nearby and had a connecting door to the server room, giving him ready access to the servers when needed.

Data Center Consolidation
Data center consolidation is defined as "an organization's strategy to reduce IT assets by using more efficient technologies.
Some of the consolidation technologies used in data centers today include server virtualization, storage virtualization, replacing mainframes with smaller blade server systems, cloud computing, better capacity planning and using tools for process automation."2 In addition to the investigation and use of these technologies, the planning for a new data center often involves the construction of a new building or the renovation of a current building.

There were several drivers behind UAlbany's decision to build a new data center. In addition to the concerns mentioned above about the potential flooding risk of the current data center, the ability to manage optimum temperature was also a factor. The current data center was built to house 1960s-era equipment and was not able to keep up with the cooling requirements of the more extensive computing equipment in use in the twenty-first century. The current data center also occupied what is considered prime real estate at the university, at the center of campus and near the Lecture Center, which experiences high foot traffic during the academic year. The new data center was constructed near the edge of campus, with little foot or auto traffic, allowing the space previously occupied by equipment to be repurposed in a way that better meets the university's needs.

Like many other universities, UAlbany is increasingly making use of cloud computing capabilities. For example, the email and calendaring systems are cloud-based. Nevertheless, this movement is being made in a deliberate and thoughtful way, leaving many of our administrative computing needs reliant on the use of physical servers. UAlbany and the Libraries have decreased the number of physical servers necessary by relying on a virtualized environment, and part of the project to move to the new data center included a conversion from physical to virtual servers. The Libraries' ILS production and test servers remain physical, as do several of the other Libraries' application servers. Many of the Libraries' backup servers are now virtual.

While there was no official mandate to consolidate all of the distributed server rooms across campus into the new data center, everyone involved understood that this was a direction the university administration supported. The Libraries' dean and director also supported this effort on behalf of the Libraries and charged Libraries' staff to collaborate with ITS to make this happen. Some of the drivers behind this decision include the promise of a better environment, improved security, backup generators for computing equipment, the use of ITS's virtual environment, the automation of server management, a faster network, the ability to repurpose the Libraries' server room, and more. These drivers are described in more detail later in this paper.

Construction planning for UAlbany's new data center began in the mid-2000s and included the identification of funding and the architectural design of the new building, later to be named the Information Technology Building (ITB). The actual construction began in 2013, with an estimated completion date of February 2014 and occupancy in April 2014. Unexpected challenges during construction delayed the timeline somewhat, and the construction was not completed until May 2014. The certificate of occupancy was granted in fall 2014. The data center is certified as Tier III by Uptime Institute,3 and the building is designated LEED Gold.
Alternate Data Center
Simultaneously with the construction of the new data center, the university entered into an agreement with another SUNY institution to house our alternate data center. This center was originally housed in another building on the UAlbany campus, less than a mile from both the main data center and ITB, in a building leased by UAlbany. This arrangement left some environmental issues out of our control, which was not ideal. For example, an air conditioner failure in fall 2013 caused our backup and test ILS server to be down for six days, affecting our ability to use that server for other purposes and holding up several projects. In addition, data center best practice calls for an alternate data center to be housed at a distance from the main data center. In February 2014, the servers in the alternate data center were moved to their new location. This included the Libraries' backup ILS server as well as two backup servers formerly housed in the main data center.

Advantages to the Libraries Moving to the University Data Center
There were many advantages to the Libraries moving to a centralized data center. Many of these advantages also applied to the other units considering a move to the new data center, but for the purposes of this paper, we are addressing them in the context of the Libraries' experience.

Repurpose Space
The Libraries' server room occupied a large office that could be repurposed to house multiple staff offices or student spaces. The Libraries have many group study rooms available for student use; however, they are in great demand, and the possibility of gaining more space for student use was seen as an advantage to making the move to a new data center.

Climate Control
The new data center is built on a raised floor that allows better air circulation. Hundreds of servers and other pieces of equipment create a lot of excess heat, and raised-floor construction allows for better circulation of air. New racks have chimneys that exhaust heat from high-density computing environments. Air conditioners supply a constant stream of air that will maintain the optimum temperature for computing equipment. Sensors continually monitor humidity and keep it at an optimal level. This was an improvement over the Libraries' server room, which had sufficient air conditioning for our relatively small number of physical servers but did not have backup generators to keep equipment running during a power outage.

Backup Generators
The new data center was built with two backup generators. If the building suddenly loses power, the backup generators will immediately start and provide a seamless source of energy. A secondary benefit to the university is that the backup generators can also provide a source of energy to other buildings on that side of campus; this area did not previously have a backup source of energy. Again, the Libraries' server room did not have a redundant electrical supply. In the event of a power outage, battery units would allow the servers to shut down properly if the outage lasted more than forty-five minutes.

Security
With server rooms scattered all over the university, security issues were a concern. Now that the servers are housed in one location, the university can provide a highly secure environment in a more cost-effective way.
The new data center has card-swipe access to the building and biometric access to the data center itself. There are also cameras installed in the building as a further security measure.

Virtual Environment
Although the Libraries have made strides toward moving into a virtualized environment in the past few years, we had many constraints on our ability to keep up with developments. The Libraries' virtual environment was two versions behind UAlbany's virtual environment, and the storage for the Libraries' virtual environment was at capacity. Part of the incentive to move into the new data center was the ability to downsize some of our physical equipment and migrate some of our physical servers to virtual equivalents.

Automation of Server Management
One of the benefits of consolidating servers into one environment is that they are in a secure location, but it is still possible to manage them from a distance. The virtual environment has a web-based console that allows system administrators to connect to and manage their virtual servers, and the physical servers can be managed over the network as well. Even though the servers are centralized, our system administrator can work from an office in the library, or from home if needed.

Faster Network
Part of the project to construct a new data center included the installation of an additional fiber network across campus. The new fiber network connects all buildings on campus with each other and the new data center. All of the network equipment was upgraded, providing faster connections and response time. The additional fiber network is fault tolerant: if the primary network fails, the second fiber network can immediately take its place with no loss of service.

Staging and Work Room
The new data center was designed to include a staging and work room. This can be used by any of the system administrators who are responsible for equipment housed in the data center, and it allows them to work on equipment in a room adjacent to the locked and secure data center.

Equipment Inventory
Part of the planning for the migration involved creating a detailed inventory of equipment. The Libraries already had a server inventory, but the information collected for the migration went far beyond just a list of the servers. This helped us identify who was responsible for physical and virtual servers and who was responsible for the services and applications that ran on those servers. Creating the equipment inventory also allowed us to consolidate and decommission equipment that was no longer needed, and helped us determine a prioritization and timeline for the move.

Applications Inventory
In addition to creating an equipment inventory, the Libraries created an applications inventory that included information about the dependencies that applications had on each other. For example, the Libraries' electronic resources reserve application (Ares) had been integrated into the university's course management system, Blackboard. That meant that when Blackboard was inaccessible, Ares was as well. All of these dependencies had to be taken into account when planning the schedule for the move.
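To make the role of that dependency information concrete, here is a minimal sketch of how an applications inventory can be turned into a defensible shutdown and startup order. Only the Ares/Blackboard relationship comes from the example above; every other name is a placeholder, and Python's standard-library graphlib (available in Python 3.9 and later) simply stands in for whatever tooling a site actually uses.

```python
from graphlib import TopologicalSorter

# Hypothetical applications inventory: each entry maps an application to the
# services it depends on. Only Ares -> Blackboard is drawn from the article;
# the remaining names are illustrative placeholders.
depends_on = {
    "Ares": {"Blackboard"},            # e-reserves is integrated with the LMS
    "Library website": {"MySQL", "EZproxy"},
    "ILLiad": {"MSSQL"},
    "Blackboard": set(),
    "MySQL": set(),
    "MSSQL": set(),
    "EZproxy": set(),
}

# static_order() yields each application only after everything it depends on,
# which is the order in which services should come back up; shutdown is the reverse.
startup_order = list(TopologicalSorter(depends_on).static_order())
shutdown_order = list(reversed(startup_order))

print("Power on in this order: ", " -> ".join(startup_order))
print("Shut down in this order:", " -> ".join(shutdown_order))
```

Grouping applications that share a dependency into the same move window, as the Libraries did with Ares and Blackboard, falls out of the same ordering.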
Disadvantages to the Libraries Moving to the University Data Center
The Libraries have noted few disadvantages to moving to the new data center. What might seem like disadvantages are in reality just a change in the way we do our work. For example, we have been asked to inform someone in ITS before we go to the new data center to work on a server. This is a simple step, and it has not hindered our work at any time. Another change is the need to use a tool created by ITS to configure our virtual servers, and we found that the tool has been configured to give us fewer administrative options than what ITS staff have. This has reinforced our understanding that we need to be present and proactive in representing the Libraries' interests in managing all of our computing equipment and software.

Migration Days
The majority of UAlbany's servers were moved from the main data center to the new one in ITB on August 9, 2014. However, we were unwilling to move all of the Libraries' servers on that day, which fell in the middle of the summer session. A compromise was reached between the Libraries and ITS that allowed many of the Libraries' less mission-critical servers to be moved on the same day as the university's servers. These servers were primarily ones used for development and backup purposes, one exception being the server that supported the Libraries' electronic reserves service. This server was dependent on the university-supported Blackboard server, which was being moved on August 9, so the Libraries agreed to move this server that day so there would not be two downtimes for the electronic reserve system.

The Libraries' most critical servers were moved to ITB on August 18, 2014. This was the first day of intersession and would affect students and faculty the least. There were many people involved in the move, including the Library Systems staff, the migration consulting firm staff, the professional moving company that was hired to carry out the move, and ITS staff who were responsible for the network and other support. Move activities included shutting down and backing up applications, powering off the servers, and packing the equipment. At ITB the equipment was unpacked, placed in its assigned rack location, plugged in, and powered on. Then each server had to be started and applications tested. All of this activity began at 3:00 a.m. and continued until early afternoon. The day concluded with a conference call between all parties involved to confirm that everything was up and running as expected.

Lessons Learned

Participate in the Process
The Libraries were invited to participate in the planning for a new data center early in the process. ITS, UL, and other units with significant server collections met and discussed their computing needs and respective computing infrastructures. Once the construction of ITB began, the planning ramped up and monthly meetings of stakeholders became weekly meetings. Agendas for these meetings included round-robin reports about
• construction project oversight;
• migration consulting;
• partnerships (with other units on campus, including the Libraries);
• status of our alternate data center (housed 10 miles away at another SUNY institution);
• campus fiber network;
• internal wiring and network design;
• administrative computing planning and move;
• research computing planning and move;
• systems management (storage and virtual environment) planning and move;
• data center advisory structure; and
• campus notification and public relations.

These meetings gave us an opportunity to learn about and understand all aspects of the data center migration project.
Participants reviewed project timelines and other documents that were housed on a shared wiki space. After the data center migration consultants were hired, they began to use the Microsoft OneDrive collaboration space to share and distribute documents. Meeting regularly with all project participants allowed us to ask questions to clarify priorities and timelines and to advocate for the Libraries' needs.

Review Schedules Carefully
As with many construction projects, unexpected delays in the construction of the data center delayed all of our plans. Originally the building was to be completed in February; this was later changed to April and then May. After the construction was complete, the building had to be commissioned, which means that every system within the building had to be tested independently by outside inspectors. Coordination of this work is very time-consuming, and the completion of the commissioning delayed occupancy by another few months. The university was finally given permission to move equipment into the data center in July. In the meantime, our consultants were working feverishly to develop timelines for the move, identify and secure a contract with a professional IT moving company, and create "playbooks" for each move. The playbook is a document that includes
• names and contact information of everyone involved with the move;
• sequence of events: an hour-by-hour description of all activities;
• server overview: including the name, make, model, rack location and elevation, and contact person for each server; and
• schematics of both old and new server locations, including details about each server as well as rack locations and elevations.
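For a concrete picture of what such a playbook might contain, the sketch below models a single entry as a small data structure. Every value is a placeholder; the field names simply mirror the list above, and the times loosely echo the 3:00 a.m. start and early-afternoon wrap-up described in the Migration Days section.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ServerEntry:
    """One row of the playbook's server overview (all values are placeholders)."""
    name: str
    make_model: str
    old_location: str   # rack location and elevation in the old server room
    new_location: str   # assigned rack location and elevation in ITB
    contact: str

@dataclass
class MovePlaybook:
    move_date: str
    contacts: Dict[str, str] = field(default_factory=dict)                   # person -> phone/email
    sequence_of_events: List[Tuple[str, str]] = field(default_factory=list)  # (time, activity)
    servers: List[ServerEntry] = field(default_factory=list)

playbook = MovePlaybook(
    move_date="2014-08-18",
    contacts={"Library Systems": "555-0100 (placeholder)"},
    sequence_of_events=[
        ("03:00", "Shut down applications and take final backups"),
        ("04:30", "Power off, pack, and transport equipment"),
        ("08:00", "Unpack, rack, cable, and power on servers at ITB"),
        ("10:00", "Start applications and test"),
        ("13:00", "Wrap-up conference call with all parties"),
    ],
    servers=[
        ServerEntry("ils-prod", "vendor/model TBD", "Old data center, rack A, U12",
                    "ITB, row 4, rack 7, U20", "Library Systems"),
    ],
)

print(f"{len(playbook.servers)} server(s) scheduled for {playbook.move_date}")
```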
Library staff became concerned when the original date scheduled to move most of the university's application servers, including the Libraries' ILS server, was in the middle of the summer session. Although the projected downtime was only to be twelve hours (and probably fewer), library staff were not willing to have twelve hours' downtime during a short four-week summer session. There were concerns that downtime, not only to the online catalog, but also to all of the Libraries' databases, the website, online reference service, electronic reserves, and other resources, would present a severe hardship to faculty and students. We also recognized the risk, however small, of something going wrong during the move that would cause a lengthier downtime. At the same time the university was concerned about pushing the move too close to the start of the fall semester, as well as the increased cost of scheduling a second move date. During these negotiations it became apparent that the Libraries' needs are different from administrative computing needs. Whereas the middle of a semester is a poor time for Libraries' servers to experience downtime, it can be a better time for administrative computing, which is often busier during intersession when grading reports are being run and personnel databases are being updated. Ultimately, the Libraries advocated for and secured an agreement for a second move date, scheduled for the first work day after the end of the summer session.

Similarly, ITS was encouraging all of its partners across the campus to move as much computing as possible into their virtual environment. This is a worthwhile goal, but again the Libraries had to negotiate to make this change according to the schedule best for the Libraries and their users. The ITS virtual environment was a more current release of the virtual machine (VM) software than the Libraries were using, so the Libraries were faced with not only a migration, but also an upgrade. Ultimately, we postponed the VM migration until after the physical migration, and we have benefited from waiting. Other partners have had to work through a number of kinks in the process, and the Libraries' VM migration has benefited from the other partners' experience.

Clarify Costs of Centralization
When UAlbany began to consider and plan for a centralized data center, one of the concerns raised by the various data center managers from units other than ITS was the cost of centralizing their servers in another location. Centralized data centers have many costs: heating, cooling, security, staffing, cleaning, backup energy sources, networking costs, and more. The question on everyone's mind was who was going to pay for these costs. Would each unit have to pay toward the maintenance of the data center? Some objected to the idea of having to pay to be a tenant in a centralized data center when they already had their own data center or server room at what seemed like no cost. The only cost they experienced was an opportunity cost of what else they could use the server room for. In the Libraries' case, the server room could be used for group study, office space, or other purposes, but it did not cost the Libraries money to use it as a server room because utilities are covered centrally by the university. On the other hand, by migrating some of our computing to the ITS virtual environment, we may save money in the long run because we will not have to replace hardware and pay warranty fees. After much negotiation the university settled on a five-year commitment to no charges for the partnering units on campus, including the Libraries. This commitment was documented in a partnership agreement drafted by a group of representatives from all of the key units involved.

Contribute to the Development of a Service Level Agreement
Library staff contributed to the development of a service level agreement (SLA) for our participation in a centralized data center. Having an SLA in place ensures that all parties to the agreement understand their rights and responsibilities. We began by searching other universities' websites for samples of SLAs, which we shared with ITS staff who were assigned to this project. The establishment of a centralized data center includes several major elements: Data Center as a Service (DCaaS) and Infrastructure as a Service (IaaS), as well as the network that connects it all.
The SLA that was developed, still in draft form, has elements that address the following:
• The length of the agreement
• Network uptime
• Infrastructure as a service
  o Server/storage environment and technical support
  o Access to IaaS
  o File backup and retention
  o Maintenance of partner systems
  o ITS scheduled maintenance
  o Data ownership, security, responsibility, and integrity
  o Business continuity, tiering, and disaster recovery
  o Availability and response time of ITS staff
• Data center as a service
  o Environment and support
  o Building access and security
  o Physical rack space
  o Deliveries
  o Scheduled maintenance
  o Communications
• Glossary

We recommend that institutions considering data center consolidation projects complete their SLA and other agreements before moving servers into a shared environment. In our case, however, we were unable to finalize the SLA prior to the actual move. This was not because of any particular demands that the Libraries were making, but was primarily because of the rapid approach of the deadline for moving into the new data center. The move had to be completed before the beginning of the fall semester, and preferably with a few weeks to spare in case anything went wrong. While the planning for the data center construction and migration seemed to stretch over a long period of time, the final few months turned into a frenzy of activity that ranged from last-minute construction details to nailing down the exact order in which thousands of pieces of equipment would be moved. Although not every detail was ironed out at the time of the move, the intentions and spirit of the SLA have been documented and it will be completed during 2015.

Communicate Developments and Plans
During the planning and development for the data center migration project, we recognized that it would be important to communicate any changes in the availability of the Libraries' systems to our users. ITS also recognized the need to communicate such changes. Both ITS and the Libraries took a many-pronged approach to communicating developments and plans related to the migration. Within the Libraries we shared updates at Library Faculty meetings as well as meetings of the Library Policy Group (the dean's administrative policy team). We sought feedback from many groups on proposed move dates, establishing intersession as the preferred time to move any Libraries' servers that would affect access to resources used by faculty or students.

As the moves got closer, the communication efforts were ramped up. Within the Libraries, we posted alerts on the Libraries' webpage that linked to charts indicating what services would be unavailable and when. We also included slides on the Libraries' main webpage with the same information. The same slide was posted on all three Libraries' flat-screen monitors, on which we post important news and dates. We sent mass emails to all Libraries' staff that reminded them when services would be down. Staff members who were responsible for specific services made an effort to contact their customers directly. For example, the head of Access Services contacted faculty members about the scheduled interruptions of Ares, our electronic resources reserve system. Some of the downtime affected just users, and other downtime also affected staff who could not work in the ILS during the move.
We planned alternate activities for staff members who could not work during the downtime and had a productive Division Clean Up Day instead.

ITS also made great efforts to communicate to the university community about the moves and any potential downtime. Their efforts included mass emails to all faculty, staff, and students. ITS created and posted slides to the Libraries' flat-screen monitors, as well as other monitors throughout the university. ITS also formed a team of liaisons from each school and college, using that group as yet another conduit to communicate changes. They shared draft schedules, seeking input on the effect of downtime on the university's functions.

Leverage Economies of Scale
One of the challenges of maintaining a distributed data center environment was that each system administrator or unit had to manage its own servers singlehandedly. In the case of the Libraries at UAlbany, we had moved in the direction of using the power of virtualization to manage many of our servers. Virtualization refers to the process of creating virtual servers within one physical server, thereby multiplying the value of a single server many times. The Libraries had virtualized a number of library servers, saving money by not having to purchase additional costly physical servers. However, ITS, with its greater purchasing power, was using more current and advanced virtualization software, hardware, and services than the Libraries. ITS created a suite of services that allows system administrators access to the virtual environment so they can manage their virtual servers from their own offices. By moving into the ITS virtual environment (IaaS), the Libraries are able to leverage the economies of scale it presents.

CONCLUSION
The consolidation of distributed data centers or server rooms on university campuses offers many advantages to their owners and administrators, but only minimal disadvantages. The University at Albany carried out a decade-long project to design and build a state-of-the-art data center. The Libraries participated in a two-year project to migrate their servers to the new data center. This included hiring a data center migration consulting firm and developing a migration plan and schedule for the physical move, which took place in late summer 2014. The authors have found that there are many advantages to consolidating data centers, including taking advantage of economies of scale, an improved physical environment, better backup services and security systems, and more. Lessons learned from this experience include the value of participating in the process, reviewing migration schedules carefully, clarifying the costs of consolidation, contributing to the development of an SLA, and communicating all plans and developments to the Libraries' customers, including faculty, staff, and students. As other university libraries consider the possibility of consolidating their data centers, the authors hope that this paper will provide some guidance to their efforts.

REFERENCES

1. "Fast Facts," University at Albany, accessed March 31, 2015, www.albany.edu/about/about_fastfacts.php.

2. "Data Center Consolidation, IT Consolidation," Webopedia, accessed March 31, 2015, www.webopedia.com/TERM/D/data-center-consolidation-it-consolidation.html.

3. Uptime Institute, accessed March 31, 2015, https://uptimeinstitute.com/TierCertification/.
8804 ----
President's Message: Making an Impact in the Time That is Given to Us
Rachel Vacek

Rachel Vacek (revacek@uh.edu) is LITA President 2014-15 and Head of Web Services, University Libraries, University of Houston, Houston, Texas.

In an early chapter in The Fellowship of the Ring, by J.R.R. Tolkien, Frodo laments having found the one Ring and Gandalf tries to console him by saying, "All we have to decide is what to do with the time that is given us." This is one of my favorite quotes in the Lord of the Rings series because it inspires us to rise to the occasion and perform to the best of our abilities. It also implies that we have a purpose to fulfill within a predetermined time period.

Although my term in office is three years, I'm only LITA President for one year. To set a vision and goals, establish a sense of urgency, generate buy-in, engage and empower the membership, implement sustainable changes, and remain positive and focused – all within one year while holding a full-time job – is challenging to say the least.

I've been very fortunate during my almost eight-year tenure at the University of Houston Libraries to participate in numerous professional development opportunities, lead change, and make a difference. Personal and professional growth has always been very important to me, and being in an environment that encourages me to become a better librarian, technologist, manager, and leader is not only helpful for my career, but also extremely rewarding on an intellectual level. LITA has benefited from that training.

In today's library technology landscape, one of the many skills leaders need to possess is the ability to effect change. As LITA President, I have put many changes in motion and am happy with what I have accomplished, and proud of our Board and the members who volunteer to lead and effect change.

As I reflect over the past year, it's fair to say that LITA, despite some financial challenges, has had numerous successes and remains a thriving organization. Three areas – membership, education, and publications – bring in the most revenue for LITA. Of those, membership is the largest money generator. However, membership has been on a decline, a trend that's been seen across ALA for the past decade. In response, the Board, committees, interest groups, and many individuals have been focused on improving the member experience to retain current members and attract potential ones. With all the changes to the organization and leadership, LITA is on the road to becoming profitable again and will remain one of ALA's most impactful divisions.

The Board has taken numerous steps to stabilize or reverse the decline in revenues that has resulted from a steady reduction in overall membership.
At ALA Annual 2014, the Financial Advisory Committee was established to respond to recommendations from the Financial Strategies Task Force, adjusting the budget to make a number of improvements while planning for larger, more substantial changes.

In Fall 2014 we took steps to improve our communications by establishing the Communications & Marketing Committee and appointing a Social Media Manager and a Blog Manager. The blog and social media have seen a steady upward trajectory of engagement, with over 27,000 blog views since September 2014 and over 13,300 followers on Twitter. These efforts help recruit and retain members, advertise our online education and programming, and increase attendance at conferences.

Over the past year, nine workshops and two web courses were offered, many of which sold out thanks to new marketing approaches. The Forum remains popular and has stellar programming and keynote speakers. Programs and workshops at ALA conferences are stronger than ever and continue to be well attended. Publications also remain strong. Although only three LITA Guides were published this year, partially due to a change in publishers, there are many more in the pipeline.

Finally, the search for a new Executive Director is underway, and with a new leader come fresh ideas and perspectives. I am excited about LITA's future. The incoming Board, along with a new Executive Director, has an opportunity to make a national and lasting impact as well as collaborate with outstanding librarians and staff in this division and across ALA. LITA's challenges and successes are shared amongst a dedicated team of volunteers, and together we've made significant changes. I believe that LITA members will continue to rise to the occasion and make incredible things happen with "the time that is given us." LITA is an amazing organization because of its members and their passion and dedication. I couldn't be prouder. It has been an honor and a privilege to serve as your president.

8805 ----
Editor's Comments
Bob Gerrity

Bob Gerrity (r.gerrity@uq.edu.au) is University Librarian, University of Queensland, Australia.

Library Discovery Circa 1974
Our ongoing project to digitize back issues of Information Technology and Libraries (ITAL) and its predecessor, Journal of Library Automation (JOLA), provides frequent reminders of what's changed (and what hasn't) in library technology in the past several decades. The image above is from a 1974 advertisement in JOLA for the "ROM II Book Catalog on Microfilm" from Information Design in Menlo Park, CA. The ad copy speaks for itself:

All the advantages of a printed book catalog…None of the disadvantages. Your staff and patrons can use the catalog simultaneously in many different locations. The user can scan a number of related titles on the same page, in contrast to the one-at-a-time viewing of catalog cards in trays. Manual filing routines and maintenance are eliminated.

Easy to use…requires no instruction.
An automatic index pointer shows your patron his position in the file. At the touch of a button he can scan forward or back at high speed. Average look-up time is about twelve seconds. A staff member can insert an updated catalog totally cumulated on a single reel of microfilm in about one minute. Your patrons never touch the film—your complete library catalog "Locked-in"!

My favorite bit is the sign on the front of the machine, proudly proclaiming:

THESE ARE ALL THE BOOKS IN THE LIBRARY.

This month's issue of ITAL looks at the current state of library discovery from a number of angles. Will Owen and Sarah Michalak describe efforts at UNC Chapel Hill and partners within the Triangle Research Libraries Network to enhance the utility of the library catalog as a core tool for research, taking advantage of web-based search technologies while retaining many of the unique attributes of the traditional catalog. Joseph Deodato provides a useful step-by-step guide to evaluating web-scale discovery services for libraries. David Nelson and Linda Turney analyze faceted navigation capabilities in library discovery systems and offer suggestions for improving their usefulness and potential. Julia Bauder and Emma Lange describe a new, interactive, visual approach to subject searching. Yan Quan Liu and Sarah Briggs report on the current state of mobile services among the top 100 US university libraries. Unrelated to discovery but certainly relevant to issues around library provision of access to information, Jill Ellern, Robin Hitch, and Mark Stoffan report on user authentication policies and practices at academic libraries in North Carolina.

8919 ----
Editorial Board Thoughts: Rise of the Innovation Commons
Tod Colegrove

Patrick "Tod" Colegrove (pcolegrove@unr.edu), a member of the ITAL Editorial Board, is Head of the DeLaMare Science & Engineering Library at the University of Nevada, Reno, NV.

That the practice of libraries and librarianship is changing is an understatement. Throughout their history, libraries have adapted and evolved to better meet the needs of the communities served. From content collected and/or archived, to facilities and services provided, a constant throughout has been the adoption, incorporation, and eventual transition away from technologies along the way: clay tablets and papyrus scrolls giving way to the codex; the printing press and eventual mass production and collection of books yielding to Information Communication Technology such as computer workstations and the Internet.
Indeed, the rapid and widespread adoption of the Internet has enabled entire topologies of information to change – morphing from ponderous print tomes into digital databases, effectively escaping the walls of libraries and archives altogether.1

In reflection of end-users' growing preference for easily accessible digital materials, libraries have responded with the creation of new spaces and services. Repositioning physical, digital, human, and social resources to better meet the needs of the communities supported, the information commons2 that is the library begins to acquire a more technological edge. The concept of a library service or area referred to specifically as an information commons can be traced to as early as 1992 with the opening of the Information Arcade at the University of Iowa – specifically designed to provide end users with technology tools, with a stated mission "to facilitate the integration of new technology into teaching, learning, and research, by promoting the discovery of new ways to access, gather, organize, analyze, manage, create, record, and transmit information."3

First mentioned in the literature in 1994, discussion of the idea itself waited another five years, with Donald Beagle writing about the theoretical underpinnings of "the new service delivery model" in 1999. He defined it as "a cluster of network access points and associated IT tools situated in the context of physical, digital, human, and social resources organized in support of learning." A flurry of articles followed, with the idea seeming to have caught the collective imagination of libraries generally by 2004. Information commons as named spaces within libraries made "…sudden, dramatic, and widespread appearance in academic and research libraries across the country and around the world."4 Scott Bennett went further, in 2008 asking flatly: "who would today build or renovate an academic library without including an information commons?"5

This proliferation and transition has not been limited to academic libraries; for decades, libraries of all types, shapes, and sizes have been similarly provisioning resources and technology in the context of end-user access and learning. By 2006, a new variation of the information commons had entered the vernacular: the learning commons. Beagle defined it as the result of information commons resources "organized in collaboration with learning initiatives sponsored by other academic units, or aligned with learning outcomes defined through a cooperative process."6 A subset of the broader concept: when the library collaborates with stakeholders external to the library to collaboratively achieve academic learning outcomes, it becomes operationally a learning commons.
One can easily conceive of the learning commons more broadly by considering learning outcomes desirable within the context of particular library types: school libraries with offerings and programs in alignment with broader K-12 curricula; public libraries in support of lifelong learning and participatory citizenship; special libraries in alignment with other niche-specific learning outcomes.

Note that not all information commons are learning commons. As defined, the learning commons depends on the actions and involvement of other units that establish the mission, and associated learning goals, of the institution. Others must join with the library's effort in order to create and nourish such spaces in a way that is deeply responsive to the aspirations of the institution: "the fundamental difference between the information and the learning commons is that the former supports the institutional mission while the latter enacts it." (Bennett 2008, emphasis added) At a time when libraries are undergoing such rapid and significant transformation, it's hard to dismiss such collaborative effort as merely trendy – such spaces, and the library by extension, become of even more fundamental relevance to the broader organization.

In short, resources are provisioned in the information commons so that learning can happen; collaborative effort with stakeholders beyond the library, but within the organization, ensures that learning does happen.

Drawing a parallel, what if the library were to go beyond simply repositioning resources in support of learning – indeed, beyond working with other units of the organization to collaboratively align and provision resources in support of achieving organizational learning outcomes? To go beyond strategic alignment with the aspirations of the institution, involving stakeholders from beyond the immediate organization in the creation and support of such spaces? Provisioning library spaces and services that are deeply responsive to the aspirations of the greater community? Arguably this is where the relatively recent introduction of makerspaces into the library fits in. The annual environmental scan performed by the New Media Consortium (NMC) has for a number of years identified makerspaces to be on its short-term adoption horizon – the 2015 Library Edition goes further, identifying a core value:

the introduction of makerspaces in the academic library is inspiring a mode of learning that has immediate applications in the real world. Aspiring inventors and entrepreneurs are taking advantage of these spaces to access tools that help them make their dreams into concrete products that have marketable value.7

Aspects of the information commons are present in library makerspace – not only in the access to traditional library resources, but also in the shift toward providing support of 21st-century literacies in the creation, design, and engineering of output.
With the acquisition and use of these literacies in collaboration with and in support of the goals of the greater institution, it is also a learning commons; for example, in the case of a school or public library where makerspace activities and engagement collaboratively meet and support learning outcomes, including increased engagement with Science, Technology, Engineering, the Arts, and Math (STEAM) disciplines. Consider the further example of university students leveraging makerspace technology as part of STE(A)M outreach efforts to local middle schools in the hope of kindling interest, or partnering with the local Discovery Museum in the production of a mini maker-faire to carry that interest forward. Alternatively, consider a team of students conceiving, then prototyping and patenting, a new technology with the active and direct support of the library commons, going on to eventually launch as a business. To the extent the library can springboard off the combination of makerspace with information or learning commons to engage stakeholders from beyond the institution, it can go beyond – becoming something broader, and potentially transformative; even as it enables progress toward collaboratively achieving community goals, outcomes, and aspirations.

The hallmark of community engagement with such library facilities is a spontaneous innovation that seems to flow naturally. Library? Information or learning commons? Arguably such spaces are more accurately named innovation commons.

Beyond solidifying the library's place as a hub of access, creation, and engagement across disciplinary and organizational boundaries, the direct support of innovation – the process of going from idea to an actual good or service with a real perceived value – is in potential alignment with the aspirations of the broader community. In collaboration with stakeholders from across the community, from economic development and government representatives to businesses and private individuals, broader outcomes and aspirations of the greater community can be identified and supported. Nevertheless, simply adding makerspace technology to an information or learning commons does not automatically create an innovation commons. It is in the broader conversation, along with the catalyzation, identification of, and support for the greater aspirations of the community, that the commons begins to assume its proper role in the greater ecosystem. Leveraging the deliberate application of information, imagination, and initiative to enable end users to go from idea all the way to a useful product or service is something that community stakeholders see as a tangible value.

The library as innovation commons becomes a natural partner in the local innovation ecosystem, working collaboratively to achieve community aspirations and economic impact.
Traditional business and industry reference support ramps up to another level, providing active and participatory support of coworking, startup companies, and Etsypreneurs8 alike – patent searches taking on an entirely new light in support of innovators using makerspace resources to rapidly prototype inventions. Actualized, the library joins forces in a deeper way with the community in the creation of new technologies, jobs, and services, taking an ever more active role in building the futures of the community and its members.

REFERENCES

1. Morgan Currie, "What We Call the Information Commons," Institute of Network Cultures blog, July 8, 2010, http://networkcultures.org/blog/2010/07/08/what-we-call-the-information-commons/

2. The word commons reflects the shared nature of a resource held in common, such as grazing lands.

3. Robert A. Seal, "Issue Overview," Journal of Library Administration 50 (2010): 1-6, http://www.tandfonline.com/doi/pdf/10.1080/01930820903422248

4. Charles Forrest and Martin Halbert, A Field Guide to the Information Commons (Lanham, MD: Scarecrow, 2009).

5. Scott Bennett, "The Information or the Learning Commons: Which Will We Have?," The Journal of Academic Librarianship 34, no. 3 (2008): 183-185.

6. Donald Robert Beagle, Donald Russell Bailey, and Barbara Tierney, The Information Commons Handbook (New York: Neal-Schuman, 2006), xviii.

7. Larry Johnson, Samantha Adams Becker, Victoria Estrada, and Alex Freeman, NMC Horizon Report: 2015 Library Edition (Austin, TX: New Media Consortium, 2015), 36.

8. The combination of Etsy, a peer-to-peer e-commerce website that focuses on selling handmade, vintage, or unique items, and entrepreneurship. The word "Etsypreneur" refers to someone who is in the "Etsy business" – namely, selling such items via the website. http://etsypreneur.com/the-hidden-danger-of-the-internet-opportunity/

President's Message

Thomas Dowling

Fall has arrived, faster than expected (as it always does). It seems like ALA Annual just wrapped up in San Francisco, but we're already well underway with the coming year's activities.

The National Forum 2015 will be here before you know it. In a fall season crowded with good technology conferences, LITA Forum consistently proves its value as a small, engaging, and focused meeting. Technologists, strategists, and front-line librarians come together to discuss the tools they make and use to provide cutting-edge library services. In addition to great LITA programming, this year we're working with colleagues from LLAMA (the Library Leadership and Management Association) to provide a set of programs focused on the natural cooperation of management and technologies in libraries.
There are great preconferences on makerspaces and web analytics, keynote addresses, over 50 concurrent sessions, and a lot of networking opportunities. Click on over to litaforum.org, and I hope to see you in Minneapolis, November 12-15.

Not too long after Forum, we'll be in Boston for Midwinter, and then Annual in Orlando. The Program Planning Committee is already at work selecting the best programs for Annual. Next summer is also the start of LITA's 50th Anniversary celebrations!

Of course, not everything we do involves travel and in-person meetings. LITA's fall schedule of webinars includes sessions on patron privacy, Creative Commons, personal digital archiving, and a second iteration of Top Technologies Every Librarian Needs to Know.

On the staff side, we are happy to say that Jenny Levine has started as LITA's new Executive Director. Jenny comes to us from ALA's IT and Telecommunications Services department, where she is still putting in some time bringing a new version of ALA Connect online. Jenny and the Governing Board are already working together virtually: we are about to select our Emerging Leaders for the year and are working on an exercise to set divisional priorities, with an eye toward drafting a new strategic plan. The Board will hold two online meetings this fall. As always, these are open meetings, so if you're interested in your association's governance, you're welcome to sit in. Watch the Board's area in Connect for details, or look for upcoming posts to LITA-L and litablog.org. And if you need to contact the Board, you can reach us at http://www.ala.org/lita/about/board/contact.

I hope to meet as many LITA members as possible this year, at one of the upcoming in-person meetings or online, or just drop me a line on Connect. It's going to be a great year for LITA.

Thomas Dowling (dowlintp@wfu.edu) is LITA President 2015-16 and Director of Technologies, Z. Smith Reynolds Library, Wake Forest University, Winston-Salem, North Carolina.

Editorial Board Thoughts: Information Technology in Libraries: Anxiety and Exhilaration

Mark Cyzyk

A few weeks ago a valued colleague left our library to move his young family back home to Pittsburgh. Insofar as we were a two-man department, I spent the weeks following the announcement of his imminent departure picking his brain about various projects, their codebases, potential rough spots, existing trouble tickets, etc. He left, and I immediately inherited nine years' worth of projects and custom code, including all the "micro-services" that feed into our various well-designed, high-profile, and high-performing (thanks to him) Websites.

This was all, naturally, anxiety-producing.

Almost immediately, things began to break.

Early on, a calendar embedded in a custom WordPress theme crucial to the functioning of two of our revenue-generating departments broke.
The external vendor simply made the calendar we were screenscraping disappear. Poof, gone. I quickly created an OK-but-less-than-ideal workaround and we were back in business, at least for the time being.

Then, two days before the July 4 holiday, our Calendar Managers started reporting that our Google-Calendar-based system was disallowing a change to "CLOSED" for that Saturday. I somehow forced a CLOSED notification, at least for our main library building, but no matter what any of us did we could not get such a notification to show up for a few of our other facilities. I spent quite a bit of time studying the custom middleware code that sits between our Google Calendars and our Website, and could see where the magic was happening. I now think I know what to do -- and all I have to do is express it in that nutty programming language/platform that the kids are using these days, Ruby on Rails. I've never written a line of Ruby in my life, but it's now or never.

A little voice inside me keeps saying, "You're swimming in the deep end now -- paddle harder, and try not to sink."

While these surprise events were happening, we also switched source code management systems, so a migration was in order there; my longingly awaited new workstation came in (and I'm sure you all know how painstaking it is to migrate all idiosyncratic data/apps/settings to a new workstation and ensure it's all present, functioning, and secure before DBAN-nuking your old drives); we decommissioned a central service that had been in production since 2006; we fully upgraded our WordPress Multisite, including all plugins and themes, fixing what broke in the upgrade; and I got into the groove of working on any and all trouble tickets/change requests that spontaneously appeared, popping up like mushrooms in the verdant vale of my worklife.

This was all largely in addition to my own job.

So now I find myself surgically removing/stitching up code in recently diseased custom WordPress themes, adding Ruby code to a crucial piece of our Website infrastructure, and learning as much as I can -- but quick -- about the wonderful and incredibly powerful Bootstrap framework upon which most of our sites are built.

Surely it's anxiety-producing? You bet.

But it's thrilling and exhilarating as well. I'm paddling hard, and so far my head remains above water. Many days, I just can't wait to get to work and start paddling.

This aging IT Guy suddenly feels ten years younger!

(But isn't all this paddling supposed to somehow result in a Swimmer's Body? Patiently waiting...)

Mark Cyzyk (mcyzyk@jhu.edu), a member of the ITAL Editorial Board, happily works and ages in The Sheridan Libraries, Johns Hopkins University, Baltimore, Maryland, USA.
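A purely hypothetical aside, for readers curious what the calendar fix described above might look like: the override could be a few lines of plain Ruby applied to events after they are read from the Google Calendar feed and before they reach the website. Nothing here is the actual middleware from the column; the constant, method, and data shapes are invented for illustration.

    require 'date'

    # Hypothetical override list: dates that must show as CLOSED on the website,
    # even if the upstream Google Calendar cannot be edited in time.
    FORCED_CLOSED_DATES = [Date.new(2015, 7, 4)].freeze

    # `events` is assumed to be an array of hashes already pulled from the
    # calendar feed elsewhere in the middleware, e.g.
    #   { date: Date.new(2015, 7, 4), facility: "Main Library", hours: "10am-6pm" }
    def apply_closed_overrides(events)
      events.map do |event|
        if FORCED_CLOSED_DATES.include?(event[:date])
          event.merge(hours: "CLOSED")  # force the displayed value for that date
        else
          event
        end
      end
    end

    sample = [
      { date: Date.new(2015, 7, 4), facility: "Main Library",   hours: "10am-6pm" },
      { date: Date.new(2015, 7, 5), facility: "Branch Library", hours: "CLOSED"   }
    ]
    p apply_closed_overrides(sample)

The real change would live wherever the Rails middleware maps calendar entries to the hours display, but the shape of the fix -- a small, explicit override applied after the feed is read -- is the point.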
Reference is Dead, Long Live Reference: Electronic Collections in the Digital Age

Heather B. Terrell

Heather B. Terrell (hterrell@milibrary.org), a recent MLIS degree graduate from the School of Library and Information Science, San Jose State University, is winner of the 2015 LITA/Ex Libris Student Writing Award.

ABSTRACT

In a literature survey on how reference collections have changed to accommodate patrons' web-based information-seeking behaviors, one notes a marked "us vs. them" mentality—a fear that the Internet might render reference irrelevant. These anxieties are oft-noted in articles urging libraries to embrace digital and online reference sources. Why all the ambivalence? Citing existing research and literature, this essay explores myths about the supposed superiority of physical reference collections and how patrons actually use them, potential challenges associated with electronic reference collections, and how providing vital e-reference collections benefits the library as well as its patrons.

INTRODUCTION

Reference collections are intended to meet the immediate information needs of users. Reference librarians develop these collections with the intention of using them to answer in-depth questions and to conduct ready-reference searches on a patron's behalf. Library users depend on reference collections to include easily navigable finding tools that assist them in locating sources that contain reliable information in a useful, accessible format and can be accessed when the information is needed.

The expectation for print reference collections is that they are comprised of high-use materials—the very reason for their designation as noncirculating items is ostensibly so that materials are available for on-demand access by both patrons and staff, who use them frequently. However, librarians and patrons alike have acquired what Margaret Landesman calls "online habits," to wit, the most-utilized access point to information is often the 24/7 web.1 In a wired world, where the information universe of the Internet is not only on our desktops, but also in our pockets and on our fashion accessories, the role of the print reference collection is less relevant in supporting information and research aims.

In no other realm have the common practices of both users and librarians changed more than in how we seek information. Nevertheless, a technology-related panic seems to be at the boil, with article titles like "Are Reference Books Becoming an Endangered Species?"2 and "Off the Shelf: Is Print Reference Dead?"3 Words like "invasion" are used to describe the influx of electronic reference sources. We read about the "unsustainable luxury" of housing hundreds—sometimes thousands—of unused books on the open shelves.

All this handwringing leads us to wonder why librarians in the field need this much coaxing to be cajoled into weeding their print reference collections in favor of electronic reference resources. Does this format transition really constitute such a dire situation? What if the decline of print reference usage isn't a problem? And what's so luxurious about dusty, overcrowded shelves full of books no one cares to use?
In "The Myth of Browsing," Barclay concludes that "the continued over-my-dead-body insistence that no books be removed [from libraries] is an unsustainable position."4 A survey of the relevant literature reveals that staff resistant to the transition from print to electronic reference collections often share three core presumptions about reference:

• Users prefer using print sources, and the importance of patrons' ability to browse the physical collection is paramount.

• The reliability of web-based reference sources may be questionable, especially when compared with the authority of print reference materials.

• Access to print materials is the only option that certain users (namely, those without library cards) have for being connected to information.

There also seems to be a more subtle assumption at play in the print vs. electronic reference debate—that print books are more "dignified," cultivating a scholarly atmosphere in the library. Certain objections to removing print reference collections to closed stacks and using the newly freed public space to build a cooperative learning commons, for instance, tend to devolve into hysterics about the potentiality of libraries becoming "no better than a Starbucks." The "no better" variable in this equation is a cosmetic one—librarians aren't worrying about libraries serving up a flavored "information latte" for vast profit margins—they are worrying that libraries will be perceived as a place to loiter, use the Internet, and "hang out," rather than a place for serious study.

One thing for librarians who worry about this potential outcome to consider is that loyal coffee shop denizens would be up in arms to learn that their favorite shop was being closed or its services being reduced or eliminated. The implications are clear. Perhaps libraries should consider the café model: a collaborative "no-shushing" zone—the difference between a library and a coffee house being that at a library, people are able to explore, learn, and be entertained using the resources provided by the institution.

At Homer Babbidge Library at the University of Connecticut, staff considered it important to "maintain a vestige of the reference collection, so that students were reminded they had entered a special place where scholarship was tangible."5 However, users considered the underutilized stacks of books a waste of space that could be better used for cooperative work areas or computer access stations. The students' needs and interests were heeded, and Homer Babbidge Library's Learning Commons has been a successful endeavor.

Reference Collections: History and Purpose

Brief points about the history of reference services lend context to the arguments presented in favor of building electronic reference collections. Grimmelmann points out that "it's almost a cliché to assert that the Internet is like a vast library, that it causes problems of information overload, or that it contains both treasures and junk in vast quantities."6 From the earliest dedicated reference departments to the 24/7 reference model developed in response to progressing technology, Tyckoson affirms that "one thing remains constant—users still need help. The question . . . is how to provide the library's users with the best help."7

Browsing collections in libraries are newer than one might assume.
Prior to World War II, academic library faculty could browse to find reference materials that met the information needs of students, but undergrads weren't even allowed in the stacks.8 In public libraries, reference collections were open to users, but reference rooms were considered to be, first and foremost, the domain and workplace of the reference librarian.9 This raises the question: what is the domain and workplace of the contemporary reference librarian? Arguably, the answer to this query is wherever the information is, for example, online.

Ready reference collections arose from the need to make the most commonly used resources in the library convenient and readily available for patron use.10 The most commonly used resources in contemporary libraries are those found online—again, where the information is. Both users and librarians now turn to the web as the first resort for answering quick reference queries, and they turn to online databases and journals for exploring complex research questions. Meanwhile, print works that were once used daily sit moldering, gathering dust on the shelves either because they are outdated or because no one thinks to find them when the answer is available at the swipe of a finger or the click of a mouse from where they sit, whether that's in the library or in, ahem, a coffee shop. "The convenience, speed, and ubiquity of the Internet is making most print reference sources irrelevant," Tyckoson says.11

Print Preference, Browsing Collections

Library use is increasing—but, as Landesman and others point out, it is increasing because users want access to computers, instruction in technology, study spaces, or just a place to be that's not home, not school, and not work. Users do not come to the library for reference sources—researchers and scholars prefer to access full-text works via their computing devices.12 The argument that users prefer print sources is antiquated, and the emphasis on building browsing collections of physical reference materials reflects a misguided notion that users crave tactile information. Landesman is blunt: "When it is a core title, users want it online."13

Statistics bear her assertions out. Studies show that usage of print reference collections is minimal and that users strongly prefer online access to reference materials.
• At Stetson University, usage statistics gathered during the 2003–4 academic year showed that only 8.5 percent of the library's 25,626 reference volumes were used during that period.14

• A year-long study by Public Libraries revealed that only 13 percent of Winter Park (FL) Public Library's collection of 8,211 reference items was used.15

• When Texas A&M University converted its reference collection's primary format from print to e-books, a dramatic increase in use of the e-versions of reference titles was recorded.16

• In a survey of William & Mary Law Library users, a majority of respondents indicated that they consciously select online reference sources over print, citing convenience and currency as top reasons for doing so.17

Scanning the shelves may seem to some to be the most intuitive way to search for information, but in actual practice, browsing is ineffective—books at eye level are used more often, patrons are limited to sources not being used by another patron at any given moment, and overcrowding of the shelves results in patrons overlooking useful materials.18 Browsing is the least effective way for patrons to "shop" a collection. Searching by electronic means overcomes the obstacles inherent in browsing the physical shelves when using well-designed search algorithms that employ keywords on the basis of accurate metadata. Landesman indicates that if librarians commit to educating patrons on the use of the reference databases, ebooks, and websites they offer, online reference will be "a huge win for users."19

It should be noted: no one suggests that print reference should be eliminated entirely, at least not yet. Smaller print reference collections result in better-utilized spaces; they ensure that remaining resources in the physical collections are used more effectively—only the items that are actually high-use are included, which makes these sources easier to locate; books formerly classified as reference materials are able to circulate to those interested in their specialized content. Smaller print reference collections serve patrons in myriad ways, including freeing up funds that can be used to enhance electronic reference collections.

Digital reference services are just another way of organizing information—there is no revolution here, unless it is in providing information with more efficiency—with breadth, depth, and access that surpasses what is possible via a print-only reference collection. The inevitable digital shift is a very natural evolution of patron-driven library services rather than cause for consternation on the part of library service providers.

Web Reliability: Google and Wikipedia

Those who argue that the reliability of online sources is questionable are typically referring to Google results and Wikipedia entries, which have little bearing on a library's electronic collections of databases and e-books but have plenty to do with a library's reference services: these two sources are very often used in lieu of printed fact-finding sources such as atlases, subject or biographical dictionaries, and specialized sources like Bartlett's Familiar Quotations—which was last printed in 2012 and has recently gone digital.
For questions of fact, Google is often a convenient and "reliable enough" source for most queries; the authority of the results yielded by a Google search is not always detectable, and is sometimes intentionally obscured, so the librarian must vet results carefully and select the most reliable sources when providing ready reference to patrons. However, Google is far more than just its main search page. For instance, Google Scholar allows searchers to locate full-text articles as well as citations for scholarly papers and peer-reviewed journal articles. In general, there are many tools on the web, and librarians must expend effort determining how to make the best use of each. In particular, Google is better suited to some information tasks than others—it's up to the librarian to know when to use this tool and when to eschew it.

Wikipedia has been the subject of much heated debate since its inception in 2001, but in a study conducted by Nature magazine, the encyclopedias Britannica and Wikipedia were evaluated on the basis of a review of fifty pairs of articles by subject experts and found to be comparable in terms of the number of content errors—2.9 and 3.9, respectively.20 Deliberate misstatements of fact, usually in biographical entries, are cited as evidence that Wikipedia is utterly unreliable as a reference source. In fact, print sources have been plagued with the same issues. For many years, the Dictionary of American Biography contained an entry based on a hoax claiming a (nonexistent) diary of Horatio Alger—and while the entry was removed in later editions, the article was still referred to in the index for several years after its removal.21 If anything, it seems that format might provide a false sense of assurance that a source's authority is infallible.

All reference sources include bias, and all will include faulty information. The major difference between print and electronic sources is that in the digital era, using the tools of technology, these errors can be corrected quickly. What some see as declining quality of a source based on its format is simply a longstanding feature of human-produced reference works, dissociated from any print vs. web debate.

Access: Collection vs. Policy

Some academic and public libraries intend to decrease or discontinue purchasing print-based reference sources so funds can be diverted to build electronic reference collections; they weed print reference to make room for information commons containing technology used for accessing these electronic collections. The basic assumption in the objection to this practice is that the traditional model of in-person reference is integral to a functioning reference collection and that access to information depends on that information being printed on a physical page.

Reference services are provided virtually via chat, IM, and email. Reference services are provided via the library's website. Reference services are provided by roving librarians, by librarians engaging in one-on-one literacy sessions, and in large-group training sessions. Long gone are the days of the reference librarian who waits patiently at her station for a patron to approach with a question.
Since the reference services model no longer necessarily includes a stationary point on the library map, and since providing quality reference no longer depends solely on the depth and breadth of the print reference collection, how are print reference collections used? As indicated previously, about 10 percent of print reference collections are used by patrons on a regular basis. Concern for the information needs of library users who do not have library cards is well-intentioned, but the question remains: if 90 percent of a collection goes unused, even when those users without library cards have access to these materials, is the collection useful? As Stewart Bodner of NYPL says, "It is a disservice to refer a user to a difficult print resource when its electronic counterpart is a far superior product."22

How users want to receive their information matters—access should not depend on whether a user can obtain a library card. Libraries with high concentrations of patrons who do not qualify for library cards (e.g., individuals who do not have a fixed home address, or who cannot obtain a state-issued ID card) might reconsider their policies rather than their collections. Computer-only access cards can be provided on a temporary basis for visitors and others who are unable to obtain permanent cards. San Francisco Public Library recently instituted a Welcome Card for those members of the community who cannot meet identification requirements for full library privileges. The Welcome Card allows the user full access to computers and online resources and permits the patron to check out one physical item at a time.23 When compared with purchasing, housing, and maintaining vast print reference collections, this is a significantly less costly and far more patron-centered solution to the problem of access to electronic information sources—librarians should be advocates for users, with the goal being access to knowledge, no matter its format.

Conclusion: Building Better Hybrid Collections

Most library professionals agree that libraries should collect both print and electronic sources for their reference collections, but the ratio of print to digital is up for debate. As more formats with improved capabilities appear, researchers find that patrons prefer those sources that provide them with the best functionality.

It is essential to look to the principles on which reference services are founded. One of those principles is to build collections on the basis of user preferences. Librarians must consider what the reference collection is for and whether assumptions about patron preferences are backed by evidence. In essence, this means considering what "reference" means to users rather than defaulting to the status quo. A reference collection development policy must be based on what is actually used often, not on what might potentially be used sometime in the future. The library is not an archive, preserving great tomes for posterity—the collections in a library are for use. With less emphasis on print materials, librarians might focus on the wealth of sources available electronically via databases and ebooks, as well as open-source, free online resources. Librarians must cultivate an understanding of the resources patrons use and the formats in which they prefer to access information.
As Heintzelman and coauthors state, "A reference collection should evolve into a smaller and more efficient tool that continually adapts to the new era, merging into a symbiotic relationship with electronic resources."24 Rules of reference that were devised when print works were the premier sources of reference information no longer apply. Reference librarians must lead the way in responding to the digital shift—creating electronic collections centered on web-based recommendations, licensed databases of journals, and ebooks—with a focus on rich, interactive, and unbiased content.

Weeding reference collections of outdated and unused tomes, moving some materials to the closed stacks while allowing others to circulate, and building e-book reference collections allow libraries to provide effective reference services by cultivating collections that patrons want to use. Much of the transition from print to electronic reference collections can be accomplished by ensuring that resources are promoted to patrons and staff, that training in using these tools is provided to patrons and staff, that librarians become involved in the selection of digital collections, and that the spaces where print collections were formerly housed are used in ways the community finds valuable.

One need not worry about the "invasion" of e-reference or the "death" of print reference. The two can coexist peacefully and vitally, as long as librarians maintain focus on selecting the best material for their reference collections, no matter its format.

REFERENCES

1. Margaret Landesman, "Getting It Right—The Evolution of Reference Collections," Reference Librarian 44, no. 91–92 (2005): 8.

2. Nicole Heintzelman, Courtney Moore, and Joyce Ward, "Are Reference Books Becoming an Endangered Species? Results of a Yearlong Study of Reference Book Usage at the Winter Park Public Library," Public Libraries 47, no. 5 (2008): 60–64.

3. Sue Polanka, "Off the Shelf: Is Print Reference Dead?" Booklist 104, no. 9/10 (January 1 & 15, 2008): 127.

4. Donald A. Barclay, "The Myth of Browsing: Academic Library Space in the Age of Facebook," American Libraries 41, no. 6–7 (2010): 52–54.

5. Scott Kennedy, "Farewell to the Reference Librarian," Journal of Library Administration 51, no. 4 (2011): 319–25.

6. James Grimmelmann, "Information Policy for the Library of Babel," Journal of Business & Technology Law 3 (2008): 29.

7. David A. Tyckoson, "Issues and Trends in the Management of Reference Services: A Historical Perspective," Journal of Library Administration 51, no. 3 (2011): 259–78.

8. Donald A. Barclay, "The Myth of Browsing: Academic Library Space in the Age of Facebook," American Libraries 41, no. 6–7 (2010): 52–54.

9. Tyckoson, "Issues and Trends in the Management of Reference Services."

10. Carol A. Singer, "Ready Reference Collections," Reference & User Services Quarterly 49, no. 3 (2010): 253–64.

11. Tyckoson, "Issues and Trends in the Management of Reference Services," 293.

12. Landesman, "Getting It Right," 8.

13. Ibid., 10.

14. Jane T. Bradford, "What's Coming Off the Shelves? A Reference Use Study Analyzing Print Reference Sources Used in a University Library," Journal of Academic Librarianship 31, no. 6 (2005): 546–58.

15. Heintzelman, Moore, and Ward, "Are Reference Books Becoming an Endangered Species?"

16. Dennis Dillon, "E-books: The University of Texas Experience, Part 1," Library Hi Tech 19, no. 2 (2001): 113–25.
17. Paul Hellyer, "Reference 2.0: The Future of Shrinking Print Reference Collections Seems Destined for the Web," AALL Spectrum 13 (March 2009): 24–27.

18. Barclay, "The Myth of Browsing."

19. Landesman, "Getting It Right."

20. Jim Giles, "Internet Encyclopaedias Go Head to Head," Nature 438, no. 7070 (2005): 900–901.

21. Denise Beaubien Bennett, "The Ebb and Flow of Reference Products," Online Searcher 38, no. 4 (2014): 44–52.

22. Mirela Roncevic, "The E-Ref Invasion: Now That E-reference Is Ubiquitous, Has the Confusion in the Reference Community Subsided?" Library Journal 130, no. 19 (2005): 8–16.

23. San Francisco Public Library, "Welcome Card," sfpl.org/pdf/services/sfpl314.pdf (2014): 1–2.

24. Heintzelman, Moore, and Ward, "Are Reference Books Becoming an Endangered Species?"

President's Message

Thomas Dowling

The LITA Governing Board has had a productive autumn, and I wanted to share a few highlights. With an eye toward better understanding and improving the member experience, we have a couple of new groups getting down to work.

LITA Local Task Force

I'm writing this shortly after returning from LITA Forum 2015, which was a fantastic meeting. I'm glad that so many people were able to attend, and I hope even more will come to Forum 2016. But we know many members cannot regularly travel to national meetings, and even the best online experience can lack the serendipitous benefits that so often come from face-to-face meetings. The new LITA Local Task Force will be responsible for creating a toolkit to facilitate local groups' ability to host events, including information on event planning, accessibility, and ensuring an inclusive culture at meetings. So you'll be able to host a LITA event in your own backyard! (If your backyard has a couple of meeting rooms and good wireless.)

Forum Assessment and Alternatives Task Force

As we begin work on LITA Local events, we are also turning our eyes to our national meeting. Planning the next LITA Forum is essentially a year-round process. We assess the work we've done on previous forums, of course, but the annual schedule often doesn't afford an opportunity to strategically rethink what Forum is and how it can best serve the members. To address that issue, we're convening another new task force, on Forum Assessment and Alternatives. This group will look critically at how Forum advances our strategic priorities, and will also look at other library technology conferences to help identify how Forum can continue to distinguish itself in a rapidly changing environment.

LITA Personas Task Force

Finally, as I write this, the board is in the final stages of creating a Personas Task Force as a tool for better understanding our current and potential new members. A well-constructed set of personas, representing both people who are LITA members and people who aren't—but who could be or should be—will become a valuable tool for membership development, programming, communications, assessment, and other purposes.

Each of these task forces will work throughout 2016 and deliver their results by Midwinter 2017. It is worth noting that we could only convene these groups because we have a strong list of volunteers on tap. If you haven't filled out a LITA volunteer form recently, please consider doing so at http://www.ala.org/lita/about/committees.
Thomas Dowling (dowlintp@wfu.edu) is LITA President 2015-16 and Director of Technologies, Z. Smith Reynolds Library, Wake Forest University, Winston-Salem, North Carolina.

Editorial Board Thoughts: Library Analytics and Patron Privacy

Ken Varnum

Ken Varnum (varnum@umich.edu), a member of the ITAL Editorial Board, is Senior Program Manager for Discovery, Delivery, and Learning Analytics at the University of Michigan Library, Ann Arbor, Michigan.

Two significant trends are sweeping across the library landscape: assessment (and the corresponding collection and analysis of data) and privacy (of records of user interactions with our services). Libraries, perhaps more than any other public service organization, are strongly motivated to assess their offerings with dual aims. The first might be thought of as an altruistic goal: understanding the needs of their particular clientele and improving library services to meet those needs. The second is perhaps more existential: helping justify the value libraries create to the funding sources they need to impress. Both are valid and important. It is hard to argue that improving services, focusing on actual needs, and maintaining funding are in any way improper goals. However, this desire is often seen as being in conflict with exploring too deeply the actions or needs of individual constituents, despite librarians' historical and deeply held belief that each constituent's precise information needs should be explored and provided for through personalized, tailored services.

Solid assessment cannot happen without solid data. Libraries have historically relied on qualitative surveys of their users, asking users to evaluate the quality of the services they receive. Being able to know more details and ask directed questions of individuals who used services is possible in the traditional library setting through invitations to complete surveys after individual interactions such as a reference or circulation desk interaction, a library program, or a visit to a physical location, or even through a community-wide survey invitation. Focus groups can be assembled as well, of course, once a library has identified a real-world group to study. However, those samples are more often convenience samples or—unless a library is able to successfully contact and receive responses from across the entire community—somewhat self-selected.

Assessment that leads to new or improved services relies much more heavily on broad-based understanding of the users of a system. Libraries have been able to do limited quantitative studies of library usage—at its simplest, counting how many of this were checked out, how many of that was accessed, and how many users were involved. These metrics are useful, but also limited, particularly at the scale of a single library. Knowing that a pool of resources is heavily used is helpful; even knowing that a suite of resources is frequently used collectively is beneficial. However, tying use of resources to specific information needs or information seekers, whether these are defined as individuals or as ad hoc collections of users based on situational factors such as academic level, course enrollments, and the like, is more difficult.
These more specific groupings rely on granular data that for many libraries—especially academic ones—are increasingly electronic. We are at a point in time when we have the potential to leverage wide swathes of user data. And this is where the second trend, privacy, comes to bear.

Protecting user privacy has been a guiding principle of librarianship in the United States (in particular) since the 1970s, as a strong reaction to U.S. government (through the FBI) requests to provide access to circulation logs for individuals under suspicion of espionage. This was in the early days of library automation, when large libraries with automated ILS systems could prevent future disclosure through the straightforward strategy of purging transaction records as soon as the item was returned. This practice became standard operating procedure in libraries, and expanded into new information service domains as they evolved over the following forty years. With good intentions, libraries have ensured that they maintain no long-term history for most online services.

As a profession, we have begun to realize that the straightforward (and arguably simplistic) approaches we have relied on for so long may no longer be appropriate or helpful. Over the past year, these conversations found focus through a project coordinated by the National Information Standards Organization thanks to a grant from the Andrew W. Mellon Foundation.1 The range of issues discussed here was far-reaching and touched on virtually every aspect of privacy and assessment imaginable. The resulting draft document, Consensus Framework to Support Patron Privacy in Digital Library and Information Systems,2 outlines 12 principles that libraries (and the information service vendors they partner with) should follow as they establish "practices and procedures to protect the digital privacy of the library user."

This new consensus framework sets a series of guidelines for us to consider as we begin to move into this uncharted (for libraries) territory. If we are to record and make use of our users' online (and offline, for that matter) footprints to improve services, improve the user experience, and justify our value, this document gives us an outline of the issues to consider. It is time (and probably long past time) that we make conscious decisions about how we assess our online resources, in particular, and do so with a deeper knowledge of both the resources used and the people using them. At the exact moment in our technological history when we find ourselves able to provide automated services at scale to our users through the Internet and simultaneously record and analyze the intricate details of those transactions, we need to think clearly about what questions we have and what data we need to answer them, and to be explicit about how those data points are treated.

It is important that we start this process now and change our blunt practices into more strategic data collection and analysis. Where 40 years ago we opted to bluntly enforce user privacy by deleting the data, we should now take a more nuanced approach and store and analyze data in the service of improved services and tools for our user communities. We have the opportunity, through technology and a more nuanced understanding of privacy, to conduct a protracted reference interview with our virtual users over multiple interactions… and thereby improve our services.
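One way to picture what "a more nuanced approach" could mean in practice -- offered purely as an illustrative sketch, not a recommendation drawn from the NISO framework -- is to replace raw patron identifiers with a keyed hash before a usage event is written to an analytics store, retaining only the coarse attributes needed for analysis. The field names, key handling, and event shape below are invented for illustration; the Ruby is minimal on purpose.

    require 'openssl'
    require 'time'

    # Assumption: the key lives outside the analytics dataset (for example, in an
    # environment variable), so tokens cannot be reversed by anyone holding only the logs.
    ANALYTICS_KEY = ENV.fetch('ANALYTICS_KEY', 'replace-me')

    # Turn a real patron ID into a stable pseudonymous token.
    def pseudonymize(patron_id)
      OpenSSL::HMAC.hexdigest('SHA256', ANALYTICS_KEY, patron_id.to_s)
    end

    # Build the record that would be stored for later analysis: the token supports
    # grouping events by user across interactions, while coarse attributes such as
    # academic level stand in for any directly identifying detail.
    def usage_event(patron_id, resource, academic_level)
      {
        user_token: pseudonymize(patron_id),
        resource: resource,
        academic_level: academic_level,
        recorded_at: Time.now.utc.iso8601
      }
    end

    p usage_event('21221001234567', 'ejournal:example-title', 'undergraduate')

Retention schedules and aggregation rules would still be needed on top of something like this, but it illustrates that "delete everything" and "keep everything" are not the only choices.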
REFERENCES

1. http://www.niso.org/topics/tl/patron_privacy/

2. http://www.niso.org/apps/group_public/download.php/15863/NISO%20Consensus%20Principles%20Users%20Digital%20Privacy.pdf